Unix-like data pipelines with Java 8 Streams and UStream

November 8th, 2015 by

We all love the simplicity of chaining commands with pipes in a Unix-like environment.

UStream takes this approach and extends Java 8’s Stream API to mimic some well-known Unix commands and apply them to streams.

In the following tutorial, I’d like to share a few short examples of using UStream.

Data pipelines with UStream

Dependencies

We simply need to add one dependency to our pom.xml (using Maven): io.github.benas:ustream:0.2

<dependency>
	<groupId>io.github.benas</groupId>
	<artifactId>ustream</artifactId>
	<version>0.2</version>
</dependency>

Creating UStreams

A UStream is our entry point for further pipeline operations. We can create instances of it with one of the following commands (documented in the project’s wiki): cat, concat, date, echo, from, ls, pwd and unixify.

Now to some concrete examples for each one:

package com.hascode.tutorial;
 
import static io.github.benas.ustream.Predicates.contains;
import static io.github.benas.ustream.UStream.stdOut;
 
import java.util.stream.Stream;
 
import io.github.benas.ustream.UStream;
import io.github.benas.ustream.components.WordCount.Option;
 
public class CreatingStreams {
 
	public static void main(String[] args) throws Exception {
		System.out.println("1) Reading project's pom.xml using cat, reducing with head and tail");
		UStream.cat("/data/project/ustream-tutorial/pom.xml").head(6).tail(3).expand().to(stdOut());
 
		System.out.println("2) Concatenating two streams");
		UStream.concat(Stream.of("foo", "bar"), Stream.of("baz", "baez")).sort().nl().to(stdOut());
 
		System.out.println("3) Creating stream with current date");
		UStream.date().to(stdOut());
 
		System.out.println("4) Creating stream with echo");
		UStream.echo("uno\tdue\ttres").expand().to(stdOut());
 
		System.out.println("5) Creating stream from given stream");
		UStream.from(Stream.of("foo bar baz")).wc(Option.W).to(stdOut());
 
		System.out.println("6) Creating stream from directory contents using ls");
		UStream.ls().grep("xml").to(stdOut());
 
		System.out.println("7) Creating stream with absolute path of current directory");
		UStream.pwd().cut("/", 4).to(stdOut());
 
		System.out.println("8) Creating stream from existing stream with unixify");
		UStream.unixify(Stream.of("foo", "bar", "baz")).exclude(contains("ar")).to(stdOut());
	}
 
}

Running the code in our IDE of choice or using Maven on the command line should produce output similar to this:

$ mvn exec:java -Dexec.mainClass=com.hascode.tutorial.CreatingStreams
[..]
1) Reading project's pom.xml using cat, reducing with head and tail
 <groupId>com.hascode.tutorial</groupId>
 <artifactId>ustream-tutorial</artifactId>
 <version>1.0.0</version>
2) Concatenating two streams
1 baez
2 bar
3 baz
4 foo
3) Creating stream with current date
Mon Nov 02 10:22:46 CET 2015
4) Creating stream with echo
uno due tres
5) Creating stream from given stream
3
6) Creating stream from directory contents using ls
pom.xml
7) Creating stream with absolute path of current directory
ustream-tutorial
8) Creating stream from existing stream with unixify
foo
baz
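
To make the head/tail combination from the first example more tangible, here is a rough sketch of how the same windowing could be expressed with plain JDK streams. This is not how UStream implements it; the class HeadTailSketch and its helper methods are hypothetical names chosen for illustration.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper class, not part of UStream
class HeadTailSketch {

	// head: keep only the first n elements
	static <T> Stream<T> head(Stream<T> stream, long n) {
		return stream.limit(n);
	}

	// tail: keep only the last n elements; the stream has to be
	// materialized first because its length is unknown up front
	static <T> Stream<T> tail(Stream<T> stream, int n) {
		List<T> all = stream.collect(Collectors.toList());
		int from = Math.max(0, all.size() - n);
		return all.subList(from, all.size()).stream();
	}

	// head(6) followed by tail(3) yields elements 4 to 6
	static List<String> window(Stream<String> lines) {
		return tail(head(lines, 6), 3).collect(Collectors.toList());
	}

	public static void main(String[] args) {
		Stream<String> lines = Stream.of("line 1", "line 2", "line 3", "line 4",
				"line 5", "line 6", "line 7", "line 8");
		// prints "line 4", "line 5" and "line 6"
		window(lines).forEach(System.out::println);
	}
}
```

Taking the first six lines and then the last three of those leaves lines four to six, which is why the pom.xml excerpt above shows only the groupId, artifactId and version elements.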

Additional Examples

We create a stream of strings, wrap it in a UStream, grep for all items containing the character “a”, sort the items, filter out duplicates, prefix each item with a line number and print the result to STDOUT.

package com.hascode.tutorial;
 
import static io.github.benas.ustream.UStream.stdOut;
 
import java.io.IOException;
import java.util.stream.Stream;
 
import io.github.benas.ustream.UStream;
 
public class Example1 {
 
	public static void main(String[] args) throws IOException {
		Stream<String> stream = Stream.of("foo", "bar", "bar", "baz");
		UStream.unixify(stream).grep("a").sort().uniq().nl().to(stdOut());
	}
 
}

This is the output from our example above:

$ mvn exec:java -Dexec.mainClass=com.hascode.tutorial.Example1
1 bar
2 baz
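
For comparison, the same grep, sort, uniq and nl steps could be written with the JDK stream API alone, roughly as sketched below. The class name PlainStreamsExample1 and the method pipeline are hypothetical, and the mapping of each Unix command to a stream operation is my own approximation rather than UStream's implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical plain-JDK counterpart to Example1, not part of UStream
class PlainStreamsExample1 {

	static List<String> pipeline(Stream<String> input) {
		List<String> items = input
				.filter(s -> s.contains("a")) // grep "a"
				.sorted()                     // sort
				.distinct()                   // uniq (input is already sorted)
				.collect(Collectors.toList());

		// nl: prefix every item with its 1-based line number
		List<String> numbered = new ArrayList<>();
		for (int i = 0; i < items.size(); i++) {
			numbered.add((i + 1) + " " + items.get(i));
		}
		return numbered;
	}

	public static void main(String[] args) {
		// prints "1 bar" and "2 baz"
		pipeline(Stream.of("foo", "bar", "bar", "baz")).forEach(System.out::println);
	}
}
```

The line numbering is the awkward part in plain streams, since it needs an index; with UStream it is a single nl() call in the chain.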

In the next example we use ls to list the contents of our tutorial project directory, sort the items in their natural order, keep the last four entries, convert them to lowercase, trim whitespace and finally print everything to STDOUT.

package com.hascode.tutorial;
 
import static io.github.benas.ustream.UStream.stdOut;
 
import io.github.benas.ustream.UStream;
 
public class Example2 {
 
	public static void main(String[] args) throws Exception {
		UStream.ls("/data/project/ustream-tutorial").sort().tail(4).lowercase().trim().to(stdOut());
	}
 
}

This is the output from our example above:

$ mvn exec:java -Dexec.mainClass=com.hascode.tutorial.Example2
[..]
readme.md
pom.xml
src
target
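
The output may look surprising because readme.md appears before pom.xml. The sorting happens before the lowercase conversion, and in Java's natural String ordering uppercase letters sort before lowercase ones. The following sketch reproduces that effect with plain JDK streams; the class ListingSketch is hypothetical, and the directory entries .classpath and .git are assumed purely for illustration, since the real directory listing is not shown above.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper class, not part of UStream
class ListingSketch {

	static List<String> lastFourLowercased(Stream<String> names) {
		// sort in natural order: "README.md" comes before "pom.xml"
		// because 'R' has a lower code point than 'p'
		List<String> sorted = names.sorted().collect(Collectors.toList());

		// keep the last four entries, then lowercase and trim each one
		int from = Math.max(0, sorted.size() - 4);
		return sorted.subList(from, sorted.size()).stream()
				.map(String::toLowerCase)
				.map(String::trim)
				.collect(Collectors.toList());
	}

	public static void main(String[] args) {
		// assumed directory contents, for illustration only
		Stream<String> names = Stream.of("target", "src", "pom.xml",
				"README.md", ".git", ".classpath");
		// prints "readme.md", "pom.xml", "src" and "target"
		lastFourLowercased(names).forEach(System.out::println);
	}
}
```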

In our last example, we convert a string with Windows-style \r\n line endings to the *nix format, replace content using tr() with a regular expression and print the result to STDOUT.

package com.hascode.tutorial;
 
import static io.github.benas.ustream.UStream.stdOut;
 
import io.github.benas.ustream.UStream;
 
public class Example3 {
 
	public static void main(String[] args) throws Exception {
		String input = "This is a tezt of\r\nsome text written using\r\nms-style line separators.";
		UStream.echo(input).dos2unix().tr("tezt", "test").to(stdOut());
	}
 
}

This is the output from our example above:

$ mvn exec:java -Dexec.mainClass=com.hascode.tutorial.Example3
[..]
This is a test of
some text written using
ms-style line separators.
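
As a point of reference, the two transformations from this example can be approximated with plain String methods, as in the sketch below. The class LineEndingSketch and its methods are hypothetical, and using String.replace and String.replaceAll is my assumption about an equivalent, not a description of what UStream does internally.

```java
// Hypothetical helper class, not part of UStream
class LineEndingSketch {

	// Rough equivalent of dos2unix: turn every \r\n into \n
	static String dos2unix(String text) {
		return text.replace("\r\n", "\n");
	}

	// Normalize line endings, then replace matches of the regular expression
	static String fix(String text) {
		return dos2unix(text).replaceAll("tezt", "test");
	}

	public static void main(String[] args) {
		String input = "This is a tezt of\r\nsome text written using\r\nms-style line separators.";
		System.out.println(fix(input));
	}
}
```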

Available Components

This is a list of available components linked to their documentation in the project’s wiki.

Tutorial Sources

Please feel free to download the tutorial sources from my Bitbucket repository, fork it there or clone it using Git:

git clone https://bitbucket.org/hascode/ustream-tutorial.git
