
How to split a String into a Stream of Strings?


What is the best method of splitting a String into a Stream?

I saw these variations:

  1. Arrays.stream("b,l,a".split(","))
  2. Stream.of("b,l,a".split(","))
  3. Pattern.compile(",").splitAsStream("b,l,a")

My priorities are:

  • Robustness
  • Readability
  • Performance

A complete, compilable example:

import java.util.Arrays;
import java.util.regex.Pattern;
import java.util.stream.Stream;

public class HelloWorld {

    public static void main(String[] args) {
        stream1().forEach(System.out::println);
        stream2().forEach(System.out::println);
        stream3().forEach(System.out::println);
    }

    private static Stream<String> stream1() {
        return Arrays.stream("b,l,a".split(","));
    }

    private static Stream<String> stream2() {
        return Stream.of("b,l,a".split(","));
    }

    private static Stream<String> stream3() {
        return Pattern.compile(",").splitAsStream("b,l,a");
    }

}

Solution

  • Arrays.stream/String.split

    Since String.split returns a String[] array, I always recommend Arrays.stream, the canonical idiom for streaming over an array.

    String input = "dog,cat,bird";
    Stream<String> stream = Arrays.stream(input.split(","));
    stream.forEach(System.out::println);
    

  • Stream.of/String.split

    Stream.of is a varargs method that merely happens to accept an array. Varargs parameters are implemented as arrays, and when varargs were introduced to Java, existing methods were retrofitted to accept variable arguments while staying compatible with callers that pass an array.

    Stream<String> fromArray = Stream.of(input.split(","));    // works, but is non-idiomatic
    Stream<String> fromArgs  = Stream.of("dog", "cat", "bird"); // intended use case
    
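To see that the array returned by split is simply passed through as the varargs array, here is a small sketch comparing both Stream.of forms with Arrays.stream. The class and method names are made up for illustration; all three produce the same elements.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class StreamOfDemo {

    // The String[] from split is handed to Stream.of as its varargs array
    static List<String> fromSplitArray(String input) {
        return Stream.of(input.split(",")).collect(Collectors.toList());
    }

    // Intended varargs use: the compiler packs the arguments into an array
    static List<String> fromVarargs() {
        return Stream.of("dog", "cat", "bird").collect(Collectors.toList());
    }

    // The canonical idiom for streaming over an existing array
    static List<String> fromArraysStream(String input) {
        return Arrays.stream(input.split(",")).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String input = "dog,cat,bird";
        System.out.println(fromSplitArray(input));   // [dog, cat, bird]
        System.out.println(fromVarargs());           // [dog, cat, bird]
        System.out.println(fromArraysStream(input)); // [dog, cat, bird]
    }
}
```

The difference is only in what the call communicates to the reader: Arrays.stream says "stream this array", while Stream.of says "stream these values".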

  • Pattern.splitAsStream

    Pattern.compile(",").splitAsStream(string) has the advantage of streaming the substrings directly rather than first creating an intermediate array, so for a large number of substrings it can perform better. On the other hand, if the delimiter is trivial, i.e. a single literal character, String.split takes a fast path instead of using the regex engine. So in this case there is no obvious winner.

    Stream<String> stream = Pattern.compile(",").splitAsStream(input);
    
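Which approach is faster for a given input is best settled by measuring, but it is cheap to check that both produce the same elements. The sketch below (class and method names are illustrative, and it is not a benchmark) compares them on input containing empty fields. It also keeps the compiled Pattern in a static final field so it is compiled once; Pattern instances are immutable and safe to share between threads.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class SplitComparison {

    // Compiled once and reused for every call
    private static final Pattern COMMA = Pattern.compile(",");

    // String.split: "," is a single non-metacharacter, so split can use its fast path
    static List<String> viaSplit(String input) {
        return Arrays.stream(input.split(",")).collect(Collectors.toList());
    }

    // Pattern.splitAsStream: substrings are produced lazily, no String[] is built first
    static List<String> viaSplitAsStream(String input) {
        return COMMA.splitAsStream(input).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String input = "a,b,,c,,";
        // Both keep the empty field between b and c and drop the trailing empty strings
        System.out.println(viaSplit(input));         // [a, b, , c]
        System.out.println(viaSplitAsStream(input)); // [a, b, , c]
    }
}
```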

    If the splitting happens inside another stream, e.g. .flatMap(Pattern.compile(pattern)::splitAsStream), there is the additional advantage that the pattern is compiled only once, rather than once for every string of the outer stream.

    Stream<String> stream = Stream.of("a,b", "c,d,e", "f", "g,h,i,j")
        .flatMap(Pattern.compile(",")::splitAsStream);
    

    This is a property of method references of the form expression::name: the expression is evaluated, and its result captured, when the functional interface instance is created. See the questions "What is the equivalent lambda expression for System.out::println" and "java.lang.NullPointerException is thrown using a method-reference but not a lambda expression" for details.
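The capture behaviour can be observed directly. In this sketch, comma() is a hypothetical helper that counts how often it compiles the pattern; the method reference evaluates it once, while the equivalent-looking lambda calls it for every element of the outer stream.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class CaptureDemo {

    // Hypothetical counter, only here to observe how often the pattern is compiled
    static int compileCount = 0;

    static Pattern comma() {
        compileCount++;
        return Pattern.compile(",");
    }

    // expression::name: comma() runs once, when the Function instance is created
    static List<String> withMethodReference() {
        return Stream.of("a,b", "c,d,e", "f", "g,h,i,j")
            .flatMap(comma()::splitAsStream)
            .collect(Collectors.toList());
    }

    // The lambda body runs per element, so comma() is called for each of the four strings
    static List<String> withLambda() {
        return Stream.of("a,b", "c,d,e", "f", "g,h,i,j")
            .flatMap(s -> comma().splitAsStream(s))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        compileCount = 0;
        System.out.println(withMethodReference() + " compiled " + compileCount + " time(s)"); // 1
        compileCount = 0;
        System.out.println(withLambda() + " compiled " + compileCount + " time(s)");          // 4
    }
}
```

Both variants produce the same ten substrings; only the number of Pattern.compile calls differs.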