Use streaming collection processing with a limiting step
With streaming data processing, instead of building a new collection after every step, the elements "flow through" the pipeline without storing the intermediate results. It seems like a good choice performance-wise.
For an Array, each step in a collection pipeline creates a new Array:
array
.filter(...) // creates a new Array
.map(...) // creates a new Array
.filter(...) // creates a new Array
.reduce(...)
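As a concrete sketch of the shape above (the data and callbacks are made up for illustration), every step runs over the whole input and materializes a full intermediate Array before the next one starts:

const array = Array.from({ length: 1_000_000 }, (_, i) => i);

const total = array
  .filter((n) => n % 2 === 0)   // new Array with ~500k elements
  .map((n) => n * 2)            // another new Array with ~500k elements
  .filter((n) => n % 3 === 0)   // yet another new Array
  .reduce((sum, n) => sum + n, 0);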
Should streaming processing, like Immutable.js's Seq, be used instead?
It depends.
When to use Seq
Use streaming processing if one of the steps down the pipeline limits the number of elements. For example, consider an input collection with a lot of elements and a limiting operation like slice:
array
.filter(...) // creates a new Array
.map(...) // creates a new Array
.filter(...) // creates a new Array
.slice(...) // Drop most of the elements
.reduce(...)
In this case, streaming processing makes sense:
Seq(array)
.filter(...)
.map(...)
.filter(...)
.slice(...)
.reduce(...)
Arrays process the steps one after the other. If the original Array has 1 million elements and the subsequent operations also yield roughly the same amount, then the whole collection is processed at every step until the slice is encountered, which drops most of the elements.
In contrast, the Seq-based solution works backward: the slice requests a limited number of elements, and only that minimal amount is fetched from the underlying collection.
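One way to see this difference is to count how many times the callbacks run in each version. The data and predicates below are made up, and Seq is assumed to come from the immutable package:

const { Seq } = require('immutable');

const array = Array.from({ length: 1_000_000 }, (_, i) => i);

let eagerCalls = 0;
array
  .filter((n) => (eagerCalls++, n % 2 === 0))
  .map((n) => (eagerCalls++, n * 2))
  .slice(0, 10)
  .reduce((sum, n) => sum + n, 0);

let lazyCalls = 0;
Seq(array)
  .filter((n) => (lazyCalls++, n % 2 === 0))
  .map((n) => (lazyCalls++, n * 2))
  .slice(0, 10)
  .reduce((sum, n) => sum + n, 0);

console.log(eagerCalls); // roughly 1.5 million callback invocations
console.log(lazyCalls);  // only a few dozen

The eager version runs filter over all 1 million elements and map over the ~500k that pass, while the lazy version stops pulling elements as soon as slice has its 10.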
When not to use Seq
On the other hand, if the pipeline does not drop most of the elements at some point, then there is little difference in the runtime between the two solutions.
If there were no slice in the examples above, the Seq would also process all the elements. In that case, streaming processing only complicates the code.
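For instance, dropping the slice from the comparison above (again with made-up callbacks), both versions invoke the callbacks for every element and produce the same result, so the Seq mainly adds a dependency and a less familiar API:

const { Seq } = require('immutable');

const array = Array.from({ length: 1_000_000 }, (_, i) => i);

// No limiting step: every element flows through both callbacks.
const eager = array
  .filter((n) => n % 2 === 0)
  .map((n) => n * 2)
  .reduce((sum, n) => sum + n, 0);

const lazy = Seq(array)
  .filter((n) => n % 2 === 0)
  .map((n) => n * 2)
  .reduce((sum, n) => sum + n, 0);

console.log(eager === lazy); // true; both touched every element

The Seq version still avoids the intermediate Arrays, but since every element is visited anyway, the runtime difference is usually negligible.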