Details
- Type: Improvement
- Status: Closed
- Priority: Major
- Resolution: Fixed
- Fix Version: 3.1.1-incubating
Description
The call in question will end up creating a separate Spark job for every task in the RDD. This overwhelms the Spark UI with unimportant information and isn't relevant to users attempting diagnostics. Since this RDD is relatively small, we should be fine switching this line to a `.collect` call, which pulls the entire RDD down to the driver in a single job.
As long as the total size of this RDD stays on the scale of megabytes, we can keep the user interface readable with:
return IteratorUtils.map(memoryRDD.collect().iterator(), tuple -> new KeyValue<>(tuple._1(), tuple._2()));
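The intent of that one-liner can be sketched with plain collections standing in for the Spark and TinkerPop types (a `List` in place of the collected `memoryRDD`, and `Map.Entry` in place of TinkerPop's `KeyValue`; both are stand-ins for illustration, not the actual APIs):

```java
import java.util.Iterator;
import java.util.List;
import java.util.Map;

public class CollectSketch {
    // Stand-in for the driver side after .collect(): the whole (small) data set
    // now lives in one local list, fetched from the executors in a single job.
    // We then wrap the tuples as key/value pairs with no further Spark jobs,
    // mirroring IteratorUtils.map(memoryRDD.collect().iterator(), ...).
    static Iterator<Map.Entry<String, Long>> toKeyValues(List<Map.Entry<String, Long>> collected) {
        return collected.stream()
                .map(t -> Map.entry(t.getKey(), t.getValue()))
                .iterator();
    }

    public static void main(String[] args) {
        // Hypothetical memory contents, e.g. counters accumulated by a job.
        List<Map.Entry<String, Long>> collected = List.of(
                Map.entry("counter", 42L),
                Map.entry("runtime", 7L));
        Iterator<Map.Entry<String, Long>> it = toKeyValues(collected);
        while (it.hasNext()) {
            Map.Entry<String, Long> kv = it.next();
            System.out.println(kv.getKey() + "=" + kv.getValue());
        }
    }
}
```

The contrast with the previous behavior: iterating the RDD remotely (e.g. via `toLocalIterator`) schedules work per task as the driver consumes elements, while `collect()` materializes everything in one job up front, which is acceptable only because the memory RDD is known to be small.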