Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Done
None
None
Description
Our primitive for writing binary block RDDs to HDFS (as used in guarded collect) first computes the number of non-zeros (nnz) and then writes out the data. Because these are two separate passes, the RDD is computed twice, which can be expensive for large DAGs of RDD operations. Computing the nnz explicitly is unnecessary: we could piggyback it onto the write via an accumulator, as is already done in multiple other places in SystemML.
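To illustrate the difference, here is a minimal, Spark-free sketch in plain Java. It is not the SystemML code: `computeBlocks` is a hypothetical stand-in for a lazily evaluated RDD lineage (every call recomputes all blocks), the `sink` list stands in for the HDFS write, and a `LongAdder` plays the role of a Spark accumulator. All class and method names are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.LongAdder;
import java.util.stream.Stream;

public class NnzPiggyback {
    // Counts how often the (hypothetical) expensive lineage is evaluated.
    static final AtomicInteger evaluations = new AtomicInteger();

    // Stand-in for a lazy RDD of blocks: each call recomputes everything.
    static Stream<double[]> computeBlocks() {
        evaluations.incrementAndGet();
        return Stream.of(new double[] {1, 0, 2}, new double[] {0, 0, 3});
    }

    // Number of non-zero cells in a single block.
    static long countNonZeros(double[] block) {
        long nnz = 0;
        for (double v : block) {
            if (v != 0) {
                nnz++;
            }
        }
        return nnz;
    }

    // Two-pass variant: one evaluation for the nnz, a second one for the write.
    static long writeTwoPass(List<double[]> sink) {
        long nnz = computeBlocks().mapToLong(NnzPiggyback::countNonZeros).sum();
        computeBlocks().forEach(sink::add);
        return nnz;
    }

    // One-pass variant: the nnz is accumulated as a side effect of the write.
    static long writeOnePass(List<double[]> sink) {
        LongAdder nnzAcc = new LongAdder(); // assumed analogue of a Spark accumulator
        computeBlocks().forEach(block -> {
            nnzAcc.add(countNonZeros(block));
            sink.add(block);
        });
        return nnzAcc.sum();
    }

    public static void main(String[] args) {
        evaluations.set(0);
        long nnzTwoPass = writeTwoPass(new ArrayList<>());
        System.out.println("two-pass: nnz=" + nnzTwoPass + ", evaluations=" + evaluations.get());

        evaluations.set(0);
        long nnzOnePass = writeOnePass(new ArrayList<>());
        System.out.println("one-pass: nnz=" + nnzOnePass + ", evaluations=" + evaluations.get());
    }
}
```

Both variants return the same nnz, but the one-pass variant evaluates the lineage only once. In Spark, the accumulator value would only be reliable after the write action has completed, so the nnz has to be read after the write rather than before it.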