Details
Type: Task
Status: Closed
Priority: Major
Resolution: Done
Description
For scenarios such as tsmm outer products t(X) %*% X, where X is a row vector, the post-processing steps, namely (1) copying the upper triangle to the lower triangle and (2) recomputing the number of non-zeros (nnz), contribute significantly to the execution time.
This task aims at a cache-conscious copy of the upper triangle to the lower triangle, similar to the dense-dense transpose, as well as fusing the nnz recomputation into either the computation or the copy operation.
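
The idea can be illustrated with a minimal sketch. It is not the actual implementation: the class name TriangleCopy, the method copyUpperToLowerWithNnz, and the tile-size parameter blk are hypothetical, and the output is assumed to be a dense, row-major n x n array small enough for int indexing. The copy walks the upper triangle in square tiles, so that both the rows being read and the rows being written stay cache-resident, and counts non-zeros in the same pass instead of scanning the output again.

```java
public class TriangleCopy {
    /**
     * Hypothetical helper: copies the upper triangle of a dense, row-major
     * n x n matrix into its lower triangle tile by tile, and returns the
     * number of non-zeros of the resulting symmetric matrix.
     * Assumes n * n fits into an int index.
     */
    public static long copyUpperToLowerWithNnz(double[] a, int n, int blk) {
        long nnz = 0;
        // Iterate over tiles on or above the diagonal only.
        for (int bi = 0; bi < n; bi += blk) {
            int imax = Math.min(bi + blk, n);
            for (int bj = bi; bj < n; bj += blk) {
                int jmax = Math.min(bj + blk, n);
                for (int i = bi; i < imax; i++) {
                    // In diagonal tiles, start at the diagonal element.
                    for (int j = Math.max(bj, i); j < jmax; j++) {
                        double v = a[i * n + j];
                        if (i == j) {
                            // Diagonal entries appear once in the result.
                            if (v != 0) nnz++;
                        } else {
                            // Mirror into the lower triangle; an off-diagonal
                            // non-zero appears twice in the result.
                            a[j * n + i] = v;
                            if (v != 0) nnz += 2;
                        }
                    }
                }
            }
        }
        return nnz;
    }
}
```

A call such as TriangleCopy.copyUpperToLowerWithNnz(out, n, 128) would then replace the separate copy and nnz passes. The tile size 128 is only a placeholder and would need tuning for the target cache sizes.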