[SYSTEMDS-1837] Unary aggregate w/ corrections output to large physical blocks - ASF JIRA

XML

Word

Printable

JSON

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: SystemML 0.15
Component/s: None
Labels:
None

Description

Many unary aggregate operations store corrections in additional columns or rows. For example, rowSums(X) uses a two-column output to store sums and corrections. In CP, we drop these corrections immediately after the operations, while in MR and Spark these corrections are dropped after final aggregation. The issue is that the MatrixBlock::dropLastRowsOrColums does not actually drop the correction but simply shifts all values in the right starting positions. Hence, the physical output is actually larger than what the memory estimates represent. This leads to unnecessary large memory consumption during subsequent operations and in the buffer pool, which can lead to OOMs. This task aims to fix MatrixBlock::dropLastRowsOrColums.

In a subsequent task, we could also modify all unary aggregates to never allocate the multi-column/row output when executed in CP. However, this requires custom code paths for the different backends.

Attachments

Activity

People

Assignee:: Matthias Boehm

Reporter:: Matthias Boehm

Votes:: 0 Vote for this issue

Watchers:: 1 Start watching this issue

Dates

Created:: 11/Aug/17 19:25

Updated:: 09/Sep/17 03:47

Resolved:: 02/Sep/17 23:42