[SYSTEMDS-1009] Avoid spark context creation on parfor optimization - ASF JIRA

Details

Type: Improvement
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: SystemML 0.11
Component/s: None
Labels: None

Description

Currently, every parfor script triggers the lazy Spark context creation, regardless of its input data size and script, in order to obtain memory budgets and parallelism. On small data, the Spark context creation dominates end-to-end execution time. We should replace this with a configuration-only analysis, which would avoid the context creation.

For example, here are the XS and S performance results for univariate statistics:

UnivariateStatistics on mbperftest/bivar/A_10k/data: 14
UnivariateStatistics on mbperftest/bivar/A_10k/data: 14
UnivariateStatistics on mbperftest/bivar/A_10k/data: 17
UnivariateStatistics on mbperftest/bivar/A_10k/data: 16

UnivariateStatistics on mbperftest/bivar/A_100k/data: 14
UnivariateStatistics on mbperftest/bivar/A_100k/data: 15
UnivariateStatistics on mbperftest/bivar/A_100k/data: 14
UnivariateStatistics on mbperftest/bivar/A_100k/data: 17
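
The configuration-only analysis proposed in the description could look roughly like the following minimal sketch. It is not SystemML code: `FakeSparkContext`, `LazyContextHolder`, and all configuration keys are hypothetical names chosen for illustration. The point is only that memory budget and parallelism are derived from configuration values, so the expensive context is never created unless it is actually requested.

```python
class FakeSparkContext:
    """Hypothetical stand-in for a context that is expensive to create."""
    created = 0  # number of contexts created so far

    def __init__(self, conf):
        FakeSparkContext.created += 1
        self.conf = conf


class LazyContextHolder:
    """Holds a configuration and creates the context only on first use."""

    def __init__(self, conf):
        self.conf = conf
        self._ctx = None

    def context(self):
        # Lazy creation: pay the startup cost only when a context is needed.
        if self._ctx is None:
            self._ctx = FakeSparkContext(self.conf)
        return self._ctx

    def parallelism_from_config(self):
        # Configuration-only analysis: never touches self.context().
        return self.conf["num_executors"] * self.conf["executor_cores"]

    def memory_budget_mb_from_config(self):
        # Configuration-only analysis: never touches self.context().
        return int(self.conf["executor_memory_mb"] * self.conf["memory_fraction"])


# Assumed example values, not taken from the issue.
EXAMPLE_CONF = {
    "num_executors": 2,
    "executor_cores": 4,
    "executor_memory_mb": 2048,
    "memory_fraction": 0.5,
}
```

With this structure, an optimizer can call `parallelism_from_config()` and `memory_budget_mb_from_config()` during compilation while `FakeSparkContext.created` stays at 0. Only a later call to `context()` pays the creation cost.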

Issue Links

Is contained by

SYSTEMDS-1010 Perftest 0.11 release and related improvements

Resolved

Activity

Matthias Boehm added a comment - 04/Oct/16 21:20

As it turned out, the unnecessary Spark context creation was due to (1) obtaining the Spark version (introduced with our version-specific memory management), (2) unnecessary parfor eager caching/partitioning, and (3) unnecessary setup of fair scheduling for local parworkers. The fix essentially avoids these unnecessary Spark context creations, which applies to all parfor scripts and scripts with unknowns (because these request the broadcast memory budget, which creates the Spark cluster config and, due to the version issue, always the Spark context).

After applying the fix, the end-to-end runtimes for univariate statistics are much better:

UnivariateStatistics on mbperftest/bivar/A_10k/data: 1
UnivariateStatistics on mbperftest/bivar/A_10k/data: 1
UnivariateStatistics on mbperftest/bivar/A_10k/data: 2
UnivariateStatistics on mbperftest/bivar/A_10k/data: 1

UnivariateStatistics on mbperftest/bivar/A_100k/data: 2
UnivariateStatistics on mbperftest/bivar/A_100k/data: 2
UnivariateStatistics on mbperftest/bivar/A_100k/data: 2
UnivariateStatistics on mbperftest/bivar/A_100k/data: 1
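
To put the improvement in numbers, the following small sketch averages the four runs reported before and after the fix and computes the ratio. The values are copied from this issue. The issue does not state their unit, so the script reports them unitless. The dictionary layout and the `speedup` helper are illustrative choices only.

```python
from statistics import mean

# Runtimes as reported in this issue (unit not stated in the issue).
before = {"A_10k": [14, 14, 17, 16], "A_100k": [14, 15, 14, 17]}
after = {"A_10k": [1, 1, 2, 1], "A_100k": [2, 2, 2, 1]}


def speedup(before_runs, after_runs):
    """Ratio of mean runtime before the fix to mean runtime after it."""
    return mean(before_runs) / mean(after_runs)


for name in before:
    print(f"{name}: mean {mean(before[name]):.2f} -> {mean(after[name]):.2f}, "
          f"speedup {speedup(before[name], after[name]):.1f}x")
```

On these numbers, the mean drops from 15.25 to 1.25 for A_10k (about 12.2x) and from 15 to 1.75 for A_100k (about 8.6x).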

People

Assignee: Matthias Boehm

Reporter: Matthias Boehm

Votes: 0

Watchers: 1

Dates

Created: 04/Oct/16 04:01

Updated: 05/Oct/16 02:00

Resolved: 05/Oct/16 00:40