SystemDS / SYSTEMDS-2170

Remote parfor fails on reading ultra-sparse matrix with dims > 2G

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: SystemML 1.1
    • Component/s: None
    • Labels: None

    Description

      The parfor optimizer has a rewrite that selects the remote Spark execution type even if the original program contains Spark operations, as long as these operations fit into the memory budget of the executors. However, this rewrite does not check that the matrix dimensions are valid integer dimensions (the failing matrix has 5129281161 columns, more than 2G), and hence execution fails with:

      Caused by: org.apache.sysml.runtime.DMLRuntimeException: Matrix dimensions too large for CP runtime: 3 x 5129281161
              at org.apache.sysml.runtime.io.MatrixReader.createOutputMatrixBlock(MatrixReader.java:80)
              at org.apache.sysml.runtime.io.ReaderBinaryBlockParallel.readMatrixFromHDFS(ReaderBinaryBlockParallel.java:59)
              at org.apache.sysml.runtime.util.DataConverter.readMatrixFromHDFS(DataConverter.java:207)
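
      The failure mode can be illustrated with a minimal Java sketch. It assumes that the CP runtime addresses rows and columns with Java int indexes, which is what the error message suggests; the class and method names below are hypothetical and are not part of the SystemML code base.

```java
// Hypothetical helper for illustration; not the SystemML implementation.
final class CpDimensionCheck {

    /** Returns true if both dimensions can be addressed with Java int indexes. */
    static boolean fitsIntDimensions(long rows, long cols) {
        return rows >= 0 && cols >= 0
            && rows <= Integer.MAX_VALUE
            && cols <= Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        // Dimensions taken from the exception message above.
        long rows = 3L;
        long cols = 5129281161L;

        // The column count exceeds Integer.MAX_VALUE (2147483647),
        // so the check rejects the matrix.
        System.out.println(fitsIntDimensions(rows, cols));

        // A narrowing cast would not fail; it silently wraps to a wrong value,
        // which is why an explicit check is needed before forcing CP execution.
        System.out.println((int) cols);
    }
}
```

      The matrix is tiny in terms of memory because it is ultra-sparse, so a check on the memory estimate alone does not catch this case.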
      

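      Here is the related optimizer output. Note that the two operations with exec=SPARK in the initial plan (r(rshape) and rix) are forced to CP in the optimized plan: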

      ----------------------------
       EXPLAIN OPT TREE (type=ABSTRACT_PLAN, size=22)
      ----------------------------
      --PARFOR, exec=CP, k=16, dp=NONE, tp=FIXED, rm=LOCAL_AUTOMATIC
      ----GENERIC (lines 122-126), exec=CP, k=1
      ------lix, exec=CP, k=1
      ------b(-), exec=CP, k=1
      ------b(*), exec=CP, k=1
      ------r(t), exec=CP, k=16
      ------ba(+*), exec=CP, k=16
      ------rix, exec=CP, k=1
      ------r(rshape), exec=CP, k=16
      ------ba(+*), exec=CP, k=16
      ------r(rshape), exec=CP, k=16
      ------rix, exec=CP, k=1
      ------r(rshape), exec=SPARK, k=1
      ------rix, exec=SPARK, k=1
      ------b(/), exec=CP, k=1
      ------u(exp), exec=CP, k=16
      ------b(-), exec=CP, k=1
      ------rix, exec=CP, k=1
      ------ua(maxRC), exec=CP, k=16
      ------ua(+RC), exec=CP, k=16
      ------b(*), exec=CP, k=1
      ------ua(+RC), exec=CP, k=16
      ----------------------------
      
      18/03/06 23:17:33 DEBUG Optimizer: --- RULEBASED OPTIMIZER -------
      18/03/06 23:17:33 DEBUG Optimizer: RULEBASED OPT: Optimize w/ max_mem=24271MB/4638MB/4638MB, max_k=16/144/144).
      18/03/06 23:17:33 DEBUG Optimizer: RULEBASED OPT: Optimize w/ SparkClusterConfig:
      -- legacyVersion    = false (2.2.0)
      -- confOnly         = true
      -- numExecutors     = 6
      -- defaultPar       = 144
      -- memExecutor      = 69478645760
      -- memDataMinFrac   = 0.5
      -- memDataMaxFrac   = 0.6
      -- memBroadcastFrac = 0.21
      
      18/03/06 23:17:33 DEBUG Optimizer: RULEBASED OPT: estimated mem (serial exec) M=109MB
      18/03/06 23:17:33 DEBUG Optimizer: RULEBASED OPT: rewrite 'set data partitioner' - result=NONE ()
      18/03/06 23:17:33 DEBUG Optimizer: RULEBASED OPT: rewrite 'remove unnecessary compare matrix' - result=false ()
      18/03/06 23:17:33 DEBUG Optimizer: RULEBASED OPT: rewrite 'set result partitioning' - result=false
      18/03/06 23:17:33 DEBUG Optimizer: RULEBASED OPT: estimated new mem (serial exec) M=109MB
      18/03/06 23:17:33 DEBUG Optimizer: RULEBASED OPT: estimated new mem (serial exec, all CP) M=109MB
      18/03/06 23:17:33 DEBUG Optimizer: RULEBASED OPT: estimated new mem (cond partitioning) M=109MB
      18/03/06 23:17:33 DEBUG Optimizer: RULEBASED OPT: rewrite 'set execution strategy' - result=REMOTE_SPARK (recompile=true)
      18/03/06 23:17:33 DEBUG Optimizer: RULEBASED OPT: rewrite 'set operation exec type CP' - result=2
      18/03/06 23:17:33 DEBUG Optimizer: RULEBASED OPT: rewrite 'enable data colocation' - result=false
      18/03/06 23:17:33 DEBUG Optimizer: RULEBASED OPT: rewrite 'set partition replication factor' - result=false
      18/03/06 23:17:33 DEBUG Optimizer: RULEBASED OPT: rewrite 'set export replication factor' - result=true (3)
      18/03/06 23:17:33 DEBUG Optimizer: RULEBASED OPT: rewrite 'set degree of parallelism' - result=(see EXPLAIN)
      18/03/06 23:17:33 DEBUG Optimizer: RULEBASED OPT: rewrite 'set task partitioner' - result=STATIC
      18/03/06 23:17:33 DEBUG Optimizer: RULEBASED OPT: rewrite 'set fused data partitioning and execution' - result=false
      18/03/06 23:17:33 DEBUG Optimizer: RULEBASED OPT: rewrite 'set transpose sparse vector operations' - result=false
      18/03/06 23:17:33 DEBUG Optimizer: RULEBASED OPT: rewrite 'set in-place result indexing' - result=true ([delta_b_softmax], M=160MB)
      18/03/06 23:17:33 DEBUG Optimizer: RULEBASED OPT: rewrite 'disable CP caching' - result=false (M=160MB)
      18/03/06 23:17:33 DEBUG Optimizer: RULEBASED OPT: rewrite 'set result merge' - result=LOCAL_MEM
      18/03/06 23:17:33 DEBUG Optimizer: RULEBASED OPT: rewrite 'set recompile memory budget' - result=24271MB
      18/03/06 23:17:33 DEBUG Optimizer: RULEBASED OPT: rewrite 'remove recursive parfor' - result=0/0
      18/03/06 23:17:33 DEBUG Optimizer: RULEBASED OPT: rewrite 'remove unnecessary parfor' - result=0
      18/03/06 23:17:33 DEBUG OptimizationWrapper: ParFOR Opt: Optimized plan (after optimization):
      
      ----------------------------
       EXPLAIN OPT TREE (type=ABSTRACT_PLAN, size=22)
      ----------------------------
      --PARFOR, exec=SPARK, k=3, dp=NONE, tp=STATIC, rm=LOCAL_MEM
      ----GENERIC (lines 122-126), exec=CP, k=1
      ------lix, exec=CP, k=1
      ------b(-), exec=CP, k=1
      ------b(*), exec=CP, k=1
      ------r(t), exec=CP, k=1
      ------ba(+*), exec=CP, k=1
      ------rix, exec=CP, k=1
      ------r(rshape), exec=CP, k=1
      ------ba(+*), exec=CP, k=1
      ------r(rshape), exec=CP, k=1
      ------rix, exec=CP, k=1
      ------r(rshape), exec=CP, k=1
      ------rix, exec=CP, k=1
      ------b(/), exec=CP, k=1
      ------u(exp), exec=CP, k=1
      ------b(-), exec=CP, k=1
      ------rix, exec=CP, k=1
      ------ua(maxRC), exec=CP, k=1
      ------ua(+RC), exec=CP, k=1
      ------b(*), exec=CP, k=1
      ------ua(+RC), exec=CP, k=1
      ----------------------------
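
      One way to guard the rewrite is sketched below in Java. This is only an illustration under stated assumptions, not the actual fix: the OpInfo class, the method names, and the per-operation memory estimate are hypothetical, and the 4638MB budget is taken from the max_mem log line on the assumption that it is the remote budget. The idea is that a Spark operation may be forced to CP only if it fits the memory budget and has int-addressable dimensions.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of a guarded 'force to CP' decision; not SystemML code.
final class ForceCpGuardSketch {

    /** Hypothetical stand-in for the size information of one operation. */
    static final class OpInfo {
        final String name;
        final long rows;
        final long cols;
        final double memMB; // assumed memory estimate in MB

        OpInfo(String name, long rows, long cols, double memMB) {
            this.name = name;
            this.rows = rows;
            this.cols = cols;
            this.memMB = memMB;
        }
    }

    /** An operation can run in CP only if it fits the budget AND has int dimensions. */
    static boolean canRunInCP(OpInfo op, double budgetMB) {
        boolean fitsBudget = op.memMB <= budgetMB;
        boolean intDims = op.rows <= Integer.MAX_VALUE && op.cols <= Integer.MAX_VALUE;
        return fitsBudget && intDims;
    }

    /** Remote parfor with all operations in CP is allowed only if every operation qualifies. */
    static boolean allowForcedCP(List<OpInfo> sparkOps, double budgetMB) {
        for (OpInfo op : sparkOps) {
            if (!canRunInCP(op, budgetMB)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        double budgetMB = 4638; // from the log; assumed to be the remote memory budget
        // Illustrative operations: the dimensions come from the exception,
        // the memory estimates are made up for this example.
        List<OpInfo> ops = Arrays.asList(
            new OpInfo("r(rshape)", 3L, 5129281161L, 109.0),
            new OpInfo("rix", 3L, 1000L, 1.0));
        System.out.println(allowForcedCP(ops, budgetMB));
    }
}
```

      With a memory-only check, both operations would qualify for CP; with the additional dimension check, the first one keeps its Spark execution type and the remote parfor rewrite is not applied.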
      
      

          People

            mboehm7 Matthias Boehm