Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Fixed
- Affects Version/s: SystemML 0.11, SystemML 0.12
- Fix Version/s: None
- Environment: Spark 2.1.0
Description
When running the runMultiLogReg.sh script, MultiLogReg.dml ends with an OutOfMemoryError for the 10M_1K sparse data case with icpt = 1. Here is the end of the log file:
17/02/04 17:20:33 INFO api.DMLScript: SystemML Statistics:
Total elapsed time: 697.694 sec.
Total compilation time: 2.543 sec.
Total execution time: 695.151 sec.
Number of compiled Spark inst: 73.
Number of executed Spark inst: 16.
Cache hits (Mem, WB, FS, HDFS): 46/9/1/7.
Cache writes (WB, FS, HDFS): 27/1/1.
Cache times (ACQr/m, RLS, EXP): 281.541/0.003/131.589/48.737 sec.
HOP DAGs recompiled (PRED, SB): 0/15.
HOP DAGs recompile time: 0.067 sec.
Spark ctx create time (lazy): 31.078 sec.
Spark trans counts (par,bc,col): 5/4/0.
Spark trans times (par,bc,col): 46.748/0.392/0.000 secs.
Total JIT compile time: 151.254 sec.
Total JVM GC count: 144.
Total JVM GC time: 220.671 sec.
Heavy hitter instructions (name, time, count):
-- 1) ba+* 144.194 sec 3
-- 2) rand 109.939 sec 9
-- 3) uark+ 105.011 sec 2
-- 4) r' 100.933 sec 3
-- 5) sp_/ 80.387 sec 1
-- 6) sp_mapmm 45.491 sec 2
-- 7) sp_tak+* 40.655 sec 1
-- 8) append 9.480 sec 1
-- 9) rangeReIndex 7.347 sec 2
-- 10) sp_- 6.392 sec 3
17/02/04 17:20:33 INFO api.DMLScript: END DML run 02/04/2017 17:20:33
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
	at org.apache.sysml.runtime.matrix.data.MatrixBlock.allocateDenseBlock(MatrixBlock.java:363)
	at org.apache.sysml.runtime.matrix.data.MatrixBlock.allocateDenseBlock(MatrixBlock.java:339)
	at org.apache.sysml.runtime.matrix.data.MatrixBlock.allocateDenseBlockUnsafe(MatrixBlock.java:408)
	at org.apache.sysml.runtime.io.MatrixReader.createOutputMatrixBlock(MatrixReader.java:107)
	at org.apache.sysml.runtime.io.ReaderBinaryBlockParallel.readMatrixFromHDFS(ReaderBinaryBlockParallel.java:59)
	at org.apache.sysml.runtime.util.DataConverter.readMatrixFromHDFS(DataConverter.java:203)
	at org.apache.sysml.runtime.util.DataConverter.readMatrixFromHDFS(DataConverter.java:168)
	at org.apache.sysml.runtime.controlprogram.caching.MatrixObject.readBlobFromHDFS(MatrixObject.java:425)
	at org.apache.sysml.runtime.controlprogram.caching.MatrixObject.readBlobFromHDFS(MatrixObject.java:60)
	at org.apache.sysml.runtime.controlprogram.caching.CacheableData.readBlobFromHDFS(CacheableData.java:920)
	at org.apache.sysml.runtime.controlprogram.caching.MatrixObject.readBlobFromRDD(MatrixObject.java:478)
	at org.apache.sysml.runtime.controlprogram.caching.MatrixObject.readBlobFromRDD(MatrixObject.java:60)
	at org.apache.sysml.runtime.controlprogram.caching.CacheableData.acquireRead(CacheableData.java:411)
	at org.apache.sysml.runtime.controlprogram.context.ExecutionContext.getMatrixInput(ExecutionContext.java:209)
	at org.apache.sysml.runtime.instructions.cp.AggregateBinaryCPInstruction.processInstruction(AggregateBinaryCPInstruction.java:74)
	at org.apache.sysml.runtime.controlprogram.ProgramBlock.executeSingleInstruction(ProgramBlock.java:290)
	at org.apache.sysml.runtime.controlprogram.ProgramBlock.executeInstructions(ProgramBlock.java:221)
	at org.apache.sysml.runtime.controlprogram.ProgramBlock.execute(ProgramBlock.java:168)
	at org.apache.sysml.runtime.controlprogram.IfProgramBlock.execute(IfProgramBlock.java:139)
	at org.apache.sysml.runtime.controlprogram.WhileProgramBlock.execute(WhileProgramBlock.java:165)
	at org.apache.sysml.runtime.controlprogram.Program.execute(Program.java:123)
	at org.apache.sysml.api.DMLScript.execute(DMLScript.java:684)
	at org.apache.sysml.api.DMLScript.executeScript(DMLScript.java:360)
	at org.apache.sysml.api.DMLScript.main(DMLScript.java:221)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:738)
	at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
	at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
17/02/04 17:20:33 INFO util.ShutdownHookManager: Shutdown hook called
17/02/04 17:20:33 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-c8185e69-2eaa-4719-ab42-f6af0edcbbeb
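The stack trace ends in MatrixBlock.allocateDenseBlock while the driver materializes a matrix read back from an RDD, i.e., the allocation falls back to a dense block. A back-of-envelope check (hypothetical helper code, not SystemML's internal estimator) shows why a dense allocation for a 10M x 1K matrix cannot fit in any reasonable driver heap, while a sparse layout at, say, 1% sparsity easily would:

```java
// Hypothetical size estimate for the 10M x 1K input (illustrative only).
public class MemEstimate {
    // Dense layout: one 8-byte double per cell.
    static long denseBytes(long rows, long cols) {
        return rows * cols * 8L;
    }

    // CSR-like sparse layout: roughly 12 bytes per non-zero
    // (8-byte value + 4-byte column index), ignoring row-pointer overhead.
    static long sparseBytes(long rows, long cols, double sparsity) {
        return (long) (rows * cols * sparsity * 12L);
    }

    public static void main(String[] args) {
        long rows = 10_000_000L, cols = 1_000L;
        long dense  = denseBytes(rows, cols);          // 80,000,000,000 bytes (~80 GB)
        long sparse = sparseBytes(rows, cols, 0.01);   // 1,200,000,000 bytes (~1.2 GB)
        System.out.println("dense : " + dense + " bytes");
        System.out.println("sparse: " + sparse + " bytes");
    }
}
```

So even though the on-disk input is sparse, any step that produces or reads an intermediate with unknown sparsity can trigger a worst-case dense allocation of this magnitude.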
Attachments
Issue Links
- is related to: SYSTEMDS-1243 Perftest: OutOfMemoryError in stratstats.dml for 800MB case (Resolved)
1. Fix transitive Spark execution type selection for ba+* | Closed | Matthias Boehm
2. Keep track of parallelized RDDs and broadcasts | Closed | Matthias Boehm
3. Handling RDD collects with unknown sparsity | Open | Unassigned
OK, this is interesting: the OOM does not come from the first read of our sparse input matrix but from some intermediate. I don't have a cluster environment to reproduce this right now, but I will look into it in the next few days.
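Since the OOM stems from an intermediate with unknown sparsity, one mitigation direction is to guard RDD collects against worst-case dense allocations. A minimal sketch, assuming a known memory budget; all names here are hypothetical and this is not the actual SystemML fix:

```java
// Hypothetical guard for collecting an RDD into a local matrix block
// when the sparsity of the intermediate is unknown (illustrative only).
public class CollectGuard {
    /**
     * Returns true if a worst-case dense allocation of rows x cols doubles
     * would fit into the given memory budget (in bytes). With unknown
     * sparsity, we must conservatively assume the result is dense.
     */
    static boolean fitsDense(long rows, long cols, long budgetBytes) {
        long worstCase = rows * cols * 8L; // 8 bytes per double
        return worstCase <= budgetBytes;
    }

    public static void main(String[] args) {
        long budget = 20L << 30; // e.g., a 20 GiB driver heap
        // 10M x 1K worst-case dense (~80 GB) exceeds the budget,
        // so the intermediate should stay distributed as an RDD.
        System.out.println(fitsDense(10_000_000L, 1_000L, budget)); // prints false
    }
}
```

When the guard fails, the matrix would remain in distributed form instead of being collected into a single driver-side block, which is in the spirit of the "Handling RDD collects with unknown sparsity" sub-task.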