Details
Type: Sub-task
Status: Closed
Priority: Major
Resolution: Fixed
Description
While codegen worked extremely well for KMeans with 1 run, we currently see performance issues in a parfor setting with 10 concurrent runs, all of which spawn distributed Spark operations. Specifically, the reduced local memory budget per parfor worker leads to different plan choices. These issues can be overcome by avoiding unnecessary RDD joins in distributed codegen operations through better broadcast handling (currently, the first input is always assumed to be an RDD).
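To illustrate the idea, here is a minimal sketch of size-aware input handling. It is not the actual compiler code: the names `Input` and `plan_inputs`, the byte-based budget, and the example sizes are all hypothetical. Rather than assuming the first input is the RDD, it picks the largest input as the main RDD and broadcasts the remaining inputs whenever they fit into the broadcast budget, falling back to a join only otherwise.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Input:
    """Hypothetical operator input with an estimated in-memory size."""
    name: str
    size_bytes: int


def plan_inputs(inputs, broadcast_budget_bytes):
    """Assign each input one of 'rdd', 'broadcast', or 'join'.

    Assumption for this sketch: exactly one input is scanned as the main
    RDD, and every other input is either broadcast (if it fits into the
    remaining budget) or joined with the main RDD (which needs a shuffle).
    """
    if not inputs:
        raise ValueError("at least one input is required")

    # Pick the largest input as the main RDD, regardless of its position.
    main = max(inputs, key=lambda i: i.size_bytes)
    plan = {main.name: "rdd"}

    # Visit side inputs from smallest to largest so that as many as
    # possible fit into the broadcast budget.
    remaining = broadcast_budget_bytes
    side_inputs = sorted((i for i in inputs if i is not main),
                         key=lambda i: i.size_bytes)
    for inp in side_inputs:
        if inp.size_bytes <= remaining:
            plan[inp.name] = "broadcast"
            remaining -= inp.size_bytes
        else:
            plan[inp.name] = "join"  # too large to broadcast
    return plan


# Illustrative sizes only: a small first input and a large second input.
small_first = [Input("C", 8_000), Input("X", 1_000_000_000)]
print(plan_inputs(small_first, broadcast_budget_bytes=1_000_000))
```

With a first-input-is-RDD rule, the small input `C` in this example would be treated as the RDD and the large input `X` would have to be joined; the size-aware choice scans `X` and broadcasts `C`, so no join is needed as long as the budget allows it.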
Total elapsed time: 9305.981 sec.
Total compilation time: 3.023 sec.
Total execution time: 9302.958 sec.
Number of compiled Spark inst: 21.
Number of executed Spark inst: 193.
Cache hits (Mem, WB, FS, HDFS): 1242/0/0/91.
Cache writes (WB, FS, HDFS): 456/188/1.
Cache times (ACQr/m, RLS, EXP): 10086.631/0.011/114.967/1.291 sec.
HOP DAGs recompiled (PRED, SB): 0/108.
HOP DAGs recompile time: 2.733 sec.
Functions recompiled: 1.
Functions recompile time: 0.043 sec.
Codegen compile (DAG,CP,JC): 176/430/21.
Codegen enum (ALLt/p,EVALt/p): 48076/47974/39249/38324.
Codegen compile times (DAG,JC): 3.024/0.491 sec.
Codegen enum plan cache hits: 0/0.
Codegen op plan cache hits: 395/416.
Spark ctx create time (lazy): 19.506 sec.
Spark trans counts (par,bc,col): 0/179/91.
Spark trans times (par,bc,col): 0.000/1.954/10086.614 secs.
ParFor loops optimized: 1.
ParFor optimize time: 0.141 sec.
ParFor initialize time: 0.022 sec.
ParFor result merge time: 0.059 sec.
ParFor total update in-place: 0/40/50
Total JIT compile time: 98.963 sec.
Total JVM GC count: 374.
Total JVM GC time: 72.456 sec.
Heavy hitter instructions:
  #  Instruction        Time(s)     Count
  1  sp_spoofRATMP63    73,750.553  89
  2  spoofRATMP43       10,195.724  89
  3  sp_chkpoint        20.239      12
  4  sp_uasqk+          14.347      1
  5  spoofRATMP52       10.496      89
  6  ba+*               9.273       15
  7  sp_mapmm           1.543       1
  8  write              1.291       1
  9  /                  1.127       92
 10  sp_spoofRATMP116   0.930       89
An initial prototype that avoids the unnecessary shuffle improved performance from 9305s to 1607s (roughly 5.8x), but additional improvements are possible.