Uploaded image for project: 'SystemDS'
  1. SystemDS
  2. SYSTEMDS-1279

EOFException in MinMaxMean example snippet

Details

    • Bug
    • Status: Closed
    • Minor
    • Resolution: Fixed
    • None
    • SystemML 0.13
    • None
    • None

    Description

      Our current documentation contains a snippet for a short DML scipt:

      val numRows = 10000
      val numCols = 1000
      val data = sc.parallelize(0 to numRows-1).map { _ => Row.fromSeq(Seq.fill(numCols)(Random.nextDouble)) }
      val schema = StructType((0 to numCols-1).map { i => StructField("C" + i, DoubleType, true) } )
      val df = spark.createDataFrame(data, schema)
      
      
      
      val minMaxMean =
      """
      minOut = min(Xin)
      maxOut = max(Xin)
      meanOut = mean(Xin)
      """
      val mm = new MatrixMetadata(numRows, numCols)
      val minMaxMeanScript = dml(minMaxMean).in("Xin", df, mm).out("minOut", "maxOut", "meanOut")
      val (min, max, mean) = ml.execute(minMaxMeanScript).getTuple[Double, Double, Double]("minOut", "maxOut", "meanOut")
      

      Execution of the line

      val minMaxMeanScript = dml(minMaxMean).in("Xin", df, mm).out("minOut", "maxOut", "meanOut")
      

      in the spark-shell leads to the following error:

      scala> val minMaxMeanScript = dml(minMaxMean).in("Xin", df, mm).out("minOut", "maxOut", "meanOut")
      [Stage 0:>                                                          (0 + 4) / 4]17/02/16 13:37:10 WARN CodeGenerator: Error calculating stats of compiled class.
      java.io.EOFException
      	at java.io.DataInputStream.readFully(DataInputStream.java:197)
      	at java.io.DataInputStream.readFully(DataInputStream.java:169)
      	at org.codehaus.janino.util.ClassFile.loadAttribute(ClassFile.java:1509)
      	at org.codehaus.janino.util.ClassFile.loadAttributes(ClassFile.java:644)
      	at org.codehaus.janino.util.ClassFile.loadFields(ClassFile.java:623)
      	at org.codehaus.janino.util.ClassFile.<init>(ClassFile.java:280)
      	at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anonfun$recordCompilationStats$1.apply(CodeGenerator.scala:967)
      	at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anonfun$recordCompilationStats$1.apply(CodeGenerator.scala:964)
      	at scala.collection.Iterator$class.foreach(Iterator.scala:893)
      	at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
      	at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
      	at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
      	at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.recordCompilationStats(CodeGenerator.scala:964)
      	at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.org$apache$spark$sql$catalyst$expressions$codegen$CodeGenerator$$doCompile(CodeGenerator.scala:936)
      	at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:998)
      	at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:995)
      	at org.spark_project.guava.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599)
      	at org.spark_project.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2379)
      	at org.spark_project.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342)
      	at org.spark_project.guava.cache.LocalCache$Segment.get(LocalCache.java:2257)
      	at org.spark_project.guava.cache.LocalCache.get(LocalCache.java:4000)
      	at org.spark_project.guava.cache.LocalCache.getOrLoad(LocalCache.java:4004)
      	at org.spark_project.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874)
      	at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.compile(CodeGenerator.scala:890)
      	at org.apache.spark.sql.catalyst.expressions.codegen.GenerateUnsafeProjection$.create(GenerateUnsafeProjection.scala:405)
      	at org.apache.spark.sql.catalyst.expressions.codegen.GenerateUnsafeProjection$.create(GenerateUnsafeProjection.scala:359)
      	at org.apache.spark.sql.catalyst.expressions.codegen.GenerateUnsafeProjection$.create(GenerateUnsafeProjection.scala:32)
      	at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator.generate(CodeGenerator.scala:874)
      	at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.extractProjection$lzycompute(ExpressionEncoder.scala:266)
      	at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.extractProjection(ExpressionEncoder.scala:266)
      	at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.toRow(ExpressionEncoder.scala:290)
      	at org.apache.spark.sql.SparkSession$$anonfun$3.apply(SparkSession.scala:547)
      	at org.apache.spark.sql.SparkSession$$anonfun$3.apply(SparkSession.scala:547)
      	at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
      	at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
      	at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
      	at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
      	at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
      	at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1762)
      	at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1157)
      	at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1157)
      	at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1944)
      	at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1944)
      	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
      	at org.apache.spark.scheduler.Task.run(Task.scala:99)
      	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      	at java.lang.Thread.run(Thread.java:745)
      minMaxMeanScript: org.apache.sysml.api.mlcontext.Script =                       
      Inputs:
        [1] (Dataset as Matrix) Xin: [C0: double, C1: double ... 998 more fields]
      
      Outputs:
        [1] minOut
        [2] maxOut
        [3] meanOut
      

      It seems like it still evaluates the expression correctly and it might be a Spark codegen issue. When setting the number of columns to 100 the exception does not occur and for subsequent evaluations it also doesn't occur.

      Attachments

        Activity

          fschueler Felix Schueler added a comment - Might be related to https://issues.apache.org/jira/browse/SPARK-17131
          fschueler Felix Schueler added a comment -

          For the sake of having a clean documentation I suggest to set numCols to 100 for now until it is fixed in spark.

          fschueler Felix Schueler added a comment - For the sake of having a clean documentation I suggest to set numCols to 100 for now until it is fixed in spark.

          fschueler I am going to resolve this since your PR395 addressed the codegen warning generated by following the docs.

          SYSTEMML-1267 is a duplicate of this issue but perhaps I will keep that open for now in case mboehm7 or someone else can figure out any workaround.

          deron Jon Deron Eriksson added a comment - fschueler I am going to resolve this since your PR395 addressed the codegen warning generated by following the docs. SYSTEMML-1267 is a duplicate of this issue but perhaps I will keep that open for now in case mboehm7 or someone else can figure out any workaround.
          fschueler Felix Schueler added a comment -

          Oh, didn't see that! Thanks!

          fschueler Felix Schueler added a comment - Oh, didn't see that! Thanks!

          People

            fschueler Felix Schueler
            fschueler Felix Schueler
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: