SPARK-1152: ArrayStoreException on mapping RDD on cluster


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Duplicate
    • Affects Version/s: 0.9.0
    • Fix Version/s: None
    • Component/s: Spark Core
    • Labels: None

    Description

      With this code:

      import org.apache.spark.{SparkConf, SparkContext, Partitioner}
      import org.apache.spark.SparkContext._
      
      object twitterAggregation extends App {
      
        val conf = new SparkConf()
            .setMaster("spark://ec2-x-x-x-x.compute-1.amazonaws.com:7077")
            //.setMaster("local")
            .setAppName("foo")
            .setJars(List("target/scala-2.10/foo_2.10-0.0.1.jar"))
            .setSparkHome("/root/spark/")
        val sc = new SparkContext(conf)
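        // collect() is the step that fails with the ArrayStoreException shown below when run against the cluster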
        sc.parallelize(Seq("b")).map(identity).collect
      }
      

      I get this:

      14/02/28 18:41:10 WARN scheduler.TaskSetManager: Lost TID 1 (task 0.0:1)
      14/02/28 18:41:10 INFO scheduler.DAGScheduler: Completed ResultTask(0, 0)
      14/02/28 18:41:10 WARN scheduler.TaskSetManager: Loss was due to java.lang.ArrayStoreException
      java.lang.ArrayStoreException: [Ljava.lang.String;
      	at scala.runtime.ScalaRunTime$.array_update(ScalaRunTime.scala:88)
      	at scala.Array$.slowcopy(Array.scala:81)
      	at scala.Array$.copy(Array.scala:107)
      	at scala.collection.mutable.ResizableArray$class.copyToArray(ResizableArray.scala:77)
      	at scala.collection.mutable.ArrayBuffer.copyToArray(ArrayBuffer.scala:47)
      	at scala.collection.TraversableOnce$class.copyToArray(TraversableOnce.scala:241)
      	at scala.collection.AbstractTraversable.copyToArray(Traversable.scala:105)
      	at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:249)
      	at scala.collection.AbstractTraversable.toArray(Traversable.scala:105)
      	at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
      	at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
      	at org.apache.spark.rdd.RDD$$anonfun$4.apply(RDD.scala:602)
      	at org.apache.spark.rdd.RDD$$anonfun$4.apply(RDD.scala:602)
      	at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:884)
      	at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:884)
      	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:109)
      	at org.apache.spark.scheduler.Task.run(Task.scala:53)
      	at org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:213)
      	at org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:49)
      	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:178)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      	at java.lang.Thread.run(Thread.java:744)
      

      This only happens when running against a cluster as an app (sbt package && sbt play).

      With a master of "local", or when running in the Spark shell on the cluster, the code runs without error (see the sketch below).
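
      For comparison, a minimal local-mode sketch of the same job (the object name LocalRepro is illustrative only, not from the original report); as noted above, this variant completes without error:

      import org.apache.spark.{SparkConf, SparkContext}
      import org.apache.spark.SparkContext._
      
      // Hypothetical minimal variant of the repro above, using a local master.
      object LocalRepro extends App {
        val conf = new SparkConf()
            .setMaster("local")
            .setAppName("foo")
        val sc = new SparkContext(conf)
        // In local mode, collect returns Array("b") as expected.
        println(sc.parallelize(Seq("b")).map(identity).collect.mkString(","))
        sc.stop()
      }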


People

    • Assignee: Unassigned
    • Reporter: Andrew Kerr (andrewkerr)
    • Votes: 0
    • Watchers: 4
