  1. Spark
  2. SPARK-17356

A large Metadata field in Alias can cause OOM when calling TreeNode.toJSON

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.6.3, 2.0.1, 2.1.0
    • Component/s: SQL
    • Labels: None

    Description

      When using MLlib, calling toJSON on a plan with many levels of sub-queries may cause an out-of-memory error with a stack trace like this:

      java.lang.OutOfMemoryError: GC overhead limit exceeded
      	at scala.collection.mutable.AbstractSeq.<init>(Seq.scala:47)
      	at scala.collection.mutable.AbstractBuffer.<init>(Buffer.scala:48)
      	at scala.collection.mutable.ListBuffer.<init>(ListBuffer.scala:46)
      	at scala.collection.immutable.List$.newBuilder(List.scala:396)
      	at scala.collection.generic.GenericTraversableTemplate$class.newBuilder(GenericTraversableTemplate.scala:64)
      	at scala.collection.AbstractTraversable.newBuilder(Traversable.scala:105)
      	at scala.collection.TraversableLike$class.filter(TraversableLike.scala:262)
      	at scala.collection.AbstractTraversable.filter(Traversable.scala:105)
      	at scala.collection.TraversableLike$class.filterNot(TraversableLike.scala:274)
      	at scala.collection.AbstractTraversable.filterNot(Traversable.scala:105)
      	at org.json4s.jackson.JValueSerializer.serialize(JValueSerializer.scala:25)
      	at org.json4s.jackson.JValueSerializer.serialize(JValueSerializer.scala:20)
      	at org.json4s.jackson.JValueSerializer.serialize(JValueSerializer.scala:25)
      	at org.json4s.jackson.JValueSerializer.serialize(JValueSerializer.scala:25)
      	at org.json4s.jackson.JValueSerializer.serialize(JValueSerializer.scala:25)
      	at org.json4s.jackson.JValueSerializer.serialize(JValueSerializer.scala:25)
      	at org.json4s.jackson.JValueSerializer.serialize(JValueSerializer.scala:20)
      	at org.json4s.jackson.JValueSerializer.serialize(JValueSerializer.scala:20)
      	at org.json4s.jackson.JValueSerializer.serialize(JValueSerializer.scala:25)
      	at org.json4s.jackson.JValueSerializer.serialize(JValueSerializer.scala:20)
      	at org.json4s.jackson.JValueSerializer.serialize(JValueSerializer.scala:7)
      	at com.fasterxml.jackson.databind.ser.DefaultSerializerProvider.serializeValue(DefaultSerializerProvider.java:128)
      	at com.fasterxml.jackson.databind.ObjectMapper._configAndWriteValue(ObjectMapper.java:2881)
      	at com.fasterxml.jackson.databind.ObjectMapper.writeValueAsString(ObjectMapper.java:2338)
      	at org.json4s.jackson.JsonMethods$class.compact(JsonMethods.scala:34)
      	at org.json4s.jackson.JsonMethods$.compact(JsonMethods.scala:50)
      	at org.apache.spark.sql.catalyst.trees.TreeNode.toJSON(TreeNode.scala:566)
      

      The query plan, stack trace, and jmap distribution are attached.
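
A minimal sketch of why this combination can be expensive, using a toy tree rather than Spark's actual classes. The names `Node`, `Leaf`, `Aliased`, `nest`, and `ToJsonSizeSketch` are hypothetical and exist only for illustration. It assumes the serializer embeds the full metadata payload at every alias-like level, so the output grows roughly with depth times metadata size. It is not the fix that was applied for this issue.

```scala
// Toy model only: these types are hypothetical and are not Spark's TreeNode or Alias.
sealed trait Node
case class Leaf(name: String) extends Node
// An alias-like node that wraps a child and carries a metadata payload.
case class Aliased(child: Node, name: String, metadata: String) extends Node

object ToJsonSizeSketch {
  // Renders one "key":"value" pair (no escaping; inputs here are plain ASCII).
  def field(key: String, value: String): String =
    "\"" + key + "\":\"" + value + "\""

  // Naive recursive serializer: when includeMetadata is true, the metadata
  // payload is copied into the output once per aliased level.
  def toJson(node: Node, includeMetadata: Boolean): String = node match {
    case Leaf(name) =>
      "{" + field("name", name) + "}"
    case Aliased(child, name, metadata) =>
      val meta = if (includeMetadata) "," + field("metadata", metadata) else ""
      "{" + field("name", name) + meta +
        ",\"child\":" + toJson(child, includeMetadata) + "}"
  }

  // Builds a chain of `depth` aliased nodes around a single leaf,
  // standing in for a plan with many levels of sub-queries.
  def nest(depth: Int, metadata: String): Node =
    (1 to depth).foldLeft(Leaf("x"): Node) { (child, i) =>
      Aliased(child, "a" + i, metadata)
    }

  def main(args: Array[String]): Unit = {
    val metadata = "m" * 10000 // assumed "large" metadata payload
    val plan = nest(50, metadata)
    val withMeta = toJson(plan, includeMetadata = true).length
    val withoutMeta = toJson(plan, includeMetadata = false).length
    println("with metadata: " + withMeta + " chars, without: " + withoutMeta + " chars")
  }
}
```

Comparing the two lengths for a given depth and payload size gives a rough feel for how much of the serialized output is repeated metadata; the attached jmap.txt is the place to check whether the real heap shows the same pattern.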

      Attachments

        1. jmap.txt
          487 kB
          Sean Zhong
        2. jstack.txt
          121 kB
          Sean Zhong
        3. queryplan.txt
          16 kB
          Sean Zhong

          People

            Assignee: Sean Zhong (clockfly)
            Reporter: Sean Zhong (clockfly)
            Votes: 0
            Watchers: 4
