Spark / SPARK-14524

In Spark SQL, columns of String type cannot be selected correctly (because of UTF8String) when executor memory is set to 32G or more.

Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Duplicate
    • Affects Version: 1.5.2
    • Fix Version: None
    • Component: SQL
    • Environment: CentOS
    Description

      (Related issue: https://github.com/apache/spark/pull/8210/files)

      When we set executor memory to 32G (or more) and select a column of String type, it shows the wrong result, for example:
      'abcde' (fewer than 8 chars) => '' (nothing is shown)
      'abcdefghijklmn' (more than 8 chars) => 'ijklmn' (the first 8 chars are cut off)

      However, when we set executor memory to 31G (or less), everything works correctly.

      We have debugged this problem and found that Spark SQL uses UTF8String internally, which depends on properties of the local JVM's memory layout (see the class 'org.apache.spark.unsafe.Platform').
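      A likely explanation consistent with the 32G threshold: at heap sizes of 32G and above, the JVM disables compressed object pointers, which changes the base offset of `byte[]` arrays. The sketch below (the class name `ByteArrayOffsetDemo` is hypothetical, not from Spark) reads that offset the same way `org.apache.spark.unsafe.Platform` does, illustrating why a String read with an offset computed under a different heap configuration comes out empty or shifted:

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

// Hypothetical demo class (not part of Spark): shows the JVM-dependent
// byte[] base offset that org.apache.spark.unsafe.Platform captures at
// class-load time. With compressed oops enabled (heaps below 32G) the
// offset on 64-bit HotSpot is typically 16; with them disabled (32G and
// above) the object header grows, and the offset is typically 24.
public class ByteArrayOffsetDemo {

    public static long byteArrayBaseOffset() throws Exception {
        // sun.misc.Unsafe keeps its singleton in the private static
        // field "theUnsafe"; obtain it via reflection.
        Field f = Unsafe.class.getDeclaredField("theUnsafe");
        f.setAccessible(true);
        Unsafe unsafe = (Unsafe) f.get(null);
        // Base offset of the first element of a byte[] on this JVM.
        return unsafe.arrayBaseOffset(byte[].class);
    }

    public static void main(String[] args) throws Exception {
        System.out.println("byte[] base offset on this JVM: "
            + byteArrayBaseOffset());
    }
}
```

      Reading string bytes at offset 16 when the real base offset is 24 would skip the first 8 bytes of data, which matches the reported symptoms ('abcde' becomes empty, 'abcdefghijklmn' becomes 'ijklmn').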

              People

              Assignee: davies Davies Liu
              Reporter: tokendeng Deng Changchun
              Votes: 1
              Watchers: 3
