Spark / SPARK-47150

String length (...) exceeds the maximum length (20000000)

Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 3.5.0
    • Fix Version/s: None
    • Component/s: Input/Output
    • Labels: None

    Description

      Upgrading to Spark 3.5.0 introduced a regression for us where our query gateway (Livy) fails with an error:

      com.fasterxml.jackson.core.exc.StreamConstraintsException: String length (20054016) exceeds the maximum length (20000000)
      
      (sorry, unable to provide full stack trace)

      The root of this problem is a breaking change in Jackson that (in the name of "safety") introduced JSON size limits, see: https://github.com/FasterXML/jackson-core/issues/1014

      It looks like JSONOptions in Spark already supports configuring this limit, but there seems to be no way to set it globally or to pass it down to DataFrame::toJSON(), which our Apache Livy server uses when transmitting data.
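
For illustration, here is a minimal sketch of one possible JVM-wide workaround: overriding Jackson's default read constraints through its own API (jackson-core 2.15 or later). The object name RaiseJacksonStringLimit is hypothetical. Whether this reaches the code path behind DataFrame::toJSON() in a given Spark/Livy deployment is an untested assumption, and it would have to run in every JVM involved before any Jackson factory is created.

```scala
import com.fasterxml.jackson.core.StreamReadConstraints

// Hypothetical helper, not part of Spark or Livy.
object RaiseJacksonStringLimit {
  // Replace Jackson's process-wide default read constraints.
  // Only JsonFactory instances created AFTER this call pick up the new default.
  def apply(maxLen: Int = Int.MaxValue): Unit = {
    val relaxed = StreamReadConstraints.builder()
      .maxStringLength(maxLen) // Jackson's default is 20000000, as seen in the error
      .build()
    StreamReadConstraints.overrideDefaultStreamReadConstraints(relaxed)
  }
}
```

Calling RaiseJacksonStringLimit() early in application startup raises the limit for that JVM only. It does not add a Spark-level setting, which is what this issue asks for.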

      Livy is an old project and transferring dataframes via JSON is super inefficient, and we really should move to something like Spark Connect, but I believe this issue can happen to many people working with basic GeoJSON data.

      Spark can handle very large strings, and this arbitrary limit just gets in the way of output serialization for no good reason.

          People

            Assignee: Unassigned
            Reporter: Sergii Mikhtoniuk (sergiimk)
            Votes: 2
            Watchers: 4
