Details
- Type: Bug
- Status: Open
- Priority: Minor
- Resolution: Unresolved
- Affects Version/s: 3.5.0
- Fix Version/s: None
- Component/s: None
Description
Upgrading to Spark 3.5.0 introduced a regression for us: our query gateway (Apache Livy) now fails with the following error:
com.fasterxml.jackson.core.exc.StreamConstraintsException: String length (20054016) exceeds the maximum length (20000000)
(sorry, unable to provide full stack trace)
The root cause is a breaking change in Jackson that (in the name of "safety") introduced default JSON size limits, see: https://github.com/FasterXML/jackson-core/issues/1014
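For reference, a minimal sketch of where this limit lives in Jackson 2.15+ (this is Jackson's own API, not anything Spark currently exposes for this code path; the 100 MB value is just an example):

{code:scala}
import com.fasterxml.jackson.core.{JsonFactory, StreamReadConstraints}

// Jackson >= 2.15 enforces a default 20 MB maximum string length at parse time.
// A factory built like this would accept longer strings:
val relaxedFactory: JsonFactory = JsonFactory.builder()
  .streamReadConstraints(
    StreamReadConstraints.builder()
      .maxStringLength(100 * 1000 * 1000) // raise the 20 MB default to ~100 MB
      .build())
  .build()
{code}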
It looks like JSONOptions in Spark already supports configuring this limit, but there seems to be no way to set it globally or to pass it down to DataFrame::toJSON(), which our Apache Livy server uses when transmitting data.
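A possible JVM-wide workaround (a sketch only, assuming Jackson 2.15+; it has to run before Spark/Livy create their Jackson factories, so it is not a substitute for a proper Spark-level setting):

{code:scala}
import com.fasterxml.jackson.core.StreamReadConstraints

// Override the process-wide default constraints used by JsonFactory instances
// created after this call; effectively disables the string-length check.
StreamReadConstraints.overrideDefaultStreamReadConstraints(
  StreamReadConstraints.builder()
    .maxStringLength(Int.MaxValue)
    .build())
{code}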
Livy is an old project, transferring DataFrames via JSON is very inefficient, and we really should move to something like Spark Connect, but I believe this issue can affect many people working with basic GeoJSON data.
Spark can handle very large strings, and this arbitrary limit just gets in the way of output serialization for no good reason.