Spark / SPARK-35744

Performance degradation in Avro SpecificRecordBuilders

Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 3.2.0
    • Fix Version/s: None
    • Component/s: Spark Core
    • Labels: None

    Description

      We are filing this bug to report that, when testing Spark 3.2.0, we saw a significant performance degradation in code that handles Avro specific record objects. It slowed down some of our jobs by a factor of 4.

      Spark 3.2.0 upgrades the Avro version from 1.8.2 to 1.10.2.

      The degradation is caused by a change introduced in Avro 1.9.0. This change slows down the creation of Avro specific records in certain classloader topologies, such as the ones used in Spark.

      We reported the problem upstream to the Avro project and proposed a simple fix there. (Links contain more details.)

      It is unclear to us how many other projects use Avro specific records in a Spark context and will be affected by this degradation.
      Feel free to close this issue if you consider it too much of a corner case.

            People

              Assignee: Unassigned
              Reporter: Steven Aerts (steven.aerts)
              Votes: 0
              Watchers: 4
