Details
Type: Improvement
Status: Resolved
Priority: Major
Resolution: Duplicate
Affects Version/s: 1.9.0, 1.10.0
None
Using SpecificData in environments with multiple classloaders.
Description
The change introduced in Avro 1.9.0, which replaced:
SpecificData.get()
with:
SpecificData.getForSchema(schema)
caused a significant performance degradation in environments where the class of the schema is provided by a different classloader than the classloader containing SpecificData.
A possible solution is to use the classCache of the default SpecificData, so that the result of the sometimes expensive classloader code path is cached. (PR coming up)
We noticed this after trying out a Spark upgrade from Spark 3.1.0 (Avro 1.8.2) to 3.2.0 (Avro 1.10.2), where 74% of the time was spent resolving the same class millions of times.
With this patch, the share of time spent on resolving dropped from 74% to 0.70%.
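The caching idea described above can be illustrated with a minimal, self-contained sketch. This is not the actual patch and not Avro's implementation: `CachedClassResolver`, `loadWithContextClassLoader`, and `slowLookupCount` are hypothetical names introduced only for illustration, and the context classloader lookup stands in for whatever expensive classloader code path is hit in practice. The sketch memoizes lookups by class name, so repeated resolution of the same class hits the expensive path once.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical helper (not part of Avro): memoizes class lookups by name. */
class CachedClassResolver {
    /** Sentinel type, because ConcurrentHashMap cannot store null values. */
    private static final class NotFound {}
    private static final Class<?> NOT_FOUND = NotFound.class;

    private final Map<String, Class<?>> cache = new ConcurrentHashMap<>();
    private final Function<String, Class<?>> slowLookup;
    private final AtomicInteger slowLookups = new AtomicInteger();

    /** @param slowLookup the expensive lookup; returns null if the class is missing */
    CachedClassResolver(Function<String, Class<?>> slowLookup) {
        this.slowLookup = slowLookup;
    }

    /** Returns the class for the given name, or null; misses are cached as well. */
    Class<?> resolve(String className) {
        Class<?> c = cache.computeIfAbsent(className, name -> {
            slowLookups.incrementAndGet(); // count trips through the slow path
            Class<?> found = slowLookup.apply(name);
            return found == null ? NOT_FOUND : found;
        });
        return c == NOT_FOUND ? null : c;
    }

    /** Number of times the expensive lookup actually ran. */
    int slowLookupCount() {
        return slowLookups.get();
    }

    /** Assumed stand-in for the expensive path: a classloader-based lookup. */
    static Class<?> loadWithContextClassLoader(String name) {
        ClassLoader cl = Thread.currentThread().getContextClassLoader();
        if (cl == null) {
            cl = CachedClassResolver.class.getClassLoader();
        }
        try {
            // initialize=false: only locate the class, do not run static initializers
            return Class.forName(name, false, cl);
        } catch (ClassNotFoundException e) {
            return null;
        }
    }
}
```

With a resolver built as `new CachedClassResolver(CachedClassResolver::loadWithContextClassLoader)`, resolving the same name many times runs the classloader lookup only on the first call. In the Avro case the key would presumably be derived from the schema's full name, but that mapping is an assumption here.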
JMC flamegraph showing this issue:
Attachments
Issue Links
- blocks: SPARK-35744 Performance degradation in avro SpecificRecordBuilders (Open)
- duplicates: AVRO-3186 Java: Avro can't decode union<null, logicalType> field of a record (Resolved)
- is related to: AVRO-3048 Using builders leads to performance degradation (Resolved)
- links to