[SPARK-11994] Word2VecModel load and save cause SparkException when model is bigger than spark.kryoserializer.buffer.max - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 1.4.1, 1.5.1
Fix Version/s: 2.0.0
Component/s: MLlib
Labels:
- kryo
- mllib

Description

When loading a Word2VecModel with a compressed size of 58 MB using the `Word2VecModel.load()` method introduced in Spark 1.4.0, I get an `org.apache.spark.SparkException: Kryo serialization failed: Buffer overflow. Available: 0, required: 2` exception.
This happens because the model is saved as a single file with no partitioning, and the Kryo buffer overflows when it tries to serialize the whole model at once.
Increasing `spark.kryoserializer.buffer.max` works as a temporary workaround, but the value needs to be increased again whenever the model size grows.
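
To make the size dependence concrete, here is a minimal Python sketch of the arithmetic behind the overflow. It is not taken from the linked pull request: the helper names `estimate_vector_bytes` and `partitions_needed`, the vocabulary size, the vector dimension, and the 64 MiB limit are all illustrative assumptions. It estimates the raw size of the word vectors (4-byte floats, ignoring the word strings and any serialization overhead) and how many partitions would keep each piece under a given buffer limit.

```python
def estimate_vector_bytes(num_words: int, vector_size: int,
                          bytes_per_float: int = 4) -> int:
    """Rough size of the word vectors alone.

    Assumption: vectors are stored as 4-byte floats; the word strings
    and serializer overhead are ignored, so this is a lower bound.
    """
    return num_words * vector_size * bytes_per_float


def partitions_needed(model_bytes: int, max_partition_bytes: int) -> int:
    """Smallest number of equal pieces with each piece <= the limit."""
    if max_partition_bytes <= 0:
        raise ValueError("max_partition_bytes must be positive")
    # Integer ceiling division; always return at least one partition.
    return max(1, -(-model_bytes // max_partition_bytes))


# Hypothetical model: 1,000,000 words with 100-dimensional vectors.
model_bytes = estimate_vector_bytes(1_000_000, 100)  # 400,000,000 bytes

# Hypothetical buffer limit of 64 MiB.
buffer_limit = 64 * 1024 * 1024

# A single unpartitioned file needs the whole model to fit in one buffer.
fits_in_one_buffer = model_bytes <= buffer_limit  # False

# Splitting into this many partitions keeps each piece under the limit.
num_partitions = partitions_needed(model_bytes, buffer_limit)  # 6
```

With a single unpartitioned file, the whole estimate has to fit in one buffer. That is why raising the limit (for example `--conf spark.kryoserializer.buffer.max=512m` on `spark-submit`, with the value chosen arbitrarily here) only helps until the model grows past the new limit, whereas a partition count derived from the model size scales with it.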

Issue Links

is related to

SPARK-15740 Word2VecSuite "big model load / save" caused OOM in maven jenkins builds

Resolved

relates to

SPARK-6725 Model export/import for Pipeline API (Scala)

Resolved

links to

[Github] Pull Request #9989 (tmnd1991)

People

Assignee: Antonio Murgia

Reporter: Antonio Murgia

Votes: 1

Watchers: 5

Dates

Created: 25/Nov/15 18:02

Updated: 02/Jun/16 22:16

Resolved: 05/Dec/15 15:42