Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Duplicate
Affects Version/s: 2.4.0, 2.4.7
Fix Version/s: None
Description
Background
I found a rare case in which some partitions read less data than expected when zstd is used.
Detail
I saved both the normal shuffle data and the corrupted shuffle data, and found that the corrupted data was a prefix of the normal data. I also found that zstd-jni at tag 1.3.3-2 has a problem where it can read only a prefix of the whole data and then exit normally.
The ZstdInputStream in zstd-jni (tag 1.3.3-2) may return 0 from a read(byte[], int, int) call. This violates the InputStream contract, which only allows 0 to be returned when the requested len is 0. Spark wraps ZstdInputStream in a BufferedInputStream; when the underlying read call returns 0, BufferedInputStream treats it as end of stream and stops reading, which can lead to data loss.
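To make the failure mode concrete, here is a small self-contained sketch (not Spark or zstd-jni code): ZeroReturningInputStream is a hypothetical stream that sometimes returns 0 from read(byte[], int, int) even though more data remains, mimicking the old ZstdInputStream behaviour. Wrapped in a BufferedInputStream, the first 0 return is treated as end of stream and the remaining bytes are silently dropped.
{code:java}
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical stream that mimics the zstd-jni 1.3.3-2 behaviour:
// it sometimes returns 0 from read(byte[], int, int) even though more
// data is available, which violates the InputStream contract.
class ZeroReturningInputStream extends InputStream {
    private final InputStream in;
    private boolean returnDataNext = false;

    ZeroReturningInputStream(byte[] data) {
        this.in = new ByteArrayInputStream(data);
    }

    @Override
    public int read() throws IOException {
        return in.read();
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        returnDataNext = !returnDataNext;
        if (returnDataNext) {
            // Hand out a small prefix of the remaining data.
            return in.read(b, off, Math.min(len, 4));
        }
        return 0; // contract violation: len > 0 but 0 is returned
    }
}

public class PrematureEofDemo {
    public static void main(String[] args) throws IOException {
        byte[] data = new byte[32];
        for (int i = 0; i < data.length; i++) data[i] = (byte) i;

        try (InputStream in =
                 new BufferedInputStream(new ZeroReturningInputStream(data))) {
            int total = 0;
            byte[] buf = new byte[8];
            int n;
            while ((n = in.read(buf)) != -1) {
                total += n;
            }
            // Prints fewer than 32 bytes: BufferedInputStream saw the 0 return
            // from the wrapped stream and treated it as end of stream.
            System.out.println("Read " + total + " of " + data.length + " bytes");
        }
    }
}
{code}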
zstd-jni issue:
https://github.com/luben/zstd-jni/issues/159
zstd-jni commit:
https://github.com/luben/zstd-jni/commit/7eec5581b8ccb0d98350ad5ba422337eebbbe70e
zstd-jni fixed this problem in tag 1.4.4-3; the shape of the fix is sketched below (see the commit linked above for the actual change).
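This is only an illustrative sketch of the pattern, not the zstd-jni source: read() keeps stepping the decoder until it has produced at least one byte or the stream has ended, so it can no longer return 0 for a non-zero len. The decompressSome() and frameFinished() hooks are hypothetical stand-ins for the native calls.
{code:java}
import java.io.IOException;
import java.io.InputStream;

// Illustrative sketch only (not the actual zstd-jni code): a decompressing
// stream whose read() loops until it has produced at least one byte or the
// stream has ended, so it never returns 0 for a non-zero len.
public abstract class ContractCompliantDecompressingStream extends InputStream {

    // Hypothetical hook: run one decompression step and return the number of
    // bytes produced (possibly 0 even though the stream is not finished yet).
    protected abstract int decompressSome(byte[] dst, int off, int len) throws IOException;

    // Hypothetical hook: true once the underlying frame is fully consumed.
    protected abstract boolean frameFinished();

    @Override
    public int read(byte[] dst, int off, int len) throws IOException {
        if (len == 0) {
            return 0; // the only case where 0 may be returned
        }
        int produced = 0;
        // Keep stepping the decoder until we have at least one byte or EOF.
        while (produced == 0 && !frameFinished()) {
            produced = decompressSome(dst, off, len);
        }
        return produced == 0 ? -1 : produced;
    }

    @Override
    public int read() throws IOException {
        byte[] one = new byte[1];
        int n = read(one, 0, 1);
        return n == -1 ? -1 : (one[0] & 0xff);
    }
}
{code}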
So I think it is necessary to upgrade the zstd-jni version to 1.4.4-3 in Spark 2.4, since Spark 2.4 is widely used in production.
The relevant BufferedInputStream code is as follows:
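The snippet below is a simplified paraphrase of OpenJDK's java.io.BufferedInputStream (not the verbatim JDK source; mark handling and bounds checks are elided). The key point is that fill() only advances count when the wrapped stream returns more than 0 bytes, so a 0 return leaves pos >= count and the subsequent check reports end of stream.
{code:java}
// Simplified paraphrase of java.io.BufferedInputStream internals.
private void fill() throws IOException {
    byte[] buffer = getBufIfOpen();
    // ... mark handling elided ...
    count = pos;
    int n = getInIfOpen().read(buffer, pos, buffer.length - pos);
    if (n > 0)                // a return of 0 is silently ignored here
        count = n + pos;
}

public synchronized int read() throws IOException {
    if (pos >= count) {
        fill();
        if (pos >= count)     // still empty after fill(): treated as end of stream
            return -1;
    }
    return getBufIfOpen()[pos++] & 0xff;
}
{code}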
Attachments
Issue Links
- duplicates SPARK-30228 Update zstd-jni to 1.4.4-3 (Resolved)
- links to