Details
- Type: Improvement
- Status: Resolved
- Priority: Major
- Resolution: Resolved
Description
LZ4 interoperability of Parquet files between parquet-mr and parquet-cpp (now part of Apache Arrow) has a long history of problems; the linked issues document it. In short, a new LZ4_RAW codec type was introduced in parquet-format v2.9.0. However, only parquet-cpp supports LZ4_RAW. The parquet-mr library still uses the old Hadoop-provided LZ4 codec and cannot read Parquet files compressed with LZ4_RAW.
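For readers unfamiliar with why the two codecs are incompatible, the following is a minimal sketch, not code from parquet-mr or parquet-cpp. It assumes the framing discussed in the linked Arrow and Parquet issues: the Hadoop-provided codec prefixes each compressed chunk with two big-endian 32-bit integers (decompressed size, then compressed size), whereas LZ4_RAW stores the bare LZ4 block with no prefix. The function names are hypothetical, and no actual LZ4 compression or decompression is performed; the sketch only shows how the framing wraps a raw block.

```python
import struct

# Assumed Hadoop framing: 4-byte big-endian decompressed size,
# 4-byte big-endian compressed size, then the raw LZ4 block.
_HEADER = struct.Struct(">II")


def wrap_hadoop_lz4_frame(raw_block: bytes, decompressed_size: int) -> bytes:
    """Prefix a raw LZ4 block with the assumed Hadoop-style 8-byte header."""
    return _HEADER.pack(decompressed_size, len(raw_block)) + raw_block


def split_hadoop_lz4_frames(buf: bytes) -> list[tuple[int, bytes]]:
    """Split a Hadoop-framed buffer into (decompressed_size, raw_block) pairs.

    Each returned raw_block is what an LZ4_RAW page would contain directly.
    """
    frames = []
    offset = 0
    while offset < len(buf):
        if len(buf) - offset < _HEADER.size:
            raise ValueError("truncated frame header")
        decompressed_size, compressed_size = _HEADER.unpack_from(buf, offset)
        offset += _HEADER.size
        end = offset + compressed_size
        if end > len(buf):
            raise ValueError("frame extends past end of buffer")
        frames.append((decompressed_size, buf[offset:end]))
        offset = end
    return frames
```

A reader that expects the framed layout will misinterpret the first eight bytes of an LZ4_RAW page as sizes, which is one way to picture why the old codec cannot read LZ4_RAW data.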
Issue Links
- is related to
  - SPARK-43273 Support lz4raw compression codec for Parquet (Resolved)
- relates to
  - PARQUET-1974 LZ4 decoding is not working over hadoop (Open)
  - ARROW-9177 [C++][Parquet] Tracking issue for cross-implementation LZ4 Parquet compression compatibility (Resolved)
  - PARQUET-1878 [C++] lz4 codec is not compatible with Hadoop Lz4Codec (Resolved)
  - PARQUET-1996 [Format] Add interoperable LZ4 codec, deprecate existing LZ4 codec (Resolved)