Details
Description
In Hadoop, we use native libs for the lz4 codec, which has several disadvantages:
1. It requires the native libhadoop to be installed on the system LD_LIBRARY_PATH, and it has to be installed separately on each node of the cluster, in container images, and in local test environments, which adds huge complexity from a deployment point of view. In some environments, it requires compiling the natives from source, which is non-trivial. This approach is also platform dependent; the binary may not work on a different platform, so it requires recompilation.
2. It requires extra configuration of java.library.path to load the natives, which results in higher application deployment and maintenance costs for users.
Projects such as Spark use lz4-java, a JNI-based implementation. It bundles native binaries in its jar file and can automatically load them into the JVM from the jar without any setup. If a native implementation cannot be found for the platform, it falls back to a pure-Java implementation of LZ4.
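A minimal sketch of that loading behavior, using lz4-java's public net.jpountz.lz4 API (the class name Lz4JavaDemo and the sample payload are illustrative assumptions, not part of this issue):

import java.nio.charset.StandardCharsets;
import net.jpountz.lz4.LZ4Compressor;
import net.jpountz.lz4.LZ4Factory;
import net.jpountz.lz4.LZ4FastDecompressor;

public class Lz4JavaDemo {
    public static void main(String[] args) {
        // fastestInstance() prefers the JNI binding shipped inside the jar and
        // transparently falls back to the pure-Java implementation when no
        // native binary matches the current platform; no java.library.path
        // or LD_LIBRARY_PATH setup is needed.
        LZ4Factory factory = LZ4Factory.fastestInstance();

        byte[] data = "sample payload".getBytes(StandardCharsets.UTF_8);

        LZ4Compressor compressor = factory.fastCompressor();
        byte[] compressed = compressor.compress(data);

        // The fast decompressor needs the original length up front.
        LZ4FastDecompressor decompressor = factory.fastDecompressor();
        byte[] restored = decompressor.decompress(compressed, data.length);

        System.out.println(new String(restored, StandardCharsets.UTF_8));
    }
}

The same factory object serves both directions, so a codec implementation only has to resolve it once at startup and the native-vs-Java decision is hidden from callers.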
Attachments
Issue Links
- breaks
  - HADOOP-17390 Skip license check on lz4 code files (Resolved)
- is related to
  - HADOOP-17399 lz4 sources missing for native Visual Studio project (Resolved)
  - HADOOP-17464 Create hadoop-compression module (Open)
  - HDFS-15690 Add lz4-java as hadoop-hdfs test dependency (Resolved)
- relates to
  - HADOOP-17532 Yarn Job execution get failed when LZ4 Compression Codec is used (Resolved)
  - HADOOP-17891 lz4-java and snappy-java should be excluded from relocation in shaded Hadoop libraries (Resolved)
  - HADOOP-17125 Using snappy-java in SnappyCodec (Resolved)
- links to