Hadoop Common / HADOOP-17292

Using lz4-java in Lz4Codec

Details

    • Reviewed
    • Hadoop's LZ4 compression codec now depends on lz4-java. Native LZ4 compression is performed through the JNI bindings bundled with lz4-java, so it is no longer necessary to install and configure the lz4 system package.

      lz4-java is declared in provided scope. Applications that wish to use the LZ4 codec must declare a dependency on lz4-java explicitly.

    Description

      In Hadoop, the lz4 codec relies on native libraries, which has several disadvantages:

      • It requires the native libhadoop to be installed on the system LD_LIBRARY_PATH, separately on every cluster node, container image, and local test environment, which adds considerable deployment complexity. In some environments the native libraries must be compiled from source, which is non-trivial. The approach is also platform dependent: a binary built for one platform may not work on another, so it has to be recompiled.
      • It requires extra configuration of java.library.path to load the native libraries, which results in higher application deployment and maintenance costs for users.
      Projects such as Spark use lz4-java, which is a JNI-based implementation. It ships the native binaries inside its jar file and can load them into the JVM automatically, without any setup. If no native implementation can be found for a platform, it can fall back to a pure-Java implementation of lz4.
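
      The "try native, otherwise fall back to pure Java" behaviour described above can be sketched with the standard library alone. This is only an illustration of the pattern, not lz4-java's or Hadoop's actual code: the Codec interface, both codec classes, the fastestAvailable helper, and the library name are hypothetical, and the stand-in codecs do no real compression. In lz4-java itself, this kind of selection is exposed through LZ4Factory.fastestInstance().

```java
public class NativeFallbackSketch {

    /** Minimal codec interface; hypothetical, not part of lz4-java or Hadoop. */
    interface Codec {
        String name();
    }

    /** Stand-in for a pure-Java implementation that needs no native library. */
    static final class PureJavaCodec implements Codec {
        @Override
        public String name() {
            return "pure-java";
        }
    }

    /** Stand-in for a JNI-backed implementation; constructing it may fail. */
    static final class NativeCodec implements Codec {
        NativeCodec(String libraryName) {
            // Throws UnsatisfiedLinkError if the library cannot be found
            // on java.library.path.
            System.loadLibrary(libraryName);
        }

        @Override
        public String name() {
            return "native";
        }
    }

    /** Prefer the native codec; fall back to pure Java if it cannot be loaded. */
    static Codec fastestAvailable(String libraryName) {
        try {
            return new NativeCodec(libraryName);
        } catch (UnsatisfiedLinkError | SecurityException e) {
            return new PureJavaCodec();
        }
    }

    public static void main(String[] args) {
        // The library name is deliberately one that should not exist,
        // so the fallback path is taken.
        Codec codec = fastestAvailable("hypothetical_lz4_binding_that_does_not_exist");
        System.out.println("Selected implementation: " + codec.name());
    }
}
```

      With this shape, callers never need to configure java.library.path themselves: a missing native library only changes which implementation is selected.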

            People

              viirya L. C. Hsieh
              Votes: 1
              Watchers: 5

                Time Tracking

                  Original Estimate: Not Specified
                  Remaining Estimate: 0h
                  Time Spent: 12h