Pig / PIG-2388

Make shim for Hadoop 0.20 and 0.23 support dynamic

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 0.9.2, 0.10.0
    • Fix Version/s: 0.9.2, 0.10.0
    • Component/s: None
    • Labels: None

    Description

      We need a single Pig installation that works with both Hadoop versions. The current shim implementation assumes a separate build for each version. We can solve this statically, through the internal build/installation system, or by making the shim dynamic so that a single pig.jar works on both versions via runtime detection. The attached patch converts the static shims into a shim interface with two implementations, each compiled against its respective Hadoop version and both included in a single pig.jar (similar to what Hive does).

      The default build behavior remains unchanged: only the shim for ${hadoopversion} is compiled. Both shims can be built via: ant -Dbuild-all-shims=true
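A minimal sketch of the runtime-detection idea in the description, assuming a loader keyed on the Hadoop "major.minor" version string. HadoopShims, ShimLoader, Hadoop20Shims and Hadoop23Shims are hypothetical names, not the classes in the attached patch. Each implementation is loaded by name, so common code never references a version-specific class directly.

```java
import java.util.Map;

/** Hypothetical shim interface: common Pig code would call only these methods. */
interface HadoopShims {
    /** Human-readable name of the Hadoop line this shim was built for. */
    String describe();
}

/** Hypothetical loader that picks a shim implementation at run time. */
final class ShimLoader {
    // Stand-in implementations. In a real build each would live in its own source
    // tree, be compiled against its own Hadoop jar, and be referenced only by name.
    static final class Hadoop20Shims implements HadoopShims {
        public String describe() { return "shim for Hadoop 0.20"; }
    }

    static final class Hadoop23Shims implements HadoopShims {
        public String describe() { return "shim for Hadoop 0.23"; }
    }

    // "major.minor" -> binary-name suffix of the nested implementation class.
    private static final Map<String, String> SHIM_CLASSES = Map.of(
            "0.20", "$Hadoop20Shims",
            "0.23", "$Hadoop23Shims");

    private ShimLoader() {}

    /** Reduces a full version string such as "0.20.205.0" to "0.20". */
    static String majorMinor(String version) {
        String[] parts = version.split("\\.");
        if (parts.length < 2) {
            throw new IllegalArgumentException("Unparseable Hadoop version: " + version);
        }
        return parts[0] + "." + parts[1];
    }

    /** Loads the matching shim by class name; the other implementation is never loaded. */
    static HadoopShims load(String hadoopVersion) {
        String suffix = SHIM_CLASSES.get(majorMinor(hadoopVersion));
        if (suffix == null) {
            throw new IllegalArgumentException("Unsupported Hadoop version: " + hadoopVersion);
        }
        try {
            Class<?> cls = Class.forName(ShimLoader.class.getName() + suffix);
            return (HadoopShims) cls.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot load shim for Hadoop " + hadoopVersion, e);
        }
    }
}
```

With this sketch, ShimLoader.load("0.23.1").describe() returns "shim for Hadoop 0.23". In a real deployment the version string could come from Hadoop's org.apache.hadoop.util.VersionInfo.getVersion().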

      Attachments

        1. PIG-2388_branch-0.9.patch
          91 kB
          Thomas Weise


          Activity

            thw Thomas Weise added a comment -

            Even though the common code (outside the shim) can be compiled against either Hadoop MR version, it must run against the version it was compiled against, because several types changed from class to interface. Wherever we have a call like jobcontext.getConfiguration(), the compiled bytecode differs depending on whether JobContext is a class or an interface (invokevirtual vs. invokeinterface): it compiles fine but fails at run time against the other version. Other places, such as somemethod(JobContext context), don't have that problem. I could get it to work for basic illustrate, but as soon as MR comes into the picture there are many places in the common code that are affected, and it is not reasonably possible to shim all of them.

            Our solution will be an installer that contains a set of jar files compiled against both versions and resolves the dependency at startup/install time.
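The class-vs-interface problem in the comment above can be illustrated with stand-in types. One common workaround, not what the patch or Pig does, is to look the method up by name at run time, so the bytecode no longer commits to invokevirtual or invokeinterface. The types below are hypothetical stand-ins (a String plays the role of Hadoop's Configuration). Every affected call site would need this treatment, which is the cost described above.

```java
import java.lang.reflect.Method;

/** Stand-in for Hadoop 0.20's JobContext, which is a concrete class. */
class ClassStyleJobContext {
    private final String conf;
    ClassStyleJobContext(String conf) { this.conf = conf; }
    public String getConfiguration() { return conf; }
}

/** Stand-in for Hadoop 0.23's JobContext, which is an interface. */
interface InterfaceStyleJobContext {
    String getConfiguration();
}

/** Stand-in for the 0.23 implementation class behind the interface. */
class InterfaceStyleJobContextImpl implements InterfaceStyleJobContext {
    private final String conf;
    InterfaceStyleJobContextImpl(String conf) { this.conf = conf; }
    public String getConfiguration() { return conf; }
}

/** Hypothetical helper that resolves getConfiguration() by name at run time. */
final class ReflectiveCompat {
    private ReflectiveCompat() {}

    // A direct call ctx.getConfiguration() compiles to invokevirtual when the static
    // type is a class and to invokeinterface when it is an interface; running against
    // the other kind of type fails with IncompatibleClassChangeError. Reflection
    // defers that choice to run time.
    static Object getConfiguration(Object jobContext) {
        try {
            Method m = jobContext.getClass().getMethod("getConfiguration");
            return m.invoke(jobContext);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(
                    "No usable getConfiguration() on " + jobContext.getClass(), e);
        }
    }
}
```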

            daijy Daniel Dai added a comment -

            Thanks Thomas, you dug much deeper than I did; I could not even make it compile. Now we need to build both pig-23.jar and pig-20.jar and pick the right one in the pig script.
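The pig launcher is a shell script, so the following Java only illustrates the selection rule described in the comment above, assuming the launcher can obtain the Hadoop version string (for example from the output of `hadoop version`). PigJarSelector and pigJarFor are hypothetical names; the jar names are the ones from the comment.

```java
/** Hypothetical selection rule for a distribution that ships both pig-20.jar and pig-23.jar. */
final class PigJarSelector {
    private PigJarSelector() {}

    /** True if the version is exactly the given line ("0.20") or a release of it ("0.20.x"). */
    private static boolean isLine(String version, String line) {
        return version.equals(line) || version.startsWith(line + ".");
    }

    /** Maps a Hadoop version string such as "0.20.205.0" or "0.23.0" to a Pig jar name. */
    static String pigJarFor(String hadoopVersion) {
        if (isLine(hadoopVersion, "0.20")) {
            return "pig-20.jar";
        }
        if (isLine(hadoopVersion, "0.23")) {
            return "pig-23.jar";
        }
        throw new IllegalArgumentException("No Pig build for Hadoop " + hadoopVersion);
    }
}
```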

            thw Thomas Weise added a comment -

            We are going to solve this through our internal build/packaging process, based on the current support. For external distributions, perhaps it makes sense to simply offer a separate package per Hadoop version rather than trying to make a single package with this added complexity embedded? I assume other projects (Hive etc.) are going to have the same problem.

            thw Thomas Weise added a comment -

            Cannot be fixed without major changes to the Pig-MR interactions.


            dvryaboy Dmitriy V. Ryaboy added a comment -

            Hive does this, and back in the day there was a patch that did this for Pig with Hadoop 0.18 vs. Hadoop 0.20. Should be doable, though it'll take work.

            daijy Daniel Dai added a comment -

            dvryaboy, Hive does not have illustrate. It is the illustrate implementation (IllustratorContextImpl) that makes a dynamic shim layer hard for Pig.


            People

              Assignee: Unassigned
              Reporter: Thomas Weise (thw)
              Votes: 0
              Watchers: 2
