Hive / HIVE-1001

CombinedHiveInputFormat should parse the inputpath correctly

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.5.0
    • Fix Version/s: 0.5.0
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      From David Lerman:
      "
      I'm running into errors where CombinedHiveInputFormat is combining data from
      two different tables which is causing problems because the tables have
      different input formats.

      It looks like the problem is in
      org.apache.hadoop.hive.shims.Hadoop20Shims.getInputPathsShim. It calls
      CombineFileInputFormat.getInputPaths which returns the list of input paths
      and then chops off the first 5 characters to remove file: from the
      beginning, but the return value I'm getting from getInputPaths is actually
      hdfs://domain/path. So then when it creates the pools using these paths,
      none of the input paths match the pools (since they're just the file path
      without protocol or domain).
      "

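The following is a simplified, hypothetical model of the failure David describes. Suppose pools are keyed by input paths that went through the fixed 5-character chop. Chopping "hdfs:" leaves the "//host:port" prefix in place. A prefix test against a bare file path therefore never matches. PoolMatchDemo, chopFilePrefix, findPool, the namenode:8020 authority, and the table names are all made up for illustration. This is not Hive's or Hadoop's actual pool-matching code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PoolMatchDemo {
    // Hypothetical stand-in for the buggy shim: assumes every path starts with "file:" (5 chars).
    static String chopFilePrefix(String inputPath) {
        return inputPath.substring(5);
    }

    // Hypothetical pool lookup: a file belongs to a pool if its path starts with the pool key.
    static String findPool(Map<String, String> pools, String filePath) {
        for (Map.Entry<String, String> e : pools.entrySet()) {
            if (filePath.startsWith(e.getKey())) {
                return e.getValue();
            }
        }
        return null; // no pool matched
    }

    public static void main(String[] args) {
        Map<String, String> pools = new LinkedHashMap<>();
        // Pool keys built from chopped hdfs:// input paths (made-up tables and formats).
        pools.put(chopFilePrefix("hdfs://namenode:8020/warehouse/t1"), "t1 (TextInputFormat)");
        pools.put(chopFilePrefix("hdfs://namenode:8020/warehouse/t2"), "t2 (SequenceFileInputFormat)");

        // A data file given as a bare path, without scheme or authority.
        String file = "/warehouse/t1/part-00000";
        // Prints "null": both keys start with "//namenode:8020", so no prefix test succeeds.
        System.out.println(findPool(pools, file));
    }
}
```
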
      We should take the path component of the URI (Path.toUri().getPath()) instead of just chopping off the first 5 characters.
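
Below is a minimal sketch of the URI-based approach, assuming the input paths are plain URI strings. Hadoop's Path.toUri() returns a java.net.URI, so java.net.URI is used directly here to keep the example self-contained. The class and method names (InputPathFix, chopFirstFive, pathComponent) are hypothetical and not taken from the patch.

```java
import java.net.URI;

public class InputPathFix {
    // Old behaviour: blindly drop the first 5 characters, assuming a "file:" prefix.
    static String chopFirstFive(String inputPath) {
        return inputPath.substring(5);
    }

    // URI-based behaviour: parse as a URI and keep only its path component,
    // which drops the scheme ("file", "hdfs") and any authority ("namenode:8020").
    static String pathComponent(String inputPath) {
        return URI.create(inputPath).getPath();
    }

    public static void main(String[] args) {
        String[] inputs = {
            "file:/warehouse/t1",
            "hdfs://namenode:8020/warehouse/t2"
        };
        for (String p : inputs) {
            System.out.println(chopFirstFive(p) + " -> " + pathComponent(p));
        }
        // Prints:
        // /warehouse/t1 -> /warehouse/t1
        // //namenode:8020/warehouse/t2 -> /warehouse/t2
    }
}
```

Unlike Hadoop's Path, URI.create() does no escaping. A raw string containing, for example, a space would throw IllegalArgumentException, and the sketch ignores that case.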

    Attachments

      1. hive.1001.1.patch (7 kB, Namit Jain)

    People

      Assignee: Namit Jain
      Reporter: Zheng Shao
      Votes: 0
      Watchers: 1
