Details
-
Bug
-
Status: Closed
-
Major
-
Resolution: Fixed
-
0.5.0
-
None
-
None
-
Reviewed
Description
From David Lerman:
"
I'm running into errors where CombinedHiveInputFormat is combining data from
two different tables which is causing problems because the tables have
different input formats.
It looks like the problem is in
org.apache.hadoop.hive.shims.Hadoop20Shims.getInputPathsShim. It calls
CombineFileInputFormat.getInputPaths which returns the list of input paths
and then chops off the first 5 characters to remove file: from the
beginning, but the return value I'm getting from getInputPaths is actually
hdfs://domain/path. So then when it creates the pools using these paths,
none of the input paths match the pools (since they're just the file path
which protocol or domain).
"
We should use Path.getPath() to get the path part of an URI instead of just chopping off 5 chars.