Description
We are trying to create a split in Hive that reads only the files directly inside a directory, not those in its subdirectories.
This fails with the error below.
The error arises from the interaction of two pieces of code: one explicitly adds directories to the results without failing, and the other fails on any directory in the results. This looks like a bug.
Caused by: java.io.IOException: Not a file: file:/,...warehouse/simple_to_mm_text/delta_0000001_0000001_0000
    at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:329) ~[hadoop-mapreduce-client-core-3.1.0.jar:?]
    at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:553) ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
    at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:754) ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
    at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:203) ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
When recursion is disabled, this code adds directories to the results:
if (recursive && stat.isDirectory()) {
  result.dirsNeedingRecursiveCalls.add(stat);
} else {
  result.locatedFileStatuses.add(stat);
}
However, the getSplits code that runs afterwards computes the total size like this:
long totalSize = 0;                           // compute total size
for (FileStatus file: files) {                // check we have valid files
  if (file.isDirectory()) {
    throw new IOException("Not a file: "+ file.getPath());
  }
  totalSize +=
This check always fails whenever the non-recursive listing above has placed a directory in the results.
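The interaction can be reproduced in isolation with a small model. The sketch below is not the Hadoop code: `SplitDirectoryDemo`, `Stat`, `listStatus`, `totalSize`, the `length` field, and the sample paths are all hypothetical stand-ins that only mirror the two branches quoted above, assuming the truncated `totalSize +=` line adds the file length.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SplitDirectoryDemo {

    /** Hypothetical stand-in for FileStatus: a path, a directory flag, a length. */
    public static final class Stat {
        final String path;
        final boolean directory;
        final long length;

        public Stat(String path, boolean directory, long length) {
            this.path = path;
            this.directory = directory;
            this.length = length;
        }
    }

    /**
     * Mirrors the listing branch: directories are set aside only when
     * recursion is enabled; otherwise they land in the returned results.
     */
    public static List<Stat> listStatus(List<Stat> children, boolean recursive,
                                        List<Stat> dirsNeedingRecursiveCalls) {
        List<Stat> locatedFileStatuses = new ArrayList<>();
        for (Stat stat : children) {
            if (recursive && stat.directory) {
                dirsNeedingRecursiveCalls.add(stat);
            } else {
                locatedFileStatuses.add(stat);
            }
        }
        return locatedFileStatuses;
    }

    /** Mirrors the size computation: any directory in the input is fatal. */
    public static long totalSize(List<Stat> files) throws IOException {
        long totalSize = 0;
        for (Stat file : files) {
            if (file.directory) {
                throw new IOException("Not a file: " + file.path);
            }
            totalSize += file.length; // assumption: the elided line adds the length
        }
        return totalSize;
    }

    public static void main(String[] args) {
        // Hypothetical table directory: one data file and one delta subdirectory.
        List<Stat> children = Arrays.asList(
            new Stat("warehouse/t/000000_0", false, 10L),
            new Stat("warehouse/t/delta_1", true, 0L));

        // Non-recursive listing keeps the subdirectory in the results...
        List<Stat> files = listStatus(children, false, new ArrayList<>());
        try {
            // ...so the size computation rejects it.
            System.out.println(totalSize(files));
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this model, a non-recursive listing of any directory that contains a subdirectory leads to the exception, while a recursive listing sets the subdirectory aside and the size computation succeeds.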
Attachments
Issue Links
- blocks: HIVE-19258 add originals support to MM tables (and make the conversion a metadata only operation) (Closed)
- is related to: HIVE-23072 ACID: Can't select from insert-only table with original files and deltas (Resolved)