Details
- Improvement
- Status: Open
- Minor
- Resolution: Unresolved
- 2.7.3
- None
- None
Description
FileInputFormat.singleThreadedListStatus does a recursive directory walk to pick the files to scan. This is very inefficient on object stores, where every directory visited costs at least one separate list request, and it can be bypassed wherever listFiles(recursive=true) can be used instead.
Based on the experience of SPARK-2984, the listing should also be resilient to a source file going away during the iteration, downgrading a FileNotFoundException (FNFE) to a "skip that nonexistent path".
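A minimal sketch of the proposed approach, not the actual patch: the class name FlatListing, the method listInputFiles and the MAX_SKIPS bound are hypothetical names introduced here for illustration. It assumes that the iterator returned by FileSystem.listFiles(path, true) can be polled again after it raises a FileNotFoundException, which may not hold for every filesystem implementation, so the number of skips is capped. Hidden-file filtering and PathFilter support, which FileInputFormat also applies, are left out.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;

/** Hypothetical helper: flat recursive listing that tolerates vanishing paths. */
public class FlatListing {

  /** Assumed upper bound on skipped paths, so a broken iterator cannot loop forever. */
  private static final int MAX_SKIPS = 1000;

  public static List<FileStatus> listInputFiles(FileSystem fs, Path root)
      throws IOException {
    List<FileStatus> result = new ArrayList<>();

    // One recursive listing call instead of a client-side directory walk.
    // A missing root is still an error: this call is outside the try block,
    // so its FileNotFoundException propagates to the caller.
    RemoteIterator<LocatedFileStatus> files = fs.listFiles(root, true);

    int skipped = 0;
    while (true) {
      try {
        if (!files.hasNext()) {
          break;
        }
        // listFiles() returns files only, never directories.
        result.add(files.next());
      } catch (FileNotFoundException e) {
        // A path went away mid-iteration (for example a _temporary
        // directory being cleaned up): skip it and keep going.
        skipped++;
        if (skipped > MAX_SKIPS) {
          throw e;
        }
      }
    }
    return result;
  }
}
```

Whether the listing is actually cheaper depends on the filesystem: the base FileSystem implementation of listFiles still walks directories one by one, so the gain comes from object store connectors that override it with a flat listing.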
Issue Links
- is depended upon by
  - HADOOP-13525 Optimize uses of FS operations in the ASF analysis frameworks and libraries (Resolved)
- relates to
  - HIVE-21546 hiveserver2 - “mapred.FileInputFormat: Total input files to process” - why single threaded? (Open)
  - SPARK-2984 FileNotFoundException on _temporary directory (Resolved)