Description
FileSystem Globber does a listStatus(path) and then, if only one element is returned, getFileStatus(path).isDirectory() to see if it is a dir. The way getFileStatus() is wrapped, IOEs are downgraded to null.
On S3, if the path has had entries deleted, the listing may include files which are no longer there, so the getFileStatus(path).isDirectory() call triggers an NPE.
While it's wrong to glob against S3 when it's being inconsistent, we should at least fail gracefully here.
Proposed
- log all IOEs raised in Globber.getFileStatus @ debug
- catch FNFEs and downgrade to warn
- continue
The alternative would be to fail fast on FNFE, but that's more traumatic.
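A minimal sketch of one reading of the proposal, not the actual Hadoop patch. It is self-contained and uses no Hadoop classes: `Status`, `Store`, `getFileStatusOrNull` and `directoriesIn` are hypothetical stand-ins, and `java.util.logging` levels FINE and WARNING stand in for debug and warn. The wrapped lookup still downgrades IOEs to null, but logs them, and the caller checks for null before calling isDirectory-style logic, so a stale listing entry is skipped rather than raising an NPE.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.logging.Level;
import java.util.logging.Logger;

public class GlobberSketch {
    private static final Logger LOG = Logger.getLogger(GlobberSketch.class.getName());

    /** Hypothetical stand-in for a file status; not Hadoop's FileStatus. */
    static final class Status {
        final String path;
        final boolean directory;

        Status(String path, boolean directory) {
            this.path = path;
            this.directory = directory;
        }
    }

    /** Hypothetical store whose listing may be out of step with its contents. */
    static final class Store {
        final Map<String, Status> live = new HashMap<>();

        Status getFileStatus(String path) throws IOException {
            Status status = live.get(path);
            if (status == null) {
                throw new FileNotFoundException(path);
            }
            return status;
        }
    }

    /** Wrapped lookup: FNFE is logged at warn, other IOEs at debug; both yield null. */
    static Status getFileStatusOrNull(Store store, String path) {
        try {
            return store.getFileStatus(path);
        } catch (FileNotFoundException e) {
            LOG.log(Level.WARNING, "Listed entry not found, skipping: " + path, e);
            return null;
        } catch (IOException e) {
            LOG.log(Level.FINE, "Failed to get status of " + path, e);
            return null;
        }
    }

    /** Returns the listed paths that are directories, skipping entries with no status. */
    static List<String> directoriesIn(Store store, List<String> listing) {
        List<String> dirs = new ArrayList<>();
        for (String path : listing) {
            Status status = getFileStatusOrNull(store, path);
            if (status == null) {
                continue; // stale entry: calling status.directory here would be the NPE
            }
            if (status.directory) {
                dirs.add(path);
            }
        }
        return dirs;
    }

    public static void main(String[] args) {
        Store store = new Store();
        store.live.put("/data/dir", new Status("/data/dir", true));
        // The listing still names an entry that has since been deleted.
        List<String> listing = Arrays.asList("/data/dir", "/data/deleted");
        System.out.println(directoriesIn(store, listing)); // prints [/data/dir]
    }
}
```

The fail-fast alternative would amount to rethrowing the FileNotFoundException from the lookup instead of returning null.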
Attachments
Issue Links
- is related to
  - HADOOP-16458 LocatedFileStatusFetcher scans failing intermittently against S3 store (Resolved)
  - HADOOP-8870 NullPointerException when glob doesn't return files (Closed)