Description
HADOOP-13208 gives O(1) listing of directory trees in FileSystem.listFiles(recursive=true) calls, but does nothing for FileSystem.globStatus(), which uses a completely different code path: a selective recursive scan that pattern-matches as it descends, filtering out the paths which don't match. The cost is O(matching-directories) plus the cost of examining the files.
In S3A it should be possible to do the glob status listing not through the filtered treewalk, but through a list + filter operation: list first, then apply the pattern to the results. The listing would be an O(files) lookup performed before any filtering takes place.
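To illustrate the list + filter idea, here is a minimal sketch in plain Java. It is not the S3A implementation and uses no Hadoop or AWS APIs: the class `GlobListFilter` and its methods are hypothetical, the input is assumed to be a flat list of object keys already returned by one bulk listing, and only the `*` and `?` wildcards are handled (neither crosses a `/`). The richer syntax that globStatus accepts, such as `{a,b}` and `[abc]`, is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not part of Hadoop or the S3A connector. */
final class GlobListFilter {

    private GlobListFilter() {
    }

    /**
     * Translate a simplified glob into a regular expression.
     * Assumption: only '*' and '?' are wildcards, and neither matches '/'.
     */
    static Pattern globToRegex(String glob) {
        StringBuilder regex = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (char c : glob.toCharArray()) {
            if (c == '*' || c == '?') {
                // Flush any pending literal text, quoted so it is matched verbatim.
                if (literal.length() > 0) {
                    regex.append(Pattern.quote(literal.toString()));
                    literal.setLength(0);
                }
                regex.append(c == '*' ? "[^/]*" : "[^/]");
            } else {
                literal.append(c);
            }
        }
        if (literal.length() > 0) {
            regex.append(Pattern.quote(literal.toString()));
        }
        return Pattern.compile(regex.toString());
    }

    /**
     * Filter an already-fetched flat listing against the glob.
     * Every key is examined once, so the work is O(files) regardless of
     * how many directories the pattern would have matched in a treewalk.
     */
    static List<String> globFilter(List<String> allKeys, String glob) {
        Pattern pattern = globToRegex(glob);
        List<String> matches = new ArrayList<>();
        for (String key : allKeys) {
            if (pattern.matcher(key).matches()) {
                matches.add(key);
            }
        }
        return matches;
    }
}
```

Calling `GlobListFilter.globFilter(keys, "data/*/part-?.csv")` keeps only the keys exactly one level below `data/` whose file name fits the pattern. The sketch matches object keys only; a real globStatus also has to report matching directories, which this does not attempt.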
Issue Links
- depends upon: HADOOP-13208 S3A listFiles(recursive=true) to do a bulk listObjects instead of walking the pseudo-tree of directories (Resolved)
- is related to: HADOOP-14235 S3A Path does not understand colon (:) when globbing (Resolved)
- is superseded by: HADOOP-16458 LocatedFileStatusFetcher scans failing intermittently against S3 store (Resolved)
- links to