Details
- Bug
- Status: Resolved
- Minor
- Resolution: Duplicate
- 2.0.0-alpha
- None
- None
Description
Currently, it seems possible for CombineFileInputFormat's InputSplit objects to grow very large because the locations field is not de-duplicated. We should probably use a set structure to keep duplicate locations from inflating the block-locations size of the InputSplits that CombineFileInputFormat sends over. That would help performance and help avoid unnecessary warnings/errors about block location limits at the JT/MR AM.
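A minimal sketch of the set-based idea described above, assuming the locations are available as a plain array of hostnames. The class and method names (`SplitLocationDedup`, `dedupLocations`) are hypothetical and are not part of Hadoop; this is not the actual fix applied in CombineFileInputFormat, only an illustration of collapsing duplicate hosts before they are stored in a split.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper, not a Hadoop class: illustrates de-duplicating
// the host list that would be stored in a split's locations field.
public class SplitLocationDedup {

    // Returns the distinct hostnames in first-seen order.
    // Null entries are skipped; this is an assumption of the sketch.
    static String[] dedupLocations(String[] hosts) {
        Set<String> unique = new LinkedHashSet<>();
        for (String host : hosts) {
            if (host != null) {
                unique.add(host);
            }
        }
        return unique.toArray(new String[0]);
    }
}
```

With this helper, an input such as `{"host1", "host2", "host1", "host3", "host2"}` would be reduced to three entries. A `LinkedHashSet` is used here so the original ordering of hosts is preserved; a plain `HashSet` would also remove duplicates if ordering does not matter.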
Attachments
Issue Links
- duplicates
- MAPREDUCE-2021 CombineFileInputFormat returns duplicate hostnames in split locations
- Closed