Details
-
Bug
-
Status: Resolved
-
Major
-
Resolution: Fixed
-
None
-
None
-
None
Description
In the presence of many input splits (>6000 in our case) and input split threads (3000), the loop that fetches locality info for all splits from ZooKeeper becomes a bottleneck. A few workers aren't able to even iterate once over the list, run into increased GC pauses, and eventually time out.
Furthermore, depending on the cluster configuration, it's not always possible/useful to exploit locality.
We should add a flag so that the feature can be optionally disabled.