Details
Type: Improvement
Status: Closed
Priority: Major
Resolution: Fixed
Hadoop Flags: Reviewed
Description
For a large broadcast, this can become a problem.
E.g., in one of the jobs (query_17 @ 10 TB scale), Map 7 generates around 1.1 GB of data which is broadcast to 330 tasks in the downstream Map 1.
Map 1 uses all the slots in the cluster (~224 per wave). Until the data is downloaded, the shared fetch keeps re-queuing fetches, printing about 3 log lines per attempt in the process. E.g.:
2015-08-14 02:09:11,761 INFO [Fetcher [Map_7] #0] shuffle.Fetcher: Requeuing machine1:13562 downloads because we didn't get a lock
2015-08-14 02:09:11,761 INFO [Fetcher [Map_7] #0] shuffle.Fetcher: Shared fetch failed to return 1 inputs on this try
2015-08-14 02:09:11,761 INFO [ShuffleRunner [Map_7]] impl.ShuffleManager: Scheduling fetch for inputHost: machine1:13562
2015-08-14 02:09:11,761 INFO [ShuffleRunner [Map_7]] impl.ShuffleManager: Created Fetcher for host: machine1 with inputs: [InputAttemptIdentifier [inputIdentifier=InputIdentifier [inputIndex=0], attemptNumber=0, pathComponent=attempt_1439264591968_0058_1_04_000000_0_10029, fetchTypeInfo=FINAL_MERGE_ENABLED, spillEventId=-1]]
Depending on disk / network speed, it can take a while for the fetcher holding the lock to finish downloading and release it. Since there was only a single input to fetch in Map 1, the waiting fetcher ended up in a sort of tight loop, generating relatively large logs.
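A minimal, self-contained sketch of the pattern described above, not Tez's actual Fetcher/ShuffleManager code (the class, method, and constant names below are hypothetical): a waiting fetcher could back off between lock attempts and log the re-queue only periodically instead of on every retry.

{code:java}
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;
import java.util.logging.Logger;

public class SharedFetchRetrySketch {
  private static final Logger LOG = Logger.getLogger(SharedFetchRetrySketch.class.getName());

  // Hypothetical: log the wait only every N failed lock attempts instead of every time.
  private static final int LOG_EVERY_N_RETRIES = 100;

  /** Waits for the shared per-host disk lock, then runs the actual download. */
  public static void fetchWithSharedLock(ReentrantLock diskLock, Runnable doFetch)
      throws InterruptedException {
    int retries = 0;
    // Keep retrying until the fetcher that currently owns the lock finishes its download.
    while (!diskLock.tryLock(200, TimeUnit.MILLISECONDS)) {
      retries++;
      if (retries % LOG_EVERY_N_RETRIES == 0) {
        LOG.info("Still waiting for shared fetch lock after " + retries + " attempts");
      }
      // Back off a little so a task with a single input does not spin in a tight loop.
      Thread.sleep(Math.min(1000L, 10L * retries));
    }
    try {
      doFetch.run();   // download the broadcast data while holding the lock
    } finally {
      diskLock.unlock();
    }
  }
}
{code}

Either throttling the re-queue loop this way or demoting the per-retry messages to DEBUG would keep the log volume bounded while the lock holder finishes the download.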
In this case, task log files of around 260-290 MB are created per attempt. That works out to around 2.3 GB to 3 GB (depending on the number of slots waiting) on a machine with 8-10 slots.
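A quick back-of-the-envelope check of that per-machine figure, assuming roughly 9-10 slots are actually waiting and each writes a log of the size above (the slot counts are an assumption, not measured):

{code:java}
// Rough per-machine log volume: per-attempt log size x number of waiting slots.
public class LogVolumeEstimate {
  public static void main(String[] args) {
    double lowMb = 260, highMb = 290;   // observed per-attempt task log size
    int lowSlots = 9, highSlots = 10;   // assumed number of slots waiting on one machine
    System.out.printf("low:  ~%.1f GB%n", lowMb * lowSlots / 1024);    // ~2.3 GB
    System.out.printf("high: ~%.1f GB%n", highMb * highSlots / 1024);  // ~2.8 GB
  }
}
{code}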