[MAPREDUCE-6633] AM should retry map attempts if the reduce task encounters compression related errors. - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 2.7.2
Fix Version/s: 2.8.0, 2.7.3, 3.0.0-alpha1
Component/s: None
Labels: None
Target Version/s: 2.8.0, 2.7.3
Hadoop Flags: Reviewed

Description

When a reduce task encounters compression-related errors, the AM doesn't retry the corresponding map task.
Here is the stack trace from one of the cases we encountered.

2016-01-27 13:44:28,915 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in fetcher#29
	at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134)
	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1694)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.ArrayIndexOutOfBoundsException
	at com.hadoop.compression.lzo.LzoDecompressor.setInput(LzoDecompressor.java:196)
	at org.apache.hadoop.io.compress.BlockDecompressorStream.decompress(BlockDecompressorStream.java:104)
	at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:85)
	at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:192)
	at org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput.shuffle(InMemoryMapOutput.java:97)
	at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyMapOutput(Fetcher.java:537)
	at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:336)
	at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:193)

In this case, the node on which the map task ran had a bad drive.
If the AM had retried running that map task somewhere else, the job definitely would have succeeded.
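
The behavior requested above can be sketched as follows. This is a minimal, self-contained illustration and not the attached patch: `ShuffleRetrySketch`, `MapOutputReader`, `CopyResult`, and `copyMapOutput` here are hypothetical stand-ins, not Hadoop classes or signatures. The assumption is that a runtime exception thrown by a decompressor while reading a fetched map output (such as the `ArrayIndexOutOfBoundsException` in the trace) is treated like any other failed fetch, so the map attempt can be reported to the AM instead of the error killing the reducer.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

public class ShuffleRetrySketch {

    /** Hypothetical outcome of copying a single map output. */
    enum CopyResult { SUCCESS, FETCH_FAILURE }

    /** Hypothetical stand-in for a codec-backed reader of shuffled data. */
    interface MapOutputReader {
        void readFully(InputStream in, byte[] buf) throws IOException;
    }

    /**
     * Copies one map output. Any IOException or RuntimeException raised while
     * reading is recorded as a failed fetch for that map attempt rather than
     * being allowed to propagate and fail the reduce task.
     */
    static CopyResult copyMapOutput(String mapAttemptId, InputStream in, int length,
                                    MapOutputReader reader, List<String> failedFetches) {
        byte[] buf = new byte[length];
        try {
            reader.readFully(in, buf);
            return CopyResult.SUCCESS;
        } catch (IOException | RuntimeException e) {
            // Assumption: corrupt compressed data surfaces as an unchecked
            // exception from the decompressor. Record the attempt so the
            // caller can report it to the AM as a fetch failure.
            failedFetches.add(mapAttemptId);
            return CopyResult.FETCH_FAILURE;
        }
    }

    public static void main(String[] args) {
        List<String> failedFetches = new ArrayList<>();

        // Simulates a decompressor that chokes on corrupt input.
        MapOutputReader corruptReader = (in, buf) -> {
            throw new ArrayIndexOutOfBoundsException("corrupt compressed block");
        };

        CopyResult result = copyMapOutput("attempt_m_000001_0",
                new ByteArrayInputStream(new byte[8]), 8, corruptReader, failedFetches);

        System.out.println(result + " " + failedFetches);
    }
}
```

Catching `RuntimeException` is deliberately broad in this sketch; a real change would need to decide which errors are safe to attribute to the map output rather than to the reducer itself.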

Attachments

MAPREDUCE-6633.patch
25/Mar/16 19:26
4 kB
Rushabh Shah

Issue Links

is related to

TEZ-3833 Tasks should report codec errors during shuffle as fetch failures

Closed

People

Assignee: Rushabh Shah

Reporter: Rushabh Shah

Votes: 0

Watchers: 8

Dates

Created: 10/Feb/16 21:12

Updated: 14/Oct/19 15:38

Resolved: 09/Apr/16 19:21