[HBASE-1040] OOME does not cause graceful shutdown under some failure scenarios - ASF JIRA

XML

Word

Printable

JSON

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 0.18.1
Fix Version/s: 0.19.0
Component/s: regionserver
Labels:
None

Description

I am seeing these exceptions on our cluster in output from tablemap/tablereduce jobs:

> java.io.IOException: java.lang.OutOfMemoryError: Java heap space
> at java.io.DataInputStream.readFull(DataInputSteram.java:175)
> at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:64)
> at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:102)
> at org.apahce.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1933)
> at org.apahce.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1833)
> at org.apahce.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1879)
> at org.apache.hadoop.io.MapFile$Reader.next(MapFile.java:516)
> at org.apache.hadoop.hbase.regionserver.StoreFileScanner.getNext(StoreFileScanner.java:312)

When such OOMEs as above happen, the cluster does not recover without manual intervention. The regionservers sometimes go down after this, or sometimes do not and stay up in sick condition for a while. Regions go offline and remain unavailable.

Attachments

Issue Links

relates to

HBASE-1045 Hangup by regionserver causes write to fail

Closed

Activity

People

Assignee:: Unassigned

Reporter:: Andrew Kyle Purtell

Votes:: 0 Vote for this issue

Watchers:: 0 Start watching this issue

Dates

Created:: 02/Dec/08 02:01

Updated:: 13/Sep/09 22:26

Resolved:: 04/Dec/08 04:47