HBase / HBASE-1040

OOME does not cause graceful shutdown under some failure scenarios

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.18.1
    • Fix Version/s: 0.19.0
    • Component/s: regionserver
    • Labels: None

    Description

      I am seeing these exceptions on our cluster in output from tablemap/tablereduce jobs:

      > java.io.IOException: java.lang.OutOfMemoryError: Java heap space
      > at java.io.DataInputStream.readFully(DataInputStream.java:175)
      > at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:64)
      > at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:102)
      > at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1933)
      > at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1833)
      > at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1879)
      > at org.apache.hadoop.io.MapFile$Reader.next(MapFile.java:516)
      > at org.apache.hadoop.hbase.regionserver.StoreFileScanner.getNext(StoreFileScanner.java:312)

      When OOMEs like the one above occur, the cluster does not recover without manual intervention. The regionservers sometimes go down afterwards; at other times they stay up in an unhealthy state for a while. In either case, regions go offline and remain unavailable.
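
      The trace above shows the OutOfMemoryError arriving wrapped inside a java.io.IOException, so a handler that only catches OutOfMemoryError directly would presumably miss it. As a rough illustration of the kind of check that could catch this case, the sketch below walks an exception's cause chain and also falls back to matching the message text. The class OomeCheck and the method isCausedByOome are hypothetical names made up for this example; this is not the code from the actual fix.

```java
import java.io.IOException;

// Hypothetical helper for illustration only; not part of HBase.
public class OomeCheck {
    private static final String OOME_NAME = "java.lang.OutOfMemoryError";

    /**
     * Returns true if t, or any exception in its cause chain, is an
     * OutOfMemoryError. As a heuristic fallback it also matches the class
     * name in the message, for cases where only the text survived wrapping.
     */
    public static boolean isCausedByOome(Throwable t) {
        // Bound the walk so a cyclic cause chain cannot loop forever.
        for (int depth = 0; t != null && depth < 64; depth++) {
            if (t instanceof OutOfMemoryError) {
                return true;
            }
            String msg = t.getMessage();
            if (msg != null && msg.contains(OOME_NAME)) {
                return true;
            }
            t = t.getCause();
        }
        return false;
    }

    public static void main(String[] args) {
        // Same shape as the reported exception: an IOException wrapping an OOME.
        IOException wrapped = new IOException(new OutOfMemoryError("Java heap space"));
        System.out.println(isCausedByOome(wrapped));                        // prints "true"
        System.out.println(isCausedByOome(new IOException("disk error")));  // prints "false"
    }
}
```

      A caller that gets true from such a check could then begin an orderly shutdown instead of continuing to serve in a degraded state. How that shutdown is triggered in the regionserver is outside the scope of this sketch.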

            People

              Assignee: Unassigned
              Reporter: Andrew Kyle Purtell (apurtell)