Uploaded image for project: 'HBase'
  1. HBase
  2. HBASE-549

Don't CLOSE region if message is not from server that opened it or is opening it

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Invalid
    • 0.16.0, 0.1.0, 0.1.1, 0.2.0
    • None
    • None
    • None

    Description

      We assign a region to a server. It takes too long to open (HBASE-505). Region gets assigned to another server. Meantime original host returns a MSG_REPORT_CLOSE (because other regions opening messes it up moving files on disk out from under it). We queue a shutdown which marks the region as needing reassignment. Second server reports in that it successfully opened the region. Master tells it it should not have opened it. Churn ensues.

      Fix is to ignore the CLOSE if its reported server/startcode does not match that of the server currently trying to open region. Fix is not easy because currently we don't keep list of server info in unassigned regions.

      Here's master log snippet showing problem:

      ...
      2008-03-25 19:16:43,711 INFO org.apache.hadoop.hbase.HMaster: assigning region enwiki_080103,iLStZ0yTnfVUziYcNVVxWV==,1205393076482 to server XX.XX.XX.220:60020
      2008-03-25 19:16:46,725 DEBUG org.apache.hadoop.hbase.HMaster: Received MSG_REPORT_PROCESS_OPEN : enwiki_080103,iLStZ0yTnfVUziYcNVVxWV==,1205393076482 from XX.XX.XX.220:60020
      2008-03-25 19:18:06,411 DEBUG org.apache.hadoop.hbase.HMaster: shutdown scanner looking at enwiki_080103,iLStZ0yTnfVUziYcNVVxWV==,1205393076482
      2008-03-25 19:18:06,811 DEBUG org.apache.hadoop.hbase.HMaster: shutdown scanner looking at enwiki_080103,iLStZ0yTnfVUziYcNVVxWV==,1205393076482
      2008-03-25 19:19:46,841 INFO org.apache.hadoop.hbase.HMaster: assigning region enwiki_080103,iLStZ0yTnfVUziYcNVVxWV==,1205393076482 to server XX.XX.XX.221:60020
      2008-03-25 19:19:49,849 DEBUG org.apache.hadoop.hbase.HMaster: Received MSG_REPORT_PROCESS_OPEN : enwiki_080103,iLStZ0yTnfVUziYcNVVxWV==,1205393076482 from XX.XX.XX.221:60020
      2008-03-25 19:19:56,883 DEBUG org.apache.hadoop.hbase.HMaster: Received MSG_REPORT_CLOSE : enwiki_080103,iLStZ0yTnfVUziYcNVVxWV==,1205393076482 from XX.XX.XX.220:60020
      2008-03-25 19:19:56,883 INFO org.apache.hadoop.hbase.HMaster: XX.XX.XX.220:60020 no longer serving regionname: enwiki_080103,iLStZ0yTnfVUziYcNVVxWV==,1205393076482, startKey: <iLStZ0yTnfVUziYcNVVxWV==>, endKey: <jLB27Q4hKls4tSvp64rJfF==
      >, encodedName: 1857033608, tableDesc: {name: enwiki_080103, families: {alternate_title:={name: alternate_title, max versions: 3, compression: NONE, in memory: false, max length: 2147483647, bloom filter: none}, alternate_url:={name: al
      ternate_url, max versions: 3, compression: NONE, in memory: false, max length: 2147483647, bloom filter: none}, anchor:={name: anchor, max versions: 3, compression: NONE, in memory: false, max length: 2147483647, bloom filter: none}, mi
      sc:={name: misc, max versions: 3, compression: NONE, in memory: false, max length: 2147483647, bloom filter: none}, page:={name: page, max versions: 3, compression: NONE, in memory: false, max length: 2147483647, bloom filter: none}, re
      direct:={name: redirect, max versions: 3, compression: NONE, in memory: false, max length: 2147483647, bloom filter: none}}}
      2008-03-25 19:19:56,885 DEBUG org.apache.hadoop.hbase.HMaster: Main processing loop: ProcessRegionClose of enwiki_080103,iLStZ0yTnfVUziYcNVVxWV==,1205393076482, true, false
      2008-03-25 19:19:56,885 INFO org.apache.hadoop.hbase.HMaster: region closed: enwiki_080103,iLStZ0yTnfVUziYcNVVxWV==,1205393076482
      2008-03-25 19:19:56,887 INFO org.apache.hadoop.hbase.HMaster: reassign region: enwiki_080103,iLStZ0yTnfVUziYcNVVxWV==,1205393076482
      2008-03-25 19:19:57,288 INFO org.apache.hadoop.hbase.HMaster: assigning region enwiki_080103,iLStZ0yTnfVUziYcNVVxWV==,1205393076482 to server XX.XX.XX.189:60020
      2008-03-25 19:20:00,296 DEBUG org.apache.hadoop.hbase.HMaster: Received MSG_REPORT_PROCESS_OPEN : enwiki_080103,iLStZ0yTnfVUziYcNVVxWV==,1205393076482 from XX.XX.XX.189:60020
      2008-03-25 19:20:16,885 DEBUG org.apache.hadoop.hbase.HMaster: Received MSG_REPORT_OPEN : enwiki_080103,iLStZ0yTnfVUziYcNVVxWV==,1205393076482 from XX.XX.XX.221:60020
      2008-03-25 19:20:16,885 DEBUG org.apache.hadoop.hbase.HMaster: region server XX.XX.XX.221:60020 should not have opened region enwiki_080103,iLStZ0yTnfVUziYcNVVxWV==,1205393076482
      2008-03-25 19:23:51,707 DEBUG org.apache.hadoop.hbase.HMaster: shutdown scanner looking at enwiki_080103,iLStZ0yTnfVUziYcNVVxWV==,1205393076482
      2008-03-25 19:23:51,834 DEBUG org.apache.hadoop.hbase.HMaster: shutdown scanner looking at enwiki_080103,iLStZ0yTnfVUziYcNVVxWV==,1205393076482
      2008-03-25 19:23:53,947 INFO org.apache.hadoop.hbase.HMaster: assigning region enwiki_080103,iLStZ0yTnfVUziYcNVVxWV==,1205393076482 to server XX.XX.XX.97:60020
      ...
      

      Attachments

        Issue Links

          Activity

            People

              Unassigned Unassigned
              stack Michael Stack
              Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: