KUDU-1860

ksck doesn't identify tablets that are evicted but still in config

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.2.0
    • Fix Version/s: 1.4.0
    • Component/s: ksck, ops-tooling
    • Labels: None

    Description

      As reported by a user on Slack, ksck can produce misleading output such as:

        ca199fafca544df2a1b2a01be9d5266d (server1:7250): RUNNING [LEADER]
        a077957f627c4758ab5a989aca8a1ca8 (server2:7250): RUNNING
        5c09a555c205482b8131f15b2c249ec6 (server3:7250): bad state
          State:       NOT_STARTED
          Data state:  TABLET_DATA_TOMBSTONED
          Last status: Tablet initializing...
      

      The problem is that, according to the logs, server2 had already been evicted from the configuration, but the new configuration (containing server1 and server3) was never committed, since only 1 of the 3 servers is really left.

      Ideally, ksck should ask each server what it thinks the configuration is and check whether that differs from what the master has. As it is, the output looks like we're missing 1 replica, but in reality this is a broken tablet.
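
      The proposed check could be sketched roughly as follows. This is a minimal Python illustration, not ksck's actual code: ReplicaView, config_conflicts, and the sample data are hypothetical names and values, and it assumes each tablet server can report a committed config and an optional pending (uncommitted) one.

```python
from typing import FrozenSet, Iterable, List, NamedTuple, Optional, Tuple


class ReplicaView(NamedTuple):
    """One tablet server's view of a tablet's config (hypothetical shape)."""
    uuid: str
    committed: Optional[FrozenSet[str]]  # None: server could not report a config
    pending: Optional[FrozenSet[str]]    # None: no uncommitted config change


def config_conflicts(master_config: Iterable[str],
                     views: Iterable[ReplicaView]) -> List[Tuple[str, str]]:
    """Return (uuid, reason) pairs where a server disagrees with the master."""
    master = frozenset(master_config)
    conflicts = []
    for view in views:
        if view.committed is None:
            conflicts.append((view.uuid, "no config reported"))
            continue
        if view.committed != master:
            conflicts.append((view.uuid, "committed config differs from master"))
        if view.pending is not None and view.pending != master:
            conflicts.append((view.uuid, "uncommitted config change pending"))
    return conflicts


# Illustrative data only, loosely modeled on the report above (an assumption,
# not taken from real logs): the master lists all three servers, the leader
# holds an uncommitted config without server2, and server3 reports nothing.
MASTER = {"server1", "server2", "server3"}
VIEWS = [
    ReplicaView("server1", frozenset(MASTER), frozenset({"server1", "server3"})),
    ReplicaView("server2", frozenset(MASTER), None),
    ReplicaView("server3", None, None),
]

for uuid, reason in config_conflicts(MASTER, VIEWS):
    print(f"{uuid}: {reason}")
```

      Comparing both the committed and the pending config is a design choice of this sketch: a stuck, uncommitted change is what distinguishes a broken tablet from one that is merely missing a replica.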

          People

            Assignee: wdberkeley William Berkeley
            Reporter: jdcryans Jean-Daniel Cryans