  1. Apache Ozone
  2. HDDS-10799

SCMBlockDeletingService stuck in PAUSING state

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 1.4.0
    • Fix Version/s: None
    • Component/s: SCM
    • Labels: None

    Description

      SCM runs a number of internal services that implement the org.apache.hadoop.hdds.scm.ha.SCMService interface. The interface has a method for notifying the services about changes in Raft or safe mode status. While testing the block deletion service, the following unexpected behavior was observed:

      • transactions are flushed to the DB (i.e. snapshots were taken)
      • containers are closed
      • BUT transactions aren't sent to the DNs, leaving millions of unprocessed block deletion transactions

      Investigation showed that the SCM safe mode exit event was triggered multiple times, which eventually moved the SCMBlockDeletingService to the PAUSING state:

      org.apache.hadoop.hdds.scm.block.SCMBlockDeletingService#notifyStatusChanged
        public void notifyStatusChanged() {
          serviceLock.lock();
          try {
            if (scmContext.isLeaderReady() && !scmContext.isInSafeMode() &&
                serviceStatus != ServiceStatus.RUNNING) {
              safemodeExitMillis = clock.millis();
              serviceStatus = ServiceStatus.RUNNING;
            } else {
              serviceStatus = ServiceStatus.PAUSING;
            }
          } finally {
            serviceLock.unlock();
          }
        }
      
      • 1st trigger: SCM is LEADER, SCM is NOT in safe mode, the service is NOT in RUNNING state -> the service transitions to RUNNING state
      • 2nd trigger: SCM is LEADER, SCM is NOT in safe mode, the service IS in RUNNING state (as a result of the 1st trigger) -> the serviceStatus != ServiceStatus.RUNNING check fails, the else branch runs, and the service transitions to PAUSING state
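
      The two triggers above can be reproduced outside of SCM with a small stand-alone sketch. Everything here is an assumption made for illustration: NotifyStatusSketch, FakeScmContext, Service, notifyAsReported and notifyIdempotent are hypothetical names, and System.currentTimeMillis() stands in for the clock field of the real class. notifyAsReported mirrors the branching quoted above; notifyIdempotent shows one possible way to make a repeated notification a no-op, and is not necessarily the change made in the ticket this issue duplicates.

```java
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

public class NotifyStatusSketch {

  enum ServiceStatus { RUNNING, PAUSING }

  /** Hypothetical stand-in for SCMContext: only the two checks quoted in the ticket. */
  static final class FakeScmContext {
    private final boolean leaderReady;
    private final boolean inSafeMode;

    FakeScmContext(boolean leaderReady, boolean inSafeMode) {
      this.leaderReady = leaderReady;
      this.inSafeMode = inSafeMode;
    }

    boolean isLeaderReady() { return leaderReady; }
    boolean isInSafeMode() { return inSafeMode; }
  }

  /** Hypothetical stand-in for the service; starts paused. */
  static final class Service {
    private final Lock serviceLock = new ReentrantLock();
    private final FakeScmContext scmContext;
    private ServiceStatus serviceStatus = ServiceStatus.PAUSING;
    private long safemodeExitMillis;

    Service(FakeScmContext scmContext) { this.scmContext = scmContext; }

    ServiceStatus status() { return serviceStatus; }

    /** Same branching as the snippet in the ticket: the status check is part of the if condition. */
    void notifyAsReported() {
      serviceLock.lock();
      try {
        if (scmContext.isLeaderReady() && !scmContext.isInSafeMode()
            && serviceStatus != ServiceStatus.RUNNING) {
          safemodeExitMillis = System.currentTimeMillis();
          serviceStatus = ServiceStatus.RUNNING;
        } else {
          // Also reached when the service is already RUNNING.
          serviceStatus = ServiceStatus.PAUSING;
        }
      } finally {
        serviceLock.unlock();
      }
    }

    /** Assumed alternative: pause only when SCM is not a ready leader or is in safe mode. */
    void notifyIdempotent() {
      serviceLock.lock();
      try {
        if (scmContext.isLeaderReady() && !scmContext.isInSafeMode()) {
          if (serviceStatus != ServiceStatus.RUNNING) {
            safemodeExitMillis = System.currentTimeMillis();
            serviceStatus = ServiceStatus.RUNNING;
          }
          // Already RUNNING: a repeated notification changes nothing.
        } else {
          serviceStatus = ServiceStatus.PAUSING;
        }
      } finally {
        serviceLock.unlock();
      }
    }
  }

  public static void main(String[] args) {
    // SCM is a ready leader and out of safe mode for every trigger.
    FakeScmContext ctx = new FakeScmContext(true, false);

    Service reported = new Service(ctx);
    reported.notifyAsReported();
    reported.notifyAsReported();
    System.out.println("as reported, after 2 triggers: " + reported.status());

    Service idempotent = new Service(ctx);
    idempotent.notifyIdempotent();
    idempotent.notifyIdempotent();
    System.out.println("idempotent, after 2 triggers: " + idempotent.status());
  }
}
```

      With the branching as reported, the status alternates on every notification, so any even number of identical triggers leaves the service in PAUSING even though SCM is a ready leader outside safe mode. Nesting the status check inside the leader/safe-mode check keeps the else branch reserved for the cases where pausing is actually intended.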

            People

              Assignee: vtutrinov Vyacheslav Tutrinov
              Reporter: vtutrinov Vyacheslav Tutrinov