Kudu / KUDU-2952

TServers reporting replica stats may race with leadership change, hitting a DCHECK

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.11.0
    • Component/s: consensus, tserver
    • Labels: None

    Description

      I have a precommit that failed with:

      F0924 00:08:46.821594  9670 catalog_manager.cc:4239] Check failed: ts_desc->permanent_uuid() == report.consensus_state().leader_uuid() 
      *** Check failure stack trace: ***
          @     0x7f5e442ea62d  google::LogMessage::Fail() at ??:0
          @     0x7f5e442ec64c  google::LogMessage::SendToLog() at ??:0
          @     0x7f5e442ea189  google::LogMessage::Flush() at ??:0
          @     0x7f5e442ecfdf  google::LogMessageFatal::~LogMessageFatal() at ??:0
          @     0x7f5e45d89a01  kudu::master::CatalogManager::ProcessTabletReport() at ??:0
          @     0x7f5e45e29ae7  kudu::master::MasterServiceImpl::TSHeartbeat() at ??:0
          @     0x7f5e41f29cbc  _ZZN4kudu6master15MasterServiceIfC1ERK13scoped_refptrINS_12MetricEntityEERKS2_INS_3rpc13ResultTrackerEEENKUlPKN6google8protobuf7MessageEPSE_PNS7_10RpcContextEE0_clESG_SH_SJ_ at ??:0
          @     0x7f5e41f3068b  _ZNSt17_Function_handlerIFvPKN6google8protobuf7MessageEPS2_PN4kudu3rpc10RpcContextEEZNS6_6master15MasterServiceIfC1ERK13scoped_refptrINS6_12MetricEntityEERKSD_INS7_13ResultTrackerEEEUlS4_S5_S9_E0_E9_M_invokeERKSt9_Any_dataS4_S5_S9_ at ??:0
          @     0x7f5e3fea909e  std::function<>::operator()() at ??:0
          @     0x7f5e3fea88cf  kudu::rpc::GeneratedServiceIf::Handle() at ??:0
          @     0x7f5e3feab3b6  kudu::rpc::ServicePool::RunThread() at ??:0
          @     0x7f5e3feac785  boost::_mfi::mf0<>::operator()() at ??:0
          @     0x7f5e3feac5ac  boost::_bi::list1<>::operator()<>() at ??:0
          @     0x7f5e3feac493  boost::_bi::bind_t<>::operator()() at ??:0
          @     0x7f5e3feac3c2  boost::detail::function::void_function_obj_invoker0<>::invoke() at ??:0
          @     0x7f5e44db28d2  boost::function0<>::operator()() at ??:0
          @     0x7f5e44daf65b  kudu::Thread::SuperviseThread() at ??:0
          @     0x7f5e41429184  start_thread at ??:0
          @     0x7f5e438f4ffd  clone at ??:0 
      
      

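      Looking through the code, this appears to be a TOCTOU (time-of-check to time-of-use) race in tablet report generation: if leadership changes while a tablet server is building a report that carries replica stats, the master can receive a report whose leader_uuid no longer matches the reporting server's permanent_uuid, which trips the check above.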
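
To make the suspected race concrete, here is a minimal, self-contained sketch. It is not Kudu's code: `Replica`, `Report`, `BuildReportRacy`, `BuildReportFromSnapshot`, and `ReportIsConsistent` are hypothetical names, and the sketch assumes that stats are attached only when the reporter believes it is the leader. The `between_reads` callback stands in for a concurrent leadership change so the interleaving is deterministic.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical, simplified stand-ins; these are NOT Kudu's real types.
struct ConsensusState {
  std::string leader_uuid;
};

struct Report {
  ConsensusState cstate;
  bool has_stats = false;  // Assumption: only a leader attaches stats.
};

class Replica {
 public:
  Replica(std::string self_uuid, std::string leader_uuid)
      : self_uuid_(std::move(self_uuid)) {
    cstate_.leader_uuid = std::move(leader_uuid);
  }

  // Returns a copy of the current consensus state under the lock.
  ConsensusState cstate() const {
    std::lock_guard<std::mutex> l(lock_);
    return cstate_;
  }

  // Simulates a leadership change.
  void SetLeader(const std::string& uuid) {
    std::lock_guard<std::mutex> l(lock_);
    cstate_.leader_uuid = uuid;
  }

  const std::string& self_uuid() const { return self_uuid_; }

 private:
  mutable std::mutex lock_;
  const std::string self_uuid_;
  ConsensusState cstate_;
};

// Racy: leadership is checked with one read of the consensus state, but the
// state placed in the report comes from a second, later read.
Report BuildReportRacy(const Replica& r,
                       const std::function<void()>& between_reads) {
  Report report;
  report.has_stats = (r.cstate().leader_uuid == r.self_uuid());  // time of check
  between_reads();                                               // leadership may change here
  report.cstate = r.cstate();                                    // time of use
  return report;
}

// Safer: read the consensus state once and derive both fields from that
// single snapshot, so they cannot disagree with each other.
Report BuildReportFromSnapshot(const Replica& r,
                               const std::function<void()>& between_reads) {
  Report report;
  report.cstate = r.cstate();
  between_reads();  // A change here no longer affects this report.
  report.has_stats = (report.cstate.leader_uuid == r.self_uuid());
  assert(!report.has_stats || report.cstate.leader_uuid == r.self_uuid());
  return report;
}

// Loosely mirrors the failed master-side check: a report carrying stats must
// name the reporting server as leader.
bool ReportIsConsistent(const Report& report, const std::string& reporter_uuid) {
  return !report.has_stats || report.cstate.leader_uuid == reporter_uuid;
}
```

Calling `BuildReportRacy` with a callback that invokes `SetLeader` on the same replica yields a report that has stats but names a different leader, which `ReportIsConsistent` rejects. `BuildReportFromSnapshot` under the same interleaving stays consistent, at the cost of the report possibly being slightly stale.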

      Attachments

        1. master_hms-itest.txt (712 kB, Andrew Wong)

          People

            Assignee: Andrew Wong (awong)
            Reporter: Andrew Wong (awong)
            Votes: 0
            Watchers: 3