  1. Apache IoTDB
  2. IOTDB-4380

The log dispatcher is inconsistent with the search index of the wal node

Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Fixed
    • 1.1.0-SNAPSHOT
    • 1.0.0
    • mpp-cluster
    • None
    • 2023-1-Catalyst, 2023-2-Catalyst

    Description

      m_0908_7915b3f.
      Problem description
      The search index of the log dispatcher is inconsistent with that of the WAL node. After the datanode restarts successfully, the following log line is printed continuously:
      2022-09-09 16:32:00,011 [pool-33-IoTDB-LogDispatcher-DataRegion[12]-2] INFO o.a.i.d.w.n.WALNode$PlanNodeIterator:695 - timeout when waiting for next WAL entry ready, execute rollWALFile. Current search index in wal buffer is 2959, and next target index is 2501

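      MultiLeaderConsensus, 3 replicas, 3 nodes
      1. While the metadata is being created, kill the datanode PID on ip74.
      The benchmark configuration file is attached.
      2. Clear the operating system cache on ip74, then start the datanode on ip74.
      3. Run the benchmark again with the same configuration, with IS_DELETE_DATA=true.
      When this parameter is true, the benchmark first executes delete storage group root.test.*;
      After the benchmark finishes, stop the datanode service on ip74.
      Back up data to /data/mpp_test/m_0908_7915b3f/datanode/data_for_recovery_Test

      4. Clear the operating system cache on ip74, then start the datanode service.
      Run the benchmark again with the same configuration. After the benchmark finishes,
      check the logs on ip74, which show: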
      2022-09-09 15:43:13,691 [pool-23-IoTDB-MPPDataExchangeRPC-Processor-40] ERROR o.a.t.ProcessFunction:47 - Internal error processing getDataBlock
      org.apache.thrift.TException: Source fragment instance not found. Fragment instance ID: TFragmentInstanceId(queryId:20220909_074205_19400_3, fragmentId:2, instanceId:0).
      at org.apache.iotdb.db.mpp.execution.exchange.MPPDataExchangeManager$MPPDataExchangeServiceImpl.getDataBlock(MPPDataExchangeManager.java:90)
      at org.apache.iotdb.mpp.rpc.thrift.MPPDataExchangeService$Processor$getDataBlock.getResult(MPPDataExchangeService.java:326)
      at org.apache.iotdb.mpp.rpc.thrift.MPPDataExchangeService$Processor$getDataBlock.getResult(MPPDataExchangeService.java:306)
      at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:38)
      at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:38)
      at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:248)
      at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
      at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
      at java.base/java.lang.Thread.run(Thread.java:834)
      2022-09-09 15:43:15,312 [20220909_074205_19400_3.2.0.SinkHandle-3074] ERROR o.a.i.d.m.e.e.SinkHandle:281 - The TsBlock doesn't exist. Sequence ID is 1, remaining map is [0=<org.apache.iotdb.tsfile.read.common.block.TsBlock@5f617979,1048576>]
      2022-09-09 15:43:17,119 [pool-23-IoTDB-MPPDataExchangeRPC-Processor-22] ERROR o.a.t.ProcessFunction:47 - Internal error processing getDataBlock
      java.lang.IllegalStateException: The data block doesn't exist. Sequence ID: 1
      at org.apache.iotdb.db.mpp.execution.exchange.SinkHandle.getSerializedTsBlock(SinkHandle.java:285)
      at org.apache.iotdb.db.mpp.execution.exchange.MPPDataExchangeManager$MPPDataExchangeServiceImpl.getDataBlock(MPPDataExchangeManager.java:97)
      at org.apache.iotdb.mpp.rpc.thrift.MPPDataExchangeService$Processor$getDataBlock.getResult(MPPDataExchangeService.java:326)
      at org.apache.iotdb.mpp.rpc.thrift.MPPDataExchangeService$Processor$getDataBlock.getResult(MPPDataExchangeService.java:306)
      at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:38)
      at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:38)
      at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:248)
      at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
      at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
      at java.base/java.lang.Thread.run(Thread.java:834)

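      5. Stop the datanode service on ip74.
      Back up data to /data/mpp_test/m_0908_7915b3f/datanode/data_for_recovery_Test_2
      Clear the operating system cache on ip74, then start the datanode on ip74. It starts successfully, but the log keeps printing the following line (this node can no longer synchronize):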
      2022-09-09 16:44:00,039 [pool-33-IoTDB-LogDispatcher-DataRegion[12]-2] INFO o.a.i.d.w.n.WALNode$PlanNodeIterator:695 - timeout when waiting for next WAL entry ready, execute rollWALFile. Current search index in wal buffer is 2959, and next target index is 2501
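
      The mismatch in the log line above can be checked mechanically. The following is a minimal sketch, not part of IoTDB, that assumes the message wording quoted in this report; the function name `parse_search_index_gap` and the regular expression are illustrative only. It extracts both indices and reports how far the next target index lags behind the search index in the WAL buffer.

```python
import re

# Pattern based only on the message text quoted in this report; other
# IoTDB versions may word the message differently (assumption).
_PATTERN = re.compile(
    r"Current search index in wal buffer is (\d+), "
    r"and next target index is (\d+)"
)


def parse_search_index_gap(line):
    """Return (current, target, gap) for a matching log line, else None.

    gap = current - target; a positive gap means the next target index
    is behind the current search index in the WAL buffer.
    """
    match = _PATTERN.search(line)
    if match is None:
        return None
    current, target = int(match.group(1)), int(match.group(2))
    return current, target, current - target


if __name__ == "__main__":
    sample = (
        "2022-09-09 16:44:00,039 [pool-33-IoTDB-LogDispatcher-DataRegion[12]-2] "
        "INFO o.a.i.d.w.n.WALNode$PlanNodeIterator:695 - timeout when waiting "
        "for next WAL entry ready, execute rollWALFile. Current search index in "
        "wal buffer is 2959, and next target index is 2501"
    )
    print(parse_search_index_gap(sample))
```

      For the line reported here, the target index (2501) is 458 entries behind the search index in the WAL buffer (2959).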

      Machine and cluster configuration
      1. 192.168.10.72/73/74: 48 cores, 384 GB memory
      The benchmark runs on 71.

      2. Cluster parameters
      confignode
      MAX_HEAP_SIZE="8G"
      schema_region_consensus_protocol_class=org.apache.iotdb.consensus.ratis.RatisConsensus
      data_region_consensus_protocol_class=org.apache.iotdb.consensus.multileader.MultiLeaderConsensus
      schema_replication_factor=3
      data_replication_factor=3

      datanode
      MAX_HEAP_SIZE="256G"
      MAX_DIRECT_MEMORY_SIZE="32G"
      max_connection_for_internal_service=300
      enable_timed_flush_seq_memtable=true
      seq_memtable_flush_interval_in_ms=600000
      seq_memtable_flush_check_interval_in_ms=300000
      enable_timed_flush_unseq_memtable=true
      unseq_memtable_flush_interval_in_ms=600000
      unseq_memtable_flush_check_interval_in_ms=300000
      max_waiting_time_when_insert_blocked=3600000
      query_timeout_threshold=3600000
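
      The millisecond values above are easier to compare in minutes. The following is a minimal sketch, assuming the settings are plain `key=value` lines as listed; the helper name `ms_settings_in_minutes` is hypothetical, and treating `max_waiting_time_when_insert_blocked` and `query_timeout_threshold` as millisecond values is an assumption, since their names do not state a unit.

```python
# Values copied from the datanode settings in this report.
DATANODE_SETTINGS = """\
seq_memtable_flush_interval_in_ms=600000
seq_memtable_flush_check_interval_in_ms=300000
unseq_memtable_flush_interval_in_ms=600000
unseq_memtable_flush_check_interval_in_ms=300000
max_waiting_time_when_insert_blocked=3600000
query_timeout_threshold=3600000
"""


def ms_settings_in_minutes(text):
    """Parse key=value lines and convert each integer value from ms to minutes."""
    result = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not key=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        result[key.strip()] = int(value) / 60_000
    return result


if __name__ == "__main__":
    for key, minutes in ms_settings_in_minutes(DATANODE_SETTINGS).items():
        print(f"{key}: {minutes:g} min")
```

      Under these assumptions, both memtable types are flushed on a 10-minute interval that is checked every 5 minutes, and the insert-blocked wait and the query timeout are each 60 minutes.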

      Attachments

        1. start_db.sh
          0.6 kB
          刘珍
        2. screenshot-1.png
          169 kB
          刘珍
        3. more_metadata.conf
          14 kB
          刘珍
        4. ip74_logs.tar.gz
          2.12 MB
          刘珍
        5. iotdb_4380.conf
          14 kB
          刘珍
        6. image-2022-11-25-20-57-30-523.png
          111 kB
          刘珍
        7. image-2022-11-25-20-47-33-174.png
          136 kB
          刘珍
        8. exec_iotdb_4380.sh
          2 kB
          刘珍
        9. 20230216_ip74_dn_logs.tar.gz
          55 kB
          刘珍
        10. 20230216_ip72_cn_logs.tar.gz
          126 kB
          刘珍

          People

            spricoder Hongyin Zhang
            刘珍 刘珍