Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
1.1.0-SNAPSHOT
None
2023-1-Catalyst, 2023-2-Catalyst
Description
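m_0908_7915b3f.
Problem description
The search index of the log dispatcher is inconsistent with that of the WAL node. After the DataNode restarts successfully, the log keeps printing: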
2022-09-09 16:32:00,011 [pool-33-IoTDB-LogDispatcher-DataRegion[12]-2] INFO o.a.i.d.w.n.WALNode$PlanNodeIterator:695 - timeout when waiting for next WAL entry ready, execute rollWALFile. Current search index in wal buffer is 2959, and next target index is 2501
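MultiLeaderConsensus, 3 replicas, 3 nodes
1. While the metadata is being created, kill the DataNode PID on ip74.
The benchmark configuration file is attached.
2. Clear the operating system cache on ip74 and start the DataNode on ip74.
3. Run the benchmark again with the same configuration and IS_DELETE_DATA=true.
When this parameter is true, the benchmark first executes delete storage group root.test.*;
After the benchmark completes, stop the DataNode service on ip74.
Back up the data to /data/mpp_test/m_0908_7915b3f/datanode/data_for_recovery_Test
4. Clear the operating system cache on ip74 and start the DataNode service.
Run the benchmark again with the same configuration. After the benchmark completes,
check the log on ip74, which shows: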
2022-09-09 15:43:13,691 [pool-23-IoTDB-MPPDataExchangeRPC-Processor-40] ERROR o.a.t.ProcessFunction:47 - Internal error processing getDataBlock
org.apache.thrift.TException: Source fragment instance not found. Fragment instance ID: TFragmentInstanceId(queryId:20220909_074205_19400_3, fragmentId:2, instanceId:0).
at org.apache.iotdb.db.mpp.execution.exchange.MPPDataExchangeManager$MPPDataExchangeServiceImpl.getDataBlock(MPPDataExchangeManager.java:90)
at org.apache.iotdb.mpp.rpc.thrift.MPPDataExchangeService$Processor$getDataBlock.getResult(MPPDataExchangeService.java:326)
at org.apache.iotdb.mpp.rpc.thrift.MPPDataExchangeService$Processor$getDataBlock.getResult(MPPDataExchangeService.java:306)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:38)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:38)
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:248)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
2022-09-09 15:43:15,312 [20220909_074205_19400_3.2.0.SinkHandle-3074] ERROR o.a.i.d.m.e.e.SinkHandle:281 - The TsBlock doesn't exist. Sequence ID is 1, remaining map is [0=<org.apache.iotdb.tsfile.read.common.block.TsBlock@5f617979,1048576>]
2022-09-09 15:43:17,119 [pool-23-IoTDB-MPPDataExchangeRPC-Processor-22] ERROR o.a.t.ProcessFunction:47 - Internal error processing getDataBlock
java.lang.IllegalStateException: The data block doesn't exist. Sequence ID: 1
at org.apache.iotdb.db.mpp.execution.exchange.SinkHandle.getSerializedTsBlock(SinkHandle.java:285)
at org.apache.iotdb.db.mpp.execution.exchange.MPPDataExchangeManager$MPPDataExchangeServiceImpl.getDataBlock(MPPDataExchangeManager.java:97)
at org.apache.iotdb.mpp.rpc.thrift.MPPDataExchangeService$Processor$getDataBlock.getResult(MPPDataExchangeService.java:326)
at org.apache.iotdb.mpp.rpc.thrift.MPPDataExchangeService$Processor$getDataBlock.getResult(MPPDataExchangeService.java:306)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:38)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:38)
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:248)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
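5. Stop the DataNode service on ip74.
Back up the data to /data/mpp_test/m_0908_7915b3f/datanode/data_for_recovery_Test_2
Clear the operating system cache on ip74 and start the DataNode on ip74. It starts successfully, but the log keeps printing the following (this node can no longer continue synchronizing):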
2022-09-09 16:44:00,039 [pool-33-IoTDB-LogDispatcher-DataRegion[12]-2] INFO o.a.i.d.w.n.WALNode$PlanNodeIterator:695 - timeout when waiting for next WAL entry ready, execute rollWALFile. Current search index in wal buffer is 2959, and next target index is 2501
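The symptom can be checked mechanically when scanning a DataNode log. The following is a minimal sketch, not part of IoTDB, that extracts the two indexes from the message quoted above and reports how far the dispatcher's target index lags behind the WAL buffer. The regular expression and the function name `parse_search_indexes` are assumptions made for illustration; they match only the message text as it appears in this report.

```python
import re

# Pattern written against the message text quoted in this report.
# This is an assumption, not an official IoTDB log format definition.
PATTERN = re.compile(
    r"Current search index in wal buffer is (\d+), "
    r"and next target index is (\d+)"
)


def parse_search_indexes(line):
    """Return (buffer_index, target_index), or None if the line does not match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


# Message body copied from the log line in this report.
line = (
    "timeout when waiting for next WAL entry ready, execute rollWALFile. "
    "Current search index in wal buffer is 2959, and next target index is 2501"
)

buffer_index, target_index = parse_search_indexes(line)
gap = buffer_index - target_index

# A positive gap means the target index is behind the index held in the WAL buffer.
print(f"buffer={buffer_index} target={target_index} gap={gap}")
```

With the values from this report, the target index (2501) is 458 entries behind the search index in the WAL buffer (2959).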
Machines and cluster configuration
1. 192.168.10.72/73/74: 48 cores, 384 GB memory
The benchmark runs on 71.
2. Cluster parameters
confignode
MAX_HEAP_SIZE="8G"
schema_region_consensus_protocol_class=org.apache.iotdb.consensus.ratis.RatisConsensus
data_region_consensus_protocol_class=org.apache.iotdb.consensus.multileader.MultiLeaderConsensus
schema_replication_factor=3
data_replication_factor=3
datanode
MAX_HEAP_SIZE="256G"
MAX_DIRECT_MEMORY_SIZE="32G"
max_connection_for_internal_service=300
enable_timed_flush_seq_memtable=true
seq_memtable_flush_interval_in_ms=600000
seq_memtable_flush_check_interval_in_ms=300000
enable_timed_flush_unseq_memtable=true
unseq_memtable_flush_interval_in_ms=600000
unseq_memtable_flush_check_interval_in_ms=300000
max_waiting_time_when_insert_blocked=3600000
query_timeout_threshold=3600000
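For readability, the time-valued DataNode settings above can be converted to minutes. The snippet below is a small illustrative sketch; the helper name `parse_properties` is hypothetical, and it assumes that `max_waiting_time_when_insert_blocked` and `query_timeout_threshold` are expressed in milliseconds like the `_in_ms` settings.

```python
# Time-valued DataNode settings copied from the list above.
SETTINGS = """\
seq_memtable_flush_interval_in_ms=600000
seq_memtable_flush_check_interval_in_ms=300000
unseq_memtable_flush_interval_in_ms=600000
unseq_memtable_flush_check_interval_in_ms=300000
max_waiting_time_when_insert_blocked=3600000
query_timeout_threshold=3600000
"""


def parse_properties(text):
    """Parse key=value lines into a dict, skipping blank lines and # comments."""
    result = {}
    for raw in text.splitlines():
        entry = raw.strip()
        if not entry or entry.startswith("#"):
            continue
        key, _, value = entry.partition("=")
        result[key.strip()] = value.strip().strip('"')
    return result


# Assumption: every value here is a duration in milliseconds.
minutes = {key: int(value) / 60_000 for key, value in parse_properties(SETTINGS).items()}

for key, value in minutes.items():
    print(f"{key}: {value:g} min")
```

Under that assumption, the memtable flush interval is 10 minutes, the flush check interval is 5 minutes, and both the insert-blocked wait and the query timeout are 60 minutes.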