Details
Type: Bug
Status: Resolved
Priority: Critical
Resolution: Fixed
Version: 2.6.0
Environment: hbase2.6.0, hadoop3.3.6
Flags: Reviewed
Description
The replication queue is backed up, as shown below:
In the figure, the first WAL file no longer exists but has not been skipped, which causes replication to block.
The second and third WAL files were moved to oldWALs (see the attachment), and reading these two files fails.
The error log on the RegionServer is:
2024-07-22T17:47:49,130 WARN [RS_CLAIM_REPLICATION_QUEUE-regionserver/sh2-int-hbase-main-ha-9:16020-0.replicationSource,test_hbase_258-tx1-int-hbase-main-prod-3,16020,1720602522464.replicationSource.wal-reader.tx1-int-hbase-main-prod-3%2C16020%2C1720602522464,test_hbase_258-tx1-int-hbase-main-prod-3,16020,1720602522464] wal.ProtobufWALStreamReader: Error while reading WALKey, originalPosition=0, currentPosition=81
org.apache.hbase.thirdparty.com.google.protobuf.InvalidProtocolBufferException$InvalidWireTypeException: Protocol message tag had invalid wire type.
at org.apache.hbase.thirdparty.com.google.protobuf.InvalidProtocolBufferException.invalidWireType(InvalidProtocolBufferException.java:119) ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
at org.apache.hbase.thirdparty.com.google.protobuf.UnknownFieldSet$Builder.mergeFieldFrom(UnknownFieldSet.java:503) ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
at org.apache.hbase.thirdparty.com.google.protobuf.GeneratedMessage$Builder.parseUnknownField(GeneratedMessage.java:770) ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
at org.apache.hadoop.hbase.shaded.protobuf.generated.WALProtos$WALKey$Builder.mergeFrom(WALProtos.java:2829) ~[hbase-protocol-shaded-2.6.0.jar:2.6.0]
at org.apache.hadoop.hbase.shaded.protobuf.generated.WALProtos$WALKey$1.parsePartialFrom(WALProtos.java:4212) ~[hbase-protocol-shaded-2.6.0.jar:2.6.0]
at org.apache.hadoop.hbase.shaded.protobuf.generated.WALProtos$WALKey$1.parsePartialFrom(WALProtos.java:4204) ~[hbase-protocol-shaded-2.6.0.jar:2.6.0]
at org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parsePartialFrom(AbstractParser.java:192) ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
at org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:209) ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
at org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:214) ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
at org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:25) ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
at org.apache.hbase.thirdparty.com.google.protobuf.GeneratedMessage.parseWithIOException(GeneratedMessage.java:321) ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
at org.apache.hadoop.hbase.shaded.protobuf.generated.WALProtos$WALKey.parseFrom(WALProtos.java:2321) ~[hbase-protocol-shaded-2.6.0.jar:2.6.0]
at org.apache.hadoop.hbase.regionserver.wal.ProtobufWALTailingReader.readWALKey(ProtobufWALTailingReader.java:128) ~[hbase-server-2.6.0.jar:2.6.0]
at org.apache.hadoop.hbase.regionserver.wal.ProtobufWALTailingReader.next(ProtobufWALTailingReader.java:257) ~[hbase-server-2.6.0.jar:2.6.0]
at org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.readNextEntryAndRecordReaderPosition(WALEntryStream.java:490) ~[hbase-server-2.6.0.jar:2.6.0]
at org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.lastAttempt(WALEntryStream.java:306) ~[hbase-server-2.6.0.jar:2.6.0]
at org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.tryAdvanceEntry(WALEntryStream.java:388) ~[hbase-server-2.6.0.jar:2.6.0]
at org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.hasNext(WALEntryStream.java:130) ~[hbase-server-2.6.0.jar:2.6.0]
at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceWALReader.run(ReplicationSourceWALReader.java:153) ~[hbase-server-2.6.0.jar:2.6.0]
2024-07-22T17:48:13,315 WARN [RS-EventLoopGroup-1-65] ipc.NettyRpcConnection: Exception encountered while connecting to the server tx1-int-hbase-main-prod-3:16020
org.apache.hbase.thirdparty.io.netty.channel.ConnectTimeoutException: connection timed out after 10000 ms: tx1-int-hbase-main-prod-3/127.0.0.1:16020
at org.apache.hbase.thirdparty.io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe$2.run(AbstractEpollChannel.java:615) ~[hbase-shaded-netty-4.1.7.jar:?]
at org.apache.hbase.thirdparty.io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98) ~[hbase-shaded-netty-4.1.7.jar:?]
at org.apache.hbase.thirdparty.io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:153) ~[hbase-shaded-netty-4.1.7.jar:?]
at org.apache.hbase.thirdparty.io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:173) ~[hbase-shaded-netty-4.1.7.jar:?]
at org.apache.hbase.thirdparty.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:166) ~[hbase-shaded-netty-4.1.7.jar:?]
at org.apache.hbase.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:470) ~[hbase-shaded-netty-4.1.7.jar:?]
at org.apache.hbase.thirdparty.io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:416) ~[hbase-shaded-netty-4.1.7.jar:?]
at org.apache.hbase.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997) ~[hbase-shaded-netty-4.1.7.jar:?]
at org.apache.hbase.thirdparty.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[hbase-shaded-netty-4.1.7.jar:?]
at org.apache.hbase.thirdparty.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[hbase-shaded-netty-4.1.7.jar:?]
at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_202]
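For readers unfamiliar with the InvalidWireTypeException in the first trace above: in the protobuf wire format, every field starts with a tag whose low three bits are the wire type and whose remaining bits are the field number. Only wire types 0 through 5 are defined, so a tag ending in 6 or 7 usually means the reader is positioned on bytes that are not a field boundary. The following is a minimal sketch of that check, not the shaded protobuf implementation; the class and method names are hypothetical.

```java
public class WireTypeSketch {
    /** Returns the wire type stored in the low three bits of a protobuf tag. */
    static int wireType(int tag) {
        return tag & 0x7;
    }

    /** Returns the field number stored in the remaining high bits of the tag. */
    static int fieldNumber(int tag) {
        return tag >>> 3;
    }

    /** Wire types 0 through 5 are defined by the protobuf encoding; 6 and 7 are not. */
    static boolean isValidWireType(int wireType) {
        return wireType >= 0 && wireType <= 5;
    }

    public static void main(String[] args) {
        int goodTag = (1 << 3) | 2; // field 1, length-delimited
        int badTag = (1 << 3) | 7;  // field 1, undefined wire type
        System.out.println("good tag valid: " + isValidWireType(wireType(goodTag)));
        System.out.println("bad tag valid: " + isValidWireType(wireType(badTag)));
    }
}
```

If the reader starts parsing at a wrong offset or on corrupted bytes, an arbitrary byte is interpreted as a tag, which is one way this exception can arise. Whether that is what happened here is not established by the log alone.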
Running hbase wal -p on one of the files also fails:
hbase wal -p hdfs://coreHBaseProdHa/hbase/oldWALs/tx1-int-hbase-main-prod-4%2C16020%2C1720602602602.1720609818921
The error is:
2024-07-23 12:36:27,064 INFO [main] hdfs.LocatedBlocksRefresher (LocatedBlocksRefresher.java:<init>(98)) - Start located block refresher for DFSClient default.
Writer Classes: ProtobufLogWriter AsyncProtobufLogWriter SecureProtobufLogWriter SecureAsyncProtobufLogWriter
Cell Codec Class: org.apache.hadoop.hbase.regionserver.wal.WALCellCodec
Exception in thread "main" java.io.EOFException: EOF while reading message size
at org.apache.hadoop.hbase.shaded.protobuf.ProtobufUtil.parseDelimitedFrom(ProtobufUtil.java:3727)
at org.apache.hadoop.hbase.regionserver.wal.ProtobufWALStreamReader.next(ProtobufWALStreamReader.java:56)
at org.apache.hadoop.hbase.wal.WALStreamReader.next(WALStreamReader.java:42)
at org.apache.hadoop.hbase.wal.WALPrettyPrinter.processFile(WALPrettyPrinter.java:297)
at org.apache.hadoop.hbase.wal.WALPrettyPrinter.run(WALPrettyPrinter.java:516)
at org.apache.hadoop.hbase.wal.WALPrettyPrinter.main(WALPrettyPrinter.java:429)
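To illustrate what "EOF while reading message size" means: a length-delimited protobuf message is preceded by a varint giving its size, and the stream ending before that varint is complete produces an EOF at this point. The sketch below is an assumption-laden illustration of that general technique in plain Java, not the code in ProtobufUtil.parseDelimitedFrom; the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

public class DelimitedSizeSketch {
    /**
     * Reads the varint size prefix of a length-delimited message.
     * Each byte carries 7 payload bits; the high bit marks continuation.
     */
    static int readMessageSize(InputStream in) throws IOException {
        int result = 0;
        for (int shift = 0; shift < 32; shift += 7) {
            int b = in.read();
            if (b == -1) {
                // The stream ended before the size prefix was complete.
                throw new EOFException("EOF while reading message size");
            }
            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
        }
        throw new IOException("Malformed varint size prefix");
    }

    public static void main(String[] args) throws IOException {
        // 0xAC 0x02 is the varint encoding of 300.
        InputStream ok = new ByteArrayInputStream(new byte[] {(byte) 0xAC, 0x02});
        System.out.println("size = " + readMessageSize(ok));

        // An empty stream stands in for a file that ends where a message should start.
        try {
            readMessageSize(new ByteArrayInputStream(new byte[0]));
        } catch (EOFException e) {
            System.out.println("truncated: " + e.getMessage());
        }
    }
}
```

An EOF at this position is consistent with a file that is empty or was cut off at a message boundary, although the trace by itself does not say which.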