HBASE-28748

Replication blocking: InvalidProtocolBufferException$InvalidWireTypeException: Protocol message tag had invalid wire type.

Details

    • Reviewed

    Description

      The replication queue is backing up, as shown in the attached figure.

      In the figure, the first WAL file no longer exists but has not been skipped, which blocks replication.

      The second and third WAL files were moved to oldWALs. As the attachments show, reading these two files fails.

      The error log on the RegionServer is:

      2024-07-22T17:47:49,130 WARN [RS_CLAIM_REPLICATION_QUEUE-regionserver/sh2-int-hbase-main-ha-9:16020-0.replicationSource,test_hbase_258-tx1-int-hbase-main-prod-3,16020,1720602522464.replicationSource.wal-reader.tx1-int-hbase-main-prod-3%2C16020%2C1720602522464,test_hbase_258-tx1-int-hbase-main-prod-3,16020,1720602522464] wal.ProtobufWALStreamReader: Error while reading WALKey, originalPosition=0, currentPosition=81
      org.apache.hbase.thirdparty.com.google.protobuf.InvalidProtocolBufferException$InvalidWireTypeException: Protocol message tag had invalid wire type.
      at org.apache.hbase.thirdparty.com.google.protobuf.InvalidProtocolBufferException.invalidWireType(InvalidProtocolBufferException.java:119) ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
      at org.apache.hbase.thirdparty.com.google.protobuf.UnknownFieldSet$Builder.mergeFieldFrom(UnknownFieldSet.java:503) ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
      at org.apache.hbase.thirdparty.com.google.protobuf.GeneratedMessage$Builder.parseUnknownField(GeneratedMessage.java:770) ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
      at org.apache.hadoop.hbase.shaded.protobuf.generated.WALProtos$WALKey$Builder.mergeFrom(WALProtos.java:2829) ~[hbase-protocol-shaded-2.6.0.jar:2.6.0]
      at org.apache.hadoop.hbase.shaded.protobuf.generated.WALProtos$WALKey$1.parsePartialFrom(WALProtos.java:4212) ~[hbase-protocol-shaded-2.6.0.jar:2.6.0]
      at org.apache.hadoop.hbase.shaded.protobuf.generated.WALProtos$WALKey$1.parsePartialFrom(WALProtos.java:4204) ~[hbase-protocol-shaded-2.6.0.jar:2.6.0]
      at org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parsePartialFrom(AbstractParser.java:192) ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
      at org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:209) ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
      at org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:214) ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
      at org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:25) ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
      at org.apache.hbase.thirdparty.com.google.protobuf.GeneratedMessage.parseWithIOException(GeneratedMessage.java:321) ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
      at org.apache.hadoop.hbase.shaded.protobuf.generated.WALProtos$WALKey.parseFrom(WALProtos.java:2321) ~[hbase-protocol-shaded-2.6.0.jar:2.6.0]
      at org.apache.hadoop.hbase.regionserver.wal.ProtobufWALTailingReader.readWALKey(ProtobufWALTailingReader.java:128) ~[hbase-server-2.6.0.jar:2.6.0]
      at org.apache.hadoop.hbase.regionserver.wal.ProtobufWALTailingReader.next(ProtobufWALTailingReader.java:257) ~[hbase-server-2.6.0.jar:2.6.0]
      at org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.readNextEntryAndRecordReaderPosition(WALEntryStream.java:490) ~[hbase-server-2.6.0.jar:2.6.0]
      at org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.lastAttempt(WALEntryStream.java:306) ~[hbase-server-2.6.0.jar:2.6.0]
      at org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.tryAdvanceEntry(WALEntryStream.java:388) ~[hbase-server-2.6.0.jar:2.6.0]
      at org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.hasNext(WALEntryStream.java:130) ~[hbase-server-2.6.0.jar:2.6.0]
      at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceWALReader.run(ReplicationSourceWALReader.java:153) ~[hbase-server-2.6.0.jar:2.6.0]
      2024-07-22T17:48:13,315 WARN [RS-EventLoopGroup-1-65] ipc.NettyRpcConnection: Exception encountered while connecting to the server tx1-int-hbase-main-prod-3:16020
      org.apache.hbase.thirdparty.io.netty.channel.ConnectTimeoutException: connection timed out after 10000 ms: tx1-int-hbase-main-prod-3/127.0.0.1:16020
      at org.apache.hbase.thirdparty.io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe$2.run(AbstractEpollChannel.java:615) ~[hbase-shaded-netty-4.1.7.jar:?]
      at org.apache.hbase.thirdparty.io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98) ~[hbase-shaded-netty-4.1.7.jar:?]
      at org.apache.hbase.thirdparty.io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:153) ~[hbase-shaded-netty-4.1.7.jar:?]
      at org.apache.hbase.thirdparty.io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:173) ~[hbase-shaded-netty-4.1.7.jar:?]
      at org.apache.hbase.thirdparty.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:166) ~[hbase-shaded-netty-4.1.7.jar:?]
      at org.apache.hbase.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:470) ~[hbase-shaded-netty-4.1.7.jar:?]
      at org.apache.hbase.thirdparty.io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:416) ~[hbase-shaded-netty-4.1.7.jar:?]
      at org.apache.hbase.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997) ~[hbase-shaded-netty-4.1.7.jar:?]
      at org.apache.hbase.thirdparty.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[hbase-shaded-netty-4.1.7.jar:?]
      at org.apache.hbase.thirdparty.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[hbase-shaded-netty-4.1.7.jar:?]
      at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_202] 
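
For context on the exception in the trace above: protobuf packs a field number and a wire type into every field tag, and only wire types 0 to 5 are defined. The sketch below illustrates that rule in plain Java. The class and method names are hypothetical, and this is not the code in UnknownFieldSet or in HBase. It only shows why a parser that is handed bytes that are not a well-formed message can report an invalid wire type.

```java
final class WireTypeSketch {
    // A protobuf tag is (fieldNumber << 3) | wireType.
    static int fieldNumber(int tag) {
        return tag >>> 3;
    }

    static int wireType(int tag) {
        return tag & 0x7;
    }

    // Wire types 0..5 are defined by the protobuf encoding; 6 and 7 are not.
    static boolean isDefinedWireType(int wireType) {
        return wireType >= 0 && wireType <= 5;
    }

    public static void main(String[] args) {
        // Illustrative tag values only; they are not taken from the WAL in this issue.
        int[] sampleTags = {0x0A, 0x10, 0x0F};
        for (int tag : sampleTags) {
            System.out.printf("tag=0x%02X field=%d wireType=%d defined=%b%n",
                tag, fieldNumber(tag), wireType(tag), isDefinedWireType(wireType(tag)));
        }
    }
}
```

Here 0x0F decodes to field 1 with wire type 7, which is the kind of tag a parser rejects.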

      Running hbase wal -p on one of these files also fails:

      hbase wal -p hdfs://coreHBaseProdHa/hbase/oldWALs/tx1-int-hbase-main-prod-4%2C16020%2C1720602602602.1720609818921

      The error is:

      2024-07-23 12:36:27,064 INFO  [main] hdfs.LocatedBlocksRefresher (LocatedBlocksRefresher.java:<init>(98)) - Start located block refresher for DFSClient default.
      Writer Classes: ProtobufLogWriter AsyncProtobufLogWriter SecureProtobufLogWriter SecureAsyncProtobufLogWriter
      Cell Codec Class: org.apache.hadoop.hbase.regionserver.wal.WALCellCodec
      Exception in thread "main" java.io.EOFException: EOF while reading message size
              at org.apache.hadoop.hbase.shaded.protobuf.ProtobufUtil.parseDelimitedFrom(ProtobufUtil.java:3727)
              at org.apache.hadoop.hbase.regionserver.wal.ProtobufWALStreamReader.next(ProtobufWALStreamReader.java:56)
              at org.apache.hadoop.hbase.wal.WALStreamReader.next(WALStreamReader.java:42)
              at org.apache.hadoop.hbase.wal.WALPrettyPrinter.processFile(WALPrettyPrinter.java:297)
              at org.apache.hadoop.hbase.wal.WALPrettyPrinter.run(WALPrettyPrinter.java:516)
              at org.apache.hadoop.hbase.wal.WALPrettyPrinter.main(WALPrettyPrinter.java:429)
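
The message "EOF while reading message size" suggests the stream ended where a length prefix was expected. As a rough illustration, assuming the standard protobuf length-delimited framing (a base-128 varint size followed by the message bytes), the sketch below reads such a prefix and fails the same way on truncated input. The class and method names are hypothetical, and this is not the implementation of ProtobufUtil.parseDelimitedFrom.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

final class DelimitedSizeSketch {
    // Reads a base-128 varint length prefix (at most 5 bytes for a 32-bit size).
    static int readMessageSize(InputStream in) throws IOException {
        int result = 0;
        for (int shift = 0; shift < 32; shift += 7) {
            int b = in.read();
            if (b == -1) {
                // The stream ended before the size was fully read.
                throw new EOFException("EOF while reading message size");
            }
            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result; // high bit clear: this was the last varint byte
            }
        }
        throw new IOException("Malformed varint: more than 5 bytes");
    }

    public static void main(String[] args) throws IOException {
        // 0xAC 0x02 is the varint encoding of 300.
        byte[] complete = {(byte) 0xAC, 0x02};
        System.out.println(readMessageSize(new ByteArrayInputStream(complete)));

        // An empty (or truncated) stream fails with EOFException.
        try {
            readMessageSize(new ByteArrayInputStream(new byte[0]));
        } catch (EOFException e) {
            System.out.println(e.getMessage());
        }
    }
}
```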

      Attachments

        1. image-2024-07-23-12-33-50-395.png
          338 kB
          Longping Jie
        2. rs-replciation-error.log
          32 kB
          Longping Jie
        3. tx1-int-hbase-main-prod-4%2C16020%2C1720602602602.1720609818921
          0.1 kB
          Longping Jie

          People

            zhangduo Duo Zhang
            leojie Longping Jie
            Votes: 0
            Watchers: 5
