Details
- Bug
- Status: Open
- Critical
- Resolution: Unresolved
- 2.1.0
- None
- None
- None
Description
We are hitting exactly the same issue as https://issues.apache.org/jira/browse/HBASE-13651:
- A scan fails with a FileNotFoundException (FNFE) after the RegionServer (RS) goes through a full GC and the region is reassigned to and opened on another RS.
- While the region is in this state, taking a snapshot also fails with an FNFE.
- The issue can be resolved by manually moving the affected region.
We found that the HBASE-13651 change was later reverted by https://issues.apache.org/jira/browse/HBASE-18786 because, according to the comments in HBASE-18786, it was no longer considered a problem.
Basic timeline of the issue:
2022-08-27 05:26:35 Snapshot TestSnapshot is taken successfully
2022-08-27 15:21:51 The target hfile fafb8f91bd20b1adfe15e2a64a39557e/i/041e9aeb8cdb46f991459c92f8581e16 is generated by a compaction on regionserver-67
2022-08-27 17:26:36 041e9aeb8cdb46f991459c92f8581e16 is compacted into fd53b8e6b4874eb38712ad2d04389fff successfully
2022-08-27 17:34:53 A full GC starts on regionserver-67
2022-08-27 17:35:50 Region fafb8f91bd20b1adfe15e2a64a39557e is re-opened on regionserver-11, as scheduled by HMaster
2022-08-27 17:35:56 regionserver-67 wakes up from the full GC
2022-08-27 17:35:57 File fafb8f91bd20b1adfe15e2a64a39557e is archived by lashadoop-regionserver-67; regionserver-67 then finds that it has been kicked out and exits
2022-08-27 18:00:00 The archived hfile is removed by HMaster's CleanerChore
2022-08-27 19:48:10 A user's job reports an error that the file is missing
2022-08-27 20:26:04 Re-taking snapshot TaggingSegmentationSnapshot fails because 041e9aeb8cdb46f991459c92f8581e16 is missing
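The ordering in the timeline above can be replayed with a small toy model. This is only a sketch, not HBase code: the class `StaleArchiveRace` and its methods are hypothetical, and it assumes that regionserver-11 picked up every file present in the store directory when it opened the region, which the timeline suggests but does not prove.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Toy model of the reported timeline; hypothetical names, not HBase code. */
final class StaleArchiveRace {
    static final String OLD_HFILE = "041e9aeb8cdb46f991459c92f8581e16";
    static final String NEW_HFILE = "fd53b8e6b4874eb38712ad2d04389fff";

    // Files in the region's store directory and in the archive directory.
    private final Set<String> dataDir = new HashSet<>();
    private final Set<String> archiveDir = new HashSet<>();

    void write(String hfile) { dataDir.add(hfile); }

    /** Archiving moves a file out of the store directory. */
    void archive(String hfile) {
        if (dataDir.remove(hfile)) {
            archiveDir.add(hfile);
        }
    }

    /** The cleaner deletes everything that has been archived. */
    void runCleaner() { archiveDir.clear(); }

    /** Assumption: opening a region records the files present at that moment. */
    Set<String> openRegion() { return new TreeSet<>(dataDir); }

    boolean existsInDataDir(String hfile) { return dataDir.contains(hfile); }

    /** Replays the timeline and returns files the new server references but cannot find. */
    static List<String> replay() {
        StaleArchiveRace fs = new StaleArchiveRace();
        fs.write(OLD_HFILE);                     // 15:21:51 compaction output on regionserver-67
        fs.write(NEW_HFILE);                     // 17:26:36 old file compacted, not yet archived
        Set<String> rs11View = fs.openRegion();  // 17:35:50 region opened on regionserver-11
        fs.archive(OLD_HFILE);                   // 17:35:57 regionserver-67 archives after GC
        fs.runCleaner();                         // 18:00:00 CleanerChore removes archived file
        List<String> missing = new ArrayList<>();
        for (String hfile : rs11View) {
            if (!fs.existsInDataDir(hfile)) {
                missing.add(hfile);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        System.out.println("Referenced by regionserver-11 but missing: " + replay());
    }
}
```

In this model the only file that ends up referenced but missing is the compacted-away hfile, which matches the file named in both exceptions.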
The exception thrown when scanning after the region has been moved:
java.io.FileNotFoundException: File does not exist:/hbase/prod/hbase-prod/data/default/mdm/fafb8f91bd20b1adfe15e2a64a39557e/i/041e9aeb8cdb46f991459c92f8581e16
at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:85)
at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:75)
at org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getBlockLocations(FSDirStatAndListingOp.java:152)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1909)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:735)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:415)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:815)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2675)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:121)
at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:88)
at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:861)
at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:848)
at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:837)
at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1005)
at org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:317)
at org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:313)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:325)
at org.apache.hadoop.fs.FilterFileSystem.open(FilterFileSystem.java:163)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:898)
at org.apache.hadoop.hbase.io.FSDataInputStreamWrapper.<init>(FSDataInputStreamWrapper.java:125)
at org.apache.hadoop.hbase.io.FSDataInputStreamWrapper.<init>(FSDataInputStreamWrapper.java:102)
at org.apache.hadoop.hbase.regionserver.StoreFileInfo.open(StoreFileInfo.java:269)
at org.apache.hadoop.hbase.regionserver.HStoreFile.createStreamReader(HStoreFile.java:491)
at org.apache.hadoop.hbase.regionserver.HStoreFile.getStreamScanner(HStoreFile.java:516)
at org.apache.hadoop.hbase.regionserver.StoreFileScanner.getScannersForStoreFiles(StoreFileScanner.java:149)
at org.apache.hadoop.hbase.regionserver.HStore.getScanners(HStore.java:1309)
at org.apache.hadoop.hbase.regionserver.HStore.recreateScanners(HStore.java:2042)
at org.apache.hadoop.hbase.regionserver.StoreScanner.trySwitchToStreamRead(StoreScanner.java:1064)
at org.apache.hadoop.hbase.regionserver.StoreScanner.shipped(StoreScanner.java:1198)
at org.apache.hadoop.hbase.regionserver.KeyValueHeap.shipped(KeyValueHeap.java:437)
at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.shipped(HRegion.java:6959)
at org.apache.hadoop.hbase.regionserver.RSRpcServices$RegionScannerShippedCallBack.run(RSRpcServices.java:388)
at org.apache.hadoop.hbase.ipc.ServerCall.setResponse(ServerCall.java:289)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:161)
at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
The exception thrown when taking a snapshot after the region has been moved:
2022-08-27 20:26:03,794 ERROR org.apache.hadoop.hbase.procedure.Subprocedure: Subprocedure 'TaggingSegmentationSnapshot' aborting due to a ForeignException!
java.io.FileNotFoundException via regionserver-11.**,60020,1653373878295:java.io.FileNotFoundException: File does not exist: hdfs://test-hbase/hbase/prod/hbase-prod/data/default/mdm/fafb8f91bd20b1adfe15e2a64a39557e/i/041e9aeb8cdb46f991459c92f8581e16
at org.apache.hadoop.hbase.regionserver.snapshot.RegionServerSnapshotManager$SnapshotSubprocedurePool.waitForOutstandingTasks(RegionServerSnapshotManager.java:349)
at org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure.flushSnapshot(FlushSnapshotSubprocedure.java:173)
at org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure.insideBarrier(FlushSnapshotSubprocedure.java:193)
at org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:189)
at org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:53)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.FileNotFoundException: File does not exist: hdfs://beaconstore/hbase/prod/hbase-prod/data/ap/mdm_user_segments/fafb8f91bd20b1adfe15e2a64a39557e/i/041e9aeb8cdb46f991459c92f8581e16
at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1500)
at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1493)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1508)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:442)
at org.apache.hadoop.hbase.regionserver.StoreFileInfo.getReferencedFileStatus(StoreFileInfo.java:368)
at org.apache.hadoop.hbase.snapshot.SnapshotManifestV2$ManifestBuilder.storeFile(SnapshotManifestV2.java:129)
at org.apache.hadoop.hbase.snapshot.SnapshotManifestV2$ManifestBuilder.storeFile(SnapshotManifestV2.java:68)
at org.apache.hadoop.hbase.snapshot.SnapshotManifest.addRegion(SnapshotManifest.java:249)
at org.apache.hadoop.hbase.snapshot.SnapshotManifest.addRegion(SnapshotManifest.java:218)
at org.apache.hadoop.hbase.regionserver.HRegion.addRegionToSnapshot(HRegion.java:4285)
at org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure$RegionSnapshotTask.call(FlushSnapshotSubprocedure.java:134)
at org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure$RegionSnapshotTask.call(FlushSnapshotSubprocedure.java:77)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
... 4 more
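Since the workaround is to move the affected region manually, it can help to pull the encoded region name out of the FNFE message. The following is a minimal sketch, assuming the store file path follows the `.../data/<namespace>/<table>/<encoded-region>/<family>/<hfile>` layout seen in both messages above and that region and hfile names are 32 hex characters; the class `MissingStoreFile` is a hypothetical helper, not part of HBase.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper that extracts path components from an FNFE message. */
final class MissingStoreFile {
    // Assumed layout: .../data/<namespace>/<table>/<encoded-region>/<family>/<hfile>
    private static final Pattern STORE_FILE_PATH = Pattern.compile(
        "/data/([^/\\s]+)/([^/\\s]+)/([0-9a-f]{32})/([^/\\s]+)/([0-9a-f]{32})");

    final String namespace;
    final String table;
    final String encodedRegion;
    final String family;
    final String hfile;

    private MissingStoreFile(Matcher m) {
        this.namespace = m.group(1);
        this.table = m.group(2);
        this.encodedRegion = m.group(3);
        this.family = m.group(4);
        this.hfile = m.group(5);
    }

    /** Returns the parsed components, or empty if no store file path is found. */
    static Optional<MissingStoreFile> parse(String exceptionMessage) {
        Matcher m = STORE_FILE_PATH.matcher(exceptionMessage);
        return m.find() ? Optional.of(new MissingStoreFile(m)) : Optional.empty();
    }

    public static void main(String[] args) {
        String msg = "File does not exist:/hbase/prod/hbase-prod/data/default/mdm/"
            + "fafb8f91bd20b1adfe15e2a64a39557e/i/041e9aeb8cdb46f991459c92f8581e16";
        parse(msg).ifPresent(f ->
            System.out.println("region to move: " + f.encodedRegion + ", missing hfile: " + f.hfile));
    }
}
```

The pattern uses `find()` rather than a full match so that it works both with and without the `hdfs://` scheme prefix and with trailing stack frame text on the same line.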
Attachments
Issue Links
- is related to
  - HBASE-8478 HBASE-2231 breaks TestHRegion#testRecoveredEditsReplayCompaction under hadoop2 profile (Closed)
  - HBASE-18786 FileNotFoundException should not be silently handled for primary region replicas (Closed)
  - HBASE-13651 Handle StoreFileScanner FileNotFoundException (Closed)
  - HBASE-2231 Compaction events should be written to HLog (Closed)