Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Description
Steps to reproduce:
- Create multiple snapshots across various volumes/buckets as background load.
- Perform random snapshot operations (create, delete, list, diff) across 4 test buckets covering these combinations:
  - EC: FSO and OBS
  - Ratis: FSO and OBS
- In parallel, repeatedly trigger reconstructions and re-replications on the above buckets.
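The snapshot-operation part of the workload above can be sketched as a small concurrent driver. This is a hypothetical harness, not the tool used for this report: `SnapshotLoadSketch`, `SnapshotClient`, `FakeClient` and the bucket paths are illustrative names, the in-memory `FakeClient` stands in for real Ozone calls, and the reconstruction/re-replication load is not modeled.

```java
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class SnapshotLoadSketch {
  // The four test buckets: {EC, Ratis} x {FSO, OBS}. Paths are hypothetical.
  static final List<String> BUCKETS = List.of(
      "/testvol/ec-fso", "/testvol/ec-obs", "/testvol/ratis-fso", "/testvol/ratis-obs");

  enum Op { CREATE, DELETE, LIST, DIFF }

  // Hypothetical client interface; a real harness would call Ozone here.
  interface SnapshotClient {
    void create(String bucket, String name);
    boolean delete(String bucket, String name);
    Set<String> list(String bucket);
    void diff(String bucket, String from, String to);
  }

  // Thread-safe in-memory stand-in so the sketch runs without a cluster.
  static class FakeClient implements SnapshotClient {
    private final Map<String, Set<String>> snaps = new ConcurrentHashMap<>();
    public void create(String bucket, String name) {
      snaps.computeIfAbsent(bucket, k -> ConcurrentHashMap.newKeySet()).add(name);
    }
    public boolean delete(String bucket, String name) {
      Set<String> s = snaps.get(bucket);
      return s != null && s.remove(name);
    }
    public Set<String> list(String bucket) {
      return Set.copyOf(snaps.getOrDefault(bucket, Set.of()));
    }
    public void diff(String bucket, String from, String to) { /* no-op in the fake */ }
  }

  // Runs `opsPerThread` random operations on each of `threads` workers and
  // returns the number of snapshots created.
  static int run(SnapshotClient client, int threads, int opsPerThread, long seed) {
    AtomicInteger created = new AtomicInteger();
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    for (int t = 0; t < threads; t++) {
      Random rnd = new Random(seed + t); // one Random per worker thread
      pool.submit(() -> {
        for (int i = 0; i < opsPerThread; i++) {
          String bucket = BUCKETS.get(rnd.nextInt(BUCKETS.size()));
          List<String> existing = List.copyOf(client.list(bucket));
          switch (Op.values()[rnd.nextInt(Op.values().length)]) {
            case CREATE -> client.create(bucket, "snap" + created.incrementAndGet());
            case DELETE -> {
              // Other workers may delete the same snapshot concurrently.
              if (!existing.isEmpty()) {
                client.delete(bucket, existing.get(rnd.nextInt(existing.size())));
              }
            }
            case LIST -> client.list(bucket);
            case DIFF -> {
              if (existing.size() >= 2) {
                client.diff(bucket, existing.get(0), existing.get(1));
              }
            }
          }
        }
      });
    }
    pool.shutdown();
    try {
      pool.awaitTermination(1, TimeUnit.MINUTES);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return created.get();
  }
}
```

Using one `Random` per worker keeps each thread's operation sequence reproducible for a given seed, while the interleaving between threads (and hence races such as two deletes of the same snapshot) still varies from run to run.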
Error snippet in OM1:
2024-06-19 16:30:39,580 INFO [OMDoubleBufferFlushThread]-org.apache.hadoop.ozone.om.OmSnapshotManager: Created checkpoint : /var/lib/hadoop-ozone/om/data793412/db.snapshots/checkpointState/om.db-a4f6fa69-cf80-4fea-a3d9-6684faceff89 for snapshot snap851
2024-06-19 16:30:40,764 ERROR [OM StateMachine ApplyTransaction Thread - 0]-org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine: Terminating with exit status 1: Request cmdType: SnapshotPurge clientId: "client-DE4DAB584843" SnapshotPurgeRequest { updatedSnapshotDBKey: "/testvol/buckecfso/snap1718809217" } failed with exception java.lang.NullPointerException
	at org.apache.hadoop.ozone.om.request.snapshot.OMSnapshotPurgeRequest.validateAndUpdateCache(OMSnapshotPurgeRequest.java:107)
	at org.apache.hadoop.ozone.protocolPB.OzoneManagerRequestHandler.handleWriteRequest(OzoneManagerRequestHandler.java:378)
	at org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.runCommand(OzoneManagerStateMachine.java:560)
	at org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.lambda$1(OzoneManagerStateMachine.java:353)
	at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:834)
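The report does not include the code at `OMSnapshotPurgeRequest.java:107`, so the root cause is not established here. One pattern that would produce this kind of trace is a purge that looks up a snapshot entry by key and dereferences the result without a null check, for example when the same snapshot key is purged twice or was already removed by another request. The sketch below is hypothetical: `PurgeGuardSketch`, `SnapshotEntry`, `purgeUnguarded` and `purgeGuarded` are illustrative names (not Ozone APIs), and a plain `Map` stands in for the OM snapshot table.

```java
import java.util.List;
import java.util.Map;

public class PurgeGuardSketch {
  // Hypothetical stand-in for the value stored per snapshot key.
  record SnapshotEntry(String tableKey) {}

  // Unguarded: assumes every key in the request is still present.
  static int purgeUnguarded(Map<String, SnapshotEntry> table, List<String> keys) {
    int purged = 0;
    for (String key : keys) {
      SnapshotEntry entry = table.get(key);
      // If the key is already gone, entry is null and this throws NullPointerException.
      table.remove(entry.tableKey());
      purged++;
    }
    return purged;
  }

  // Guarded: treats a missing entry as "already purged" and skips it.
  static int purgeGuarded(Map<String, SnapshotEntry> table, List<String> keys) {
    int purged = 0;
    for (String key : keys) {
      SnapshotEntry entry = table.get(key);
      if (entry == null) {
        continue; // nothing left to purge for this key
      }
      table.remove(entry.tableKey());
      purged++;
    }
    return purged;
  }
}
```

Whether skipping is the right behavior for a missing snapshot in Ozone is a design question for the actual fix; the sketch only shows why an unchecked lookup turns a duplicate or stale purge into an NPE.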
Error snippet in OM2:
2024-06-19 16:30:39,611 INFO [OMDoubleBufferFlushThread]-org.apache.hadoop.ozone.om.OmSnapshotManager: Created checkpoint : /var/lib/hadoop-ozone/om/data793412/db.snapshots/checkpointState/om.db-a4f6fa69-cf80-4fea-a3d9-6684faceff89 for snapshot snap851
2024-06-19 16:30:40,770 ERROR [OM StateMachine ApplyTransaction Thread - 0]-org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine: Terminating with exit status 1: Request cmdType: SnapshotPurge clientId: "client-DE4DAB584843" SnapshotPurgeRequest { updatedSnapshotDBKey: "/testvol/buckecfso/snap1718809217" } failed with exception java.lang.NullPointerException
	at org.apache.hadoop.ozone.om.request.snapshot.OMSnapshotPurgeRequest.validateAndUpdateCache(OMSnapshotPurgeRequest.java:107)
	at org.apache.hadoop.ozone.protocolPB.OzoneManagerRequestHandler.handleWriteRequest(OzoneManagerRequestHandler.java:378)
	at org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.runCommand(OzoneManagerStateMachine.java:560)
	at org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.lambda$1(OzoneManagerStateMachine.java:353)
	at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:834)
2024-06-19 16:30:40,777 INFO [shutdown-hook-0]-org.apache.ranger.audit.provider.AuditProviderFactory: ==> JVMShutdownHook.run()
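OM1 and OM2 fail on the identical SnapshotPurge request (same clientId and updatedSnapshotDBKey) about 6 ms apart. That is consistent with how a Ratis-replicated state machine works: every OM applies the same committed log entry to the same state, so a deterministic failure during apply recurs on each replica that reaches that entry, and, as the logs show, the OM terminates on it. A minimal sketch of that effect, assuming a `Map<String, String>` as the replicated state and using hypothetical names (`ReplicatedApplySketch`, `replay`, `unguardedPurge`); the duplicate purge in the test log is an assumed trigger, not one confirmed by the report.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

public class ReplicatedApplySketch {
  // Replays the same committed log on every replica, each starting from an
  // identical copy of `initial`. Returns, per replica, the index of the first
  // entry whose apply threw (where the replica "terminates"), or -1.
  static Map<String, Integer> replay(List<String> replicaIds, Map<String, String> initial,
      List<String> log, BiConsumer<Map<String, String>, String> apply) {
    Map<String, Integer> failedAt = new LinkedHashMap<>();
    for (String id : replicaIds) {
      Map<String, String> state = new HashMap<>(initial); // identical starting state
      int failed = -1;
      for (int i = 0; i < log.size(); i++) {
        try {
          apply.accept(state, log.get(i));
        } catch (RuntimeException e) {
          failed = i; // stands in for "Terminating with exit status 1"
          break;
        }
      }
      failedAt.put(id, failed);
    }
    return failedAt;
  }

  // Hypothetical unguarded purge: throws NullPointerException if the key is gone.
  static void unguardedPurge(Map<String, String> state, String key) {
    String info = state.get(key);
    if (info.isBlank()) {
      throw new IllegalStateException("empty snapshot info for " + key);
    }
    state.remove(key);
  }
}
```

Because the failure depends only on the log and the starting state, every replica stops at the same entry index; retrying on another OM cannot help once the entry is committed.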
Error snippet in OM3:
2024-06-19 16:30:44,852 ERROR [KeyDeletingService#0]-org.apache.hadoop.ozone.om.service.KeyDeletingService: Snapshot deep cleaning request failed. Will retry at next run.
com.google.protobuf.ServiceException: org.apache.hadoop.ozone.om.exceptions.OMNotLeaderException: OM:om135 is not the leader. Could not determine the leader node.
	at org.apache.hadoop.ozone.om.ratis.OzoneManagerRatisServer.processReply(OzoneManagerRatisServer.java:462)
	at org.apache.hadoop.ozone.om.ratis.OzoneManagerRatisServer.submitRequest(OzoneManagerRatisServer.java:289)
	at org.apache.hadoop.ozone.om.service.KeyDeletingService$KeyDeletingTask.submitRequest(KeyDeletingService.java:392)
	at org.apache.hadoop.ozone.om.service.KeyDeletingService$KeyDeletingTask.updateDeepCleanedSnapshots(KeyDeletingService.java:373)
	at org.apache.hadoop.ozone.om.service.KeyDeletingService$KeyDeletingTask.processSnapshotDeepClean(KeyDeletingService.java:357)
	at org.apache.hadoop.ozone.om.service.KeyDeletingService$KeyDeletingTask.call(KeyDeletingService.java:204)
	at org.apache.hadoop.hdds.utils.BackgroundService$PeriodicalTask.lambda$run$0(BackgroundService.java:121)
	at java.base/java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1736)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: org.apache.hadoop.ozone.om.exceptions.OMNotLeaderException: OM:om135 is not the leader. Could not determine the leader node.
	at org.apache.hadoop.ozone.om.exceptions.OMNotLeaderException.convertToOMNotLeaderException(OMNotLeaderException.java:86)
	at org.apache.hadoop.ozone.om.ratis.OzoneManagerRatisServer.processReply(OzoneManagerRatisServer.java:463)
	... 13 more
2024-06-19 16:30:44,854 ERROR [KeyDeletingService#0]-org.apache.hadoop.ozone.om.service.KeyDeletingService: Snapshot deep cleaning request failed. Will retry at next run.
com.google.protobuf.ServiceException: org.apache.ratis.protocol.exceptions.ServerNotReadyException: om135@group-323117034165 is not in [RUNNING]: current state is CLOSING
	at org.apache.hadoop.ozone.om.ratis.OzoneManagerRatisServer.submitRequestToRatis(OzoneManagerRatisServer.java:298)
	at org.apache.hadoop.ozone.om.ratis.OzoneManagerRatisServer.submitRequest(OzoneManagerRatisServer.java:288)
	at org.apache.hadoop.ozone.om.service.KeyDeletingService$KeyDeletingTask.submitRequest(KeyDeletingService.java:392)
	at org.apache.hadoop.ozone.om.service.KeyDeletingService$KeyDeletingTask.updateDeepCleanedSnapshots(KeyDeletingService.java:373)
	at org.apache.hadoop.ozone.om.service.KeyDeletingService$KeyDeletingTask.processSnapshotDeepClean(KeyDeletingService.java:357)
	at org.apache.hadoop.ozone.om.service.KeyDeletingService$KeyDeletingTask.call(KeyDeletingService.java:204)
	at org.apache.hadoop.hdds.utils.BackgroundService$PeriodicalTask.lambda$run$0(BackgroundService.java:121)
	at java.base/java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1736)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: java.util.concurrent.ExecutionException: org.apache.ratis.protocol.exceptions.ServerNotReadyException: om135@group-323117034165 is not in [RUNNING]: current state is CLOSING
	at java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395)
	at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1999)
	at org.apache.hadoop.ozone.om.ratis.OzoneManagerRatisServer.submitRequestToRatis(OzoneManagerRatisServer.java:296)
	... 13 more
Caused by: org.apache.ratis.protocol.exceptions.ServerNotReadyException: om135@group-323117034165 is not in [RUNNING]: current state is CLOSING
	at org.apache.ratis.server.impl.RaftServerImpl.lambda$assertLifeCycleState$9(RaftServerImpl.java:749)
	at org.apache.ratis.util.LifeCycle.assertCurrentState(LifeCycle.java:253)
	at org.apache.ratis.server.impl.RaftServerImpl.assertLifeCycleState(RaftServerImpl.java:748)
	at org.apache.ratis.server.impl.RaftServerImpl.submitClientRequestAsync(RaftServerImpl.java:838)
	at org.apache.ratis.server.impl.RaftServerImpl.lambda$null$12(RaftServerImpl.java:831)
	at org.apache.ratis.util.JavaUtils.callAsUnchecked(JavaUtils.java:117)
	at org.apache.ratis.server.impl.RaftServerImpl.lambda$executeSubmitClientRequestAsync$13(RaftServerImpl.java:831)
	at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700)
	... 3 more
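The OM3 errors fit the state left after OM1 and OM2 exited. Assuming the OM Ratis group consists of exactly these three OMs, Raft needs a majority, floor(n/2) + 1 = 2 of 3, to elect a leader, so a single surviving OM cannot elect one. That matches "Could not determine the leader node"; the second error additionally shows om135's own Raft server already in the CLOSING state. The arithmetic as a small sketch (`QuorumSketch`, `quorum` and `canElectLeader` are hypothetical names):

```java
public class QuorumSketch {
  // Majority quorum used by Raft-based replication: floor(n / 2) + 1.
  static int quorum(int groupSize) {
    return groupSize / 2 + 1;
  }

  // A leader can only be elected while a majority of the group is alive.
  static boolean canElectLeader(int groupSize, int alive) {
    return alive >= quorum(groupSize);
  }

  public static void main(String[] args) {
    int n = 3; // assumed: three OMs in the Ratis group
    for (int alive = n; alive >= 0; alive--) {
      System.out.printf("alive=%d quorum=%d leaderPossible=%b%n",
          alive, quorum(n), canElectLeader(n, alive));
    }
  }
}
```

With two of three OMs terminated by the same apply failure, KeyDeletingService's "Will retry at next run" cannot succeed until a majority of the group is running again.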