Details
Type: Bug
Status: Resolved
Priority: Blocker
Resolution: Fixed
Version: 1.2.0
Description
The acceptance (secure) check fails frequently, usually in the S3 tests. The root cause is that datanodes shut themselves down because too many of their volumes are marked as "bad".
S3 Gateway log
INTERNAL_ERROR org.apache.hadoop.ozone.om.exceptions.OMException: Allocated 0 blocks. Requested 1 blocks
SCM log
Pipeline creation failed due to no sufficient healthy datanodes. Required 3. Found 0.
Datanode log
datanode_2 | 2021-06-19 13:26:08,010 [Periodic HDDS volume checker] INFO volume.StorageVolumeChecker: Scheduled health check for volume /data/hdds/hdds
datanode_2 | 2021-06-19 13:36:08,013 [Periodic HDDS volume checker] WARN volume.StorageVolumeChecker: checkAllVolumes timed out after 600000 ms
datanode_2 | 2021-06-19 13:36:08,014 [Periodic HDDS volume checker] WARN volume.MutableVolumeSet: checkAllVolumes got 1 failed volumes - [/data/hdds/hdds]
datanode_2 | 2021-06-19 13:36:08,016 [Periodic HDDS volume checker] INFO volume.MutableVolumeSet: Moving Volume : /data/hdds/hdds to failed Volumes
datanode_2 | 2021-06-19 13:36:08,016 [Periodic HDDS volume checker] ERROR statemachine.DatanodeStateMachine: DatanodeStateMachine Shutdown due to too many bad volumes, check hdds.datanode.failed.data.volumes.tolerated and hdds.datanode.failed.metadata.volumes.tolerated
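The sequence in the datanode log (a health check that never completes, a timeout, the volume counted as failed, then shutdown) can be modelled with a small sketch. This is an illustration only, not Ozone's actual implementation: the function names `check_all_volumes` and `should_shut_down` are hypothetical, and the rules that a negative tolerance means "no fixed limit" and that at least one healthy volume must remain are assumptions made for the example.

```python
import concurrent.futures as cf
import time


def check_all_volumes(checks, timeout_s):
    """Run every volume check in parallel and return the set of failed volumes.

    `checks` maps a volume path to a callable returning True when healthy.
    A check that has not finished within `timeout_s` is counted as failed,
    which mirrors the "timed out" then "got 1 failed volumes" log lines.
    """
    pool = cf.ThreadPoolExecutor(max_workers=max(1, len(checks)))
    futures = {pool.submit(fn): vol for vol, fn in checks.items()}
    done, not_done = cf.wait(futures, timeout=timeout_s)

    # Anything still running at the deadline is treated as a bad volume.
    failed = {futures[f] for f in not_done}
    for f in done:
        try:
            healthy = f.result()
        except Exception:
            healthy = False
        if not healthy:
            failed.add(futures[f])

    pool.shutdown(wait=False)  # do not block on checks that are still running
    return failed


def should_shut_down(total_volumes, failed_volumes, tolerated):
    """Decide whether the node stops serving (assumed rules, see lead-in)."""
    if total_volumes - failed_volumes < 1:
        return True  # assumption: at least one healthy volume is required
    # Assumption: a negative tolerance means no fixed limit on failures.
    return tolerated >= 0 and failed_volumes > tolerated


def demo():
    # A single volume whose check is slower than the timeout, as in the log.
    checks = {"/data/hdds/hdds": lambda: time.sleep(0.3) or True}
    failed = check_all_volumes(checks, timeout_s=0.05)
    return failed, should_shut_down(len(checks), len(failed), tolerated=-1)
```

With a single data volume, as in this test cluster, one timed-out check leaves no healthy volume, so under these assumptions no value of the tolerated-failures setting would keep the datanode running.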
Issue Links
- is caused by: HDDS-5268 Ensure disk checker also scans the ratis log disks periodically (Resolved)
- links to