[HDDS-4666] Handling disk issues in Datanodes - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: None
Component/s: Ozone Datanode, SCM
Labels: None

Description

Currently, Ozone datanodes have no notion of reserved space, unlike HDFS datanodes. In addition, a datanode that is low on disk capacity continues to participate in pipeline allocation and keeps receiving write requests. These requests fail and can leave the client stuck in a retry loop.

Similarly, Ratis log disks are currently not covered by the disk checker. Once a Ratis disk fills up, existing pipelines cannot be closed, because closing a pipeline involves taking a Ratis snapshot, which does not succeed when the Ratis disk is full. New pipelines cannot function on such disks either: they fail write requests and add to the client retry chain.

Likewise, nodes low on disk capacity should not be chosen as targets for container re-replication.

The goal of this Jira is to address disk-related issues on datanodes holistically.
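
To illustrate the space checks described above, here is a minimal sketch of why reserved space and a per-volume check matter when deciding whether a datanode can accept a new container. It is not the Ozone implementation: the class `VolumeSpaceSketch`, the `Volume` type, and both method names are hypothetical, and the byte values are made up for the example.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch; none of these names come from the Ozone code base. */
public class VolumeSpaceSketch {

    /** Capacity figures for a single datanode volume, in bytes. */
    public static final class Volume {
        final long capacity;
        final long used;
        final long reserved; // space held back for non-Ozone use (assumed semantics)

        public Volume(long capacity, long used, long reserved) {
            this.capacity = capacity;
            this.used = used;
            this.reserved = reserved;
        }

        /** Space usable for new data: capacity minus used minus reserved, floored at zero. */
        long usable() {
            return Math.max(0L, capacity - used - reserved);
        }
    }

    /** Per-volume check: true only if some single volume can hold the whole container. */
    public static boolean hasEnoughSpace(List<Volume> volumes, long containerSize) {
        for (Volume v : volumes) {
            if (v.usable() >= containerSize) {
                return true;
            }
        }
        return false;
    }

    /** Node-wide check: sums usable space across volumes, which can overstate what fits. */
    public static boolean hasEnoughSpaceGlobally(List<Volume> volumes, long containerSize) {
        long total = 0L;
        for (Volume v : volumes) {
            total += v.usable();
        }
        return total >= containerSize;
    }

    public static void main(String[] args) {
        // Two volumes, each with 3 usable units after subtracting used and reserved space.
        List<Volume> volumes = Arrays.asList(new Volume(10, 6, 1), new Volume(10, 6, 1));
        long containerSize = 5;
        System.out.println("perVolume=" + hasEnoughSpace(volumes, containerSize)
            + " global=" + hasEnoughSpaceGlobally(volumes, containerSize));
    }
}
```

In this example the node-wide sum suggests the container fits, while no single volume can actually hold it. A placement or re-replication policy that relied on the summed figure would keep selecting a node where writes then fail.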

Attachments

Hdds datanode disk(volume) handling issues.pdf
27/May/21 09:51
97 kB
Mark Gui

Issue Links

is a parent of

HDDS-3022 Datanode unable to close Pipeline after disk out of space

Resolved

is related to

HDDS-7365 Integrate container and volume scanners

Resolved

RATIS-1375 Handle bad storage dir due to disk failures

Resolved

relates to

RATIS-1377 Ratis min free space for storage dirs

Resolved

Sub-Tasks

1.	On-demand disk checker for hdds volume	Resolved	Mark Gui
2.	Periodic disk check interval is fixed 15min and should be configurable	Resolved	Mark Gui
3.	Limit number of bad volumes by dfs.datanode.failed.volumes.tolerated	Resolved	Mark Gui
4.	Datanode hasEnoughSpace check should apply on volume instead of global DN	Resolved	Mark Gui
5.	Support reserved space of single dir	Resolved	runzhiwang
6.	Pipeline placement policy filter datanodes that have not enough space for a single container	Resolved	Mark Gui
7.	Ensure disk checker also scans the ratis log disks periodically	Resolved	Mark Gui
8.	Datanode with low ratis log volume space should not be considered for new pipeline allocation	Resolved	Mark Gui
9.	Fix datanode reserved space calculation	Resolved	Mark Gui
10.	Get more accurate space info for DedicatedDiskSpaceUsage	Resolved	Mark Gui
11.	Fix skipped volume check due to disk.check.min.gap	Resolved	Mark Gui
12.	Mark Datanode with no healthy data or metadata disks as dead in SCM	Open	Shashikant Banerjee
13.	Apply container space check to Ratis factor one pipelines	Resolved	Ethan Rose

People

Assignee: Shashikant Banerjee

Reporter: Shashikant Banerjee

Votes: 0

Watchers: 8

Dates

Created: 11/Jan/21 08:48

Updated: 21/Jan/24 15:51

Resolved: 04/Aug/23 08:19