Details
- Improvement
- Status: Resolved
- Major
- Resolution: Fixed
- 3.4.1
- Reviewed
Description
The feature added in HDFS-14617 (Improve FSImage load time by writing sub-sections to the FSImage index, by Stephen O'Donnell) makes loading the FSImage much faster.
However, this option cannot be enabled when dfs.image.compress=true is set.
In my opinion, larger clusters require both settings at the same time.
For example, the cluster I'm using has approximately 6 million file system objects, and its FSImage is approximately 11GB with dfs.image.compress=true.
If dfs.image.compress is turned off, the FSImage is expected to exceed 30GB, in which case transferring it from the standby to the active NameNode will take a long time and consume significant network resources.
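To make the size difference concrete, here is a rough back-of-the-envelope sketch. It is only an illustration: the 50 MiB/s throughput is an assumed figure, not a measurement from this cluster, the class and method names are hypothetical, and the sizes are treated as GiB for simplicity.

```java
public class FsImageTransferEstimate {

    // Seconds needed to copy an image of the given size (GiB)
    // at a sustained throughput (MiB/s). Ignores protocol overhead
    // and the CPU cost of compression.
    static double transferSeconds(double imageGiB, double throughputMiBPerSec) {
        double imageMiB = imageGiB * 1024.0;
        return imageMiB / throughputMiBPerSec;
    }

    public static void main(String[] args) {
        // Assumed sustained throughput between NameNodes (hypothetical value).
        double throughputMiBPerSec = 50.0;

        // Sizes taken from the description: ~11GB compressed, >30GB uncompressed.
        double compressed = transferSeconds(11.0, throughputMiBPerSec);
        double uncompressed = transferSeconds(30.0, throughputMiBPerSec);

        System.out.printf("compressed:   %.1f s%n", compressed);
        System.out.printf("uncompressed: %.1f s%n", uncompressed);
        System.out.printf("ratio:        %.2fx%n", uncompressed / compressed);
    }
}
```

Whatever the actual throughput is, the ratio between the two transfer times depends only on the image sizes, so the uncompressed image takes roughly 2.7 times as long to move.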
HDFS-16147 (by kinit) showed that parallel FSImage loading and FSImage compression can be enabled at the same time. (It also worked well in my environment.)
I created this new Jira and PR because the discussion in HDFS-16147 ended in 2021, and I would like the change to be officially included in the next release rather than remaining in Patch Available.
The actual patch code was written by kinit; I resolved the empty sub-section problem (see the comment in HDFS-16147) and added test code.
If this is not the proper way to do this, please let me know how else I can contribute.
Thanks.
Attachments
Issue Links
- is a child of: HDFS-16147 load fsimage with parallelization and compression (Patch Available)
- is related to: HDFS-14617 Improve fsimage load time by writing sub-sections to the fsimage index (Resolved)
- links to