Details
Type: Sub-task
Status: Resolved
Priority: Major
Resolution: Fixed
Fix Version/s: 3.4.0
Hadoop Flags: Reviewed
Description
The purpose of this Jira is to improve the oiv tool so that it can parse an fsimage file that contains sub-sections (see HDFS-14617) in parallel when producing the Delimited output format.
1. Serial parsing is time-consuming
When a large fsimage is parsed serially into the Delimited format (e.g. `hdfs oiv -p Delimited -t <tmp> ...`), the time is distributed across the stages as follows:
1) Loading the string table: not time-consuming
2) Loading INode references: not time-consuming
3) Loading directories in the INode section: slightly time-consuming (3%)
4) Loading the INode directory section: somewhat time-consuming (11%)
5) Output: very time-consuming (86%)
Therefore, the output stage is the one that benefits most from parallelization.
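A back-of-the-envelope bound, not part of the original measurements: by Amdahl's law, if only the output stage (a fraction p ≈ 0.86 of the serial time) is parallelized across N threads and the merge cost is ignored, the overall speedup S(N) is at most:

```latex
\begin{aligned}
S(N) &= \frac{1}{(1 - p) + p/N}, \qquad p \approx 0.86 \\
S(4) &= \frac{1}{0.14 + 0.86/4} = \frac{1}{0.355} \approx 2.82 \\
\lim_{N \to \infty} S(N) &= \frac{1}{0.14} \approx 7.14
\end{aligned}
```

So even with perfect scaling of the output stage, 4 threads cannot make the whole run more than about 2.8× faster, and no number of threads can exceed about 7×, because the loading stages stay serial.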
2. How to output in parallel
The sub-sections are divided, in their original order, into consecutive groups. Each thread processes one group and writes its output to its own temporary file; finally, the per-thread files are merged into the output file.
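A minimal sketch of this scheme in Java, assuming sub-sections are split into groups by count (not by byte size) and using a hypothetical `decodeSubSection` in place of the real INode decoding. The class name, `groupInOrder`, `writeParallel` and the temp-file prefixes are illustrative, not the oiv tool's actual code:

```java
import java.io.BufferedWriter;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelDelimitedSketch {

  // Split sub-section indices 0..n-1 into `threads` contiguous groups, preserving
  // order. Assumption: groups are balanced by count; the first (n % threads)
  // groups each get one extra sub-section.
  static List<List<Integer>> groupInOrder(int n, int threads) {
    List<List<Integer>> groups = new ArrayList<>();
    int base = n / threads, extra = n % threads, next = 0;
    for (int t = 0; t < threads; t++) {
      int size = base + (t < extra ? 1 : 0);
      List<Integer> group = new ArrayList<>();
      for (int i = 0; i < size; i++) {
        group.add(next++);
      }
      groups.add(group);
    }
    return groups;
  }

  // Hypothetical stand-in for decoding one sub-section into delimited rows.
  static List<String> decodeSubSection(int index) {
    return List.of("inode-" + index + "-a", "inode-" + index + "-b");
  }

  // Each thread writes its group to its own temp file; the files are then
  // concatenated in group order, so rows come out in the same order as serially.
  static void writeParallel(int numSubSections, int threads, Path out) throws Exception {
    List<List<Integer>> groups = groupInOrder(numSubSections, threads);
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    List<Future<Path>> parts = new ArrayList<>();
    try {
      for (List<Integer> group : groups) {
        parts.add(pool.submit(() -> {
          Path tmp = Files.createTempFile("oiv-part-", ".txt");
          try (BufferedWriter w = Files.newBufferedWriter(tmp)) {
            for (int idx : group) {
              for (String row : decodeSubSection(idx)) {
                w.write(row);
                w.newLine();
              }
            }
          }
          return tmp;
        }));
      }
      // Merge stage: append each per-thread file in group order, then delete it.
      try (OutputStream os = Files.newOutputStream(out)) {
        for (Future<Path> part : parts) {
          Path tmp = part.get();
          Files.copy(tmp, os);
          Files.delete(tmp);
        }
      }
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) throws Exception {
    Path out = Files.createTempFile("oiv-out-", ".txt");
    writeParallel(12, 4, out);
    System.out.println(groupInOrder(12, 4));
    System.out.println(Files.readAllLines(out).size());
  }
}
```

With 12 sub-sections and 4 threads, `groupInOrder` assigns three consecutive sub-sections to each thread. Merging the per-thread files in group order keeps the output order identical to a serial run; the cost of that choice is an extra sequential copy, which is the MergeTime reported in the test results.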
3. Test results
Input fsimage file: 3.4G, 12 sub-sections, 55976500 INodes
-----------------------------------------
Threads  TotalTime  OutputTime  MergeTime
1        18m37s     16m18s      n/a
4        8m7s       4m49s       41s
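Converting the durations to seconds gives the speedups of the 4-thread run over the 1-thread run (derived arithmetic, not additional measurements):

```latex
\begin{aligned}
\text{overall:} \quad & \frac{18\text{m}37\text{s}}{8\text{m}7\text{s}} = \frac{1117\ \text{s}}{487\ \text{s}} \approx 2.29 \\
\text{output stage:} \quad & \frac{16\text{m}18\text{s}}{4\text{m}49\text{s}} = \frac{978\ \text{s}}{289\ \text{s}} \approx 3.38 \\
\text{output + merge:} \quad & \frac{978\ \text{s}}{(289 + 41)\ \text{s}} = \frac{978\ \text{s}}{330\ \text{s}} \approx 2.96
\end{aligned}
```

The output stage itself scales well on 4 threads (about 3.4×), while the merge step and the stages that remain serial hold the end-to-end speedup to about 2.3×.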
Attachments
Issue Links
- links to