Details
- Type: Improvement
- Status: Closed
- Priority: Major
- Resolution: Won't Fix
- Affects Version/s: 0.5.0
- Fix Version/s: None
- Component/s: None
- Labels: None
Description
Currently, as DFS clients output blocks, they write the entire block to disk before starting to transmit it to the datanode. Writing to disk lets the client retry a block write if the datanode fails in the middle of a block transfer, but writing to disk and then to the datanode adds latency. Hopefully, the common case is that block transfers to datanodes succeed. This patch writes to the datanode and the disk in parallel; if the write to the datanode fails, it falls back to the current behavior.
In my tests transferring 237M and 946M datasets with -copyFromLocal, I'm seeing a 20-25% improvement in throughput.
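A minimal sketch of the idea described above, not the actual DFSClient code: each buffer of block data is written to the local backup file and to the datanode stream as it arrives, instead of staging the whole block on disk first. All names here (writeBlock, the backup/datanode streams) are hypothetical, and the "parallel" writes are interleaved per buffer rather than run on separate threads.

    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.OutputStream;

    public class ParallelBlockWriter {

        /**
         * Streams one block to the datanode while keeping a local copy.
         * If the datanode write fails mid-block, the local copy is still
         * complete, so the caller can fall back to the old behavior and
         * retransmit the block from the backup file.
         */
        public static boolean writeBlock(InputStream blockData,
                                         OutputStream datanode,
                                         FileOutputStream backup,
                                         int bufferSize) throws IOException {
            byte[] buf = new byte[bufferSize];
            boolean remoteOk = true;
            int n;
            while ((n = blockData.read(buf)) > 0) {
                backup.write(buf, 0, n);            // local copy, as before
                if (remoteOk) {
                    try {
                        datanode.write(buf, 0, n);  // optimistic in-flight send
                    } catch (IOException e) {
                        remoteOk = false;           // keep buffering locally
                    }
                }
            }
            backup.flush();
            if (remoteOk) {
                datanode.flush();
            }
            return remoteOk;  // false => retransmit block from the backup file
        }
    }

On success the block never waits on a full disk write before transmission begins; on failure nothing is lost, since the backup file holds the complete block for the retry path.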
Attachments
Issue Links
- relates to HADOOP-1707 Remove the DFS Client disk-based cache (Closed)