Details
Type: Improvement
Status: Open
Priority: Minor
Resolution: Unresolved
Description
As discussed in HDFS-2288, there are two definitions of visible length, or rather we're using the same name for two things:
1. The HDFS-265 design doc, which defines it as a property of the replica:
visible length is the "number of bytes that have been acknowledged by the downstream DataNodes". It is replica (not block) specific, meaning it can be different for different replicas at a given time. In the document it is called BA (bytes acknowledged), compared to BR (bytes received).
2. The definition in HDFS-814 and DFSClient#getVisibleLength which defines it as a property of a file:
The visible length is the length that all datanodes in the pipeline contain at least such amount of data. Therefore, these data are visible to the readers.
According to this definition the visible length of a file is the floor (the minimum) of the visible lengths of all the replicas of the last block. It's a static property set on open, e.g. it is not updated when a writer calls hflush. Also, DFSInputStream#readBlockLength returns the visible length of the first replica it finds, so it seems possible (though unlikely) that in a failure scenario it could return a length longer than what all of the replicas actually have.
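To make the difference concrete, here is a minimal sketch in Java. It is not HDFS code: the class, the method names, and the byte counts are all hypothetical, and it models each replica of the last block as nothing more than its bytes-acknowledged (BA) value. It contrasts the file-level definition (the floor over all replicas) with taking the value of the first replica found.

```java
// Hypothetical sketch; these names are illustrative and are NOT HDFS APIs.
final class VisibleLengthSketch {

    // File-level definition: the floor (minimum) of the per-replica
    // visible lengths (BA) of the last block.
    static long floorOfReplicaVisibleLengths(long[] replicaBytesAcked) {
        if (replicaBytesAcked.length == 0) {
            throw new IllegalArgumentException("need at least one replica");
        }
        long min = replicaBytesAcked[0];
        for (long bytesAcked : replicaBytesAcked) {
            min = Math.min(min, bytesAcked);
        }
        return min;
    }

    // Simplified model of the behaviour described above: use the visible
    // length of the first replica found, without consulting the others.
    static long firstReplicaVisibleLength(long[] replicaBytesAcked) {
        if (replicaBytesAcked.length == 0) {
            throw new IllegalArgumentException("need at least one replica");
        }
        return replicaBytesAcked[0];
    }

    public static void main(String[] args) {
        // Assumed BA values for three replicas of the last block.
        long[] bytesAcked = {4096L, 3072L, 2048L};
        System.out.println(floorOfReplicaVisibleLengths(bytesAcked)); // 2048
        System.out.println(firstReplicaVisibleLength(bytesAcked));    // 4096
    }
}
```

With these assumed values the two approaches disagree (2048 versus 4096), which is the kind of mismatch the failure scenario above could expose.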
This has caused confusion in a number of other JIRAs. We should update the design doc and the Javadoc, and perhaps rename DFSClient#getVisibleLength etc., to disambiguate the two.