Description
SequenceFiles should support 'custom compressors' which can be specified by the user on creation of the file.
Readily available packages for gzip and zip (java.util.zip) are among obvious choices to support. Of course there will be hooks so that other compressors can be added in future as long as there is a way to construct (input/output) streams on top of the compressor/decompressor.
The 'classname' of the 'custom compressor/decompressor' could be stored in the header of the SequenceFile which can then be used by SequenceFile.Reader to figure out the appropriate 'decompressor'. Thus I propose we add constructors to SequenceFile.Writer which take in the 'classname' of the compressor's input/output stream classes (e.g. DeflaterOutputStream/InflaterInputStream or GZIPOutputStream/GZIPInputStream), which acts as the hook for future compressors/decompressors.
Attachments
Attachments
Issue Links
- depends upon
-
HADOOP-54 SequenceFile should compress blocks, not individual entries
- Closed
- is related to
-
HADOOP-474 support compressed text files as input and output
- Closed
- relates to
-
HADOOP-538 Implement a nio's 'direct buffer' based wrapper over zlib to improve performance of java.util.zip.{De|In}flater as a 'custom codec'
- Closed