Details
- Type: New Feature
- Status: Open
- Priority: Major
- Resolution: Unresolved
Description
HDFS has problems storing small files, as described in this blog post (http://blog.cloudera.com/blog/2009/02/the-small-files-problem).
The post also describes some ways to store small files in HDFS, but they are not good solutions; HAR files and Sequence Files seem better suited to read-only files.
Currently each HDFS block belongs to exactly one HDFS file, so a large number of small files produces a large number of small blocks on the DataNodes, which puts a heavy load on them.
This JIRA will show how to merge small blocks into big ones online, how to delete small files, and so on.
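To make the block-count argument concrete, here is a minimal sketch in Python. It is not the design proposed in this JIRA. It assumes a 128 MiB block size and uses hypothetical helper names. The "merged" figure is only an idealized lower bound for the case where small files can share blocks.

```python
# Assumed HDFS block size (dfs.blocksize); 128 MiB is an assumption for illustration.
BLOCK_SIZE = 128 * 1024 * 1024


def blocks_one_file_per_block(file_sizes, block_size=BLOCK_SIZE):
    """Blocks needed when a block can hold data from only one file (current behavior)."""
    # Integer ceiling division per file; a zero-length file needs no block.
    return sum((size + block_size - 1) // block_size for size in file_sizes)


def blocks_if_merged(file_sizes, block_size=BLOCK_SIZE):
    """Idealized lower bound on blocks if small files could share blocks (hypothetical)."""
    total = sum(file_sizes)
    return (total + block_size - 1) // block_size


if __name__ == "__main__":
    # Hypothetical workload: one million files of 100 KiB each.
    sizes = [100 * 1024] * 1_000_000
    print("blocks today:      ", blocks_one_file_per_block(sizes))
    print("blocks if merged:  ", blocks_if_merged(sizes))
```

For this hypothetical workload the first function returns one block per file, while the packed lower bound is several orders of magnitude smaller. That gap is the DataNode load the merge is meant to remove.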
Currently there are many open JIRAs for improving HDFS scalability on the NameNode, such as HDFS-7836 and HDFS-8286.
So the small-file metadata (INode and BlocksMap) will also be kept in the NameNode.
A design document will be uploaded soon.
Attachments
Issue Links
- depends upon: HDFS-5389 A Namenode that keeps only a part of the namespace in memory (Open)