Details
Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Description
It would be nice to be able to read from and write to HDFS, particularly for bootstrapping purposes. A few points:
- Per the discussion about LevelDB, this support should be separated into its own package and project (jar) for easy testing and severability.
- Similar to the Kafka RegexTopicGenerator, we can enumerate (recursively or not) the files in an HDFS directory during job startup.
- Connectivity with HCatalog would be interesting as well, but should be handled in a separate JIRA.
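
To make the directory-enumeration point concrete, here is a minimal sketch of listing files at startup, optionally recursing and filtering by a regex in the spirit of RegexTopicGenerator. It is only an illustration: it uses java.nio.file against a local directory as a stand-in for HDFS, and the class name DirectoryFileEnumerator, the method listFiles, and the nameRegex parameter are all hypothetical. Against a real cluster, Hadoop's FileSystem.listFiles(Path, boolean) would presumably fill the same role.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper; a local-filesystem stand-in for an HDFS directory listing.
final class DirectoryFileEnumerator {

    /**
     * Returns the regular files under root whose file name matches nameRegex,
     * sorted by path. When recursive is false, only direct children of root
     * are considered; when true, the whole subtree is walked.
     */
    static List<Path> listFiles(Path root, boolean recursive, String nameRegex)
            throws IOException {
        Pattern pattern = Pattern.compile(nameRegex);
        // Depth 1 visits root and its immediate children only.
        int maxDepth = recursive ? Integer.MAX_VALUE : 1;
        try (Stream<Path> paths = Files.walk(root, maxDepth)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(p -> pattern.matcher(p.getFileName().toString()).matches())
                    .sorted()
                    .collect(Collectors.toList());
        }
    }
}
```

A job could call something like DirectoryFileEnumerator.listFiles(inputDir, true, ".*\\.log") once during startup and treat each returned path as an input to bootstrap from. Whether new files that appear later should be picked up is left open here.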