Details
- Type: New Feature
- Status: Open
- Priority: Major
- Resolution: Unresolved
Description
Ability to read/write data in Spark from/to HDFS of a remote Hadoop Cluster
In today's world of analytics, many use cases need the capability to access data from multiple remote data sources in Spark. Though Spark integrates well with a co-located Hadoop cluster, it largely lacks the capability to connect to a remote Hadoop cluster. In reality, however, not all enterprise data resides in Hadoop, and running the Spark cluster co-located with the Hadoop cluster is not always a viable solution.
In this improvement we propose to create a connector for reading and writing data from/to HDFS of a remote Hadoop cluster from Spark, using the WebHDFS API.
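For illustration, here is a minimal sketch of how the proposed connector might be used from Spark. The "webhdfs" data source name, the option keys, and the remote-nn:50070 endpoint are assumptions made for this sketch, not a released API:

import org.apache.spark.sql.SparkSession

object RemoteWebHdfsExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("RemoteWebHdfsExample")
      .getOrCreate()

    // Hypothetical usage of the proposed connector: read a CSV file from
    // the HDFS of a remote Hadoop cluster over the WebHDFS REST API.
    // The data source name and option keys below are assumptions.
    val df = spark.read
      .format("webhdfs")
      .option("path", "webhdfs://remote-nn:50070/data/input.csv") // assumed remote NameNode endpoint
      .option("format", "csv")                                    // assumed payload format option
      .load()

    // Write the result back to the remote cluster through the same connector.
    df.write
      .format("webhdfs")
      .option("path", "webhdfs://remote-nn:50070/data/output")
      .save()

    spark.stop()
  }
}

Under the hood, such a connector would map these reads and writes onto standard WebHDFS REST operations (op=OPEN for reads, op=CREATE for writes) issued against the remote NameNode's /webhdfs/v1/ endpoint.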
Sub-Tasks
1. WebHDFS: Initial Code Delivery | Open | Unassigned
2. Unit Testing | Closed | Unassigned
3. WebHDFS: Integration Test | Open | Unassigned
4. WebHDFS: Code clean-up | Open | Unassigned
5. WebHDFS: Add support for JSON file format | Open | Unassigned
6. WebHDFS: Add support for XML file format | Open | Unassigned
7. WebHDFS: Add examples | Open | Unassigned