Bahir (Retired) / BAHIR-67

WebHDFS Data Source for Spark SQL


Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Spark SQL Data Sources
    • Labels: None

    Description

      Ability to read/write data in Spark from/to HDFS on a remote Hadoop cluster

      In today's analytics landscape, many use cases require Spark to access data from multiple remote data sources. Although Spark integrates well with a local Hadoop cluster, it offers little support for connecting to a remote Hadoop cluster. In reality, not all enterprise data resides in a local Hadoop cluster, and running a Spark cluster co-located with every Hadoop cluster is not always an option.

      This improvement proposes a connector that lets Spark read data from and write data to HDFS on a remote Hadoop cluster through the WebHDFS REST API.
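      To make the transport concrete, here is a minimal sketch of how a client addresses a remote cluster over WebHDFS, written in Python rather than the connector's own code. It assumes simple (pseudo) authentication through the `user.name` query parameter and no Kerberos/SPNEGO. The helper names (`webhdfs_url`, `read_file`, `list_names`) and the host `nn.example.com` are hypothetical. The URL layout `http://<host>:<port>/webhdfs/v1/<path>?op=<OP>` and the `OPEN`, `CREATE` and `LISTSTATUS` operations come from the Hadoop WebHDFS REST API.

```python
import json
import urllib.request
from urllib.parse import quote, urlencode


def webhdfs_url(host, port, path, op, user=None, **params):
    """Build a WebHDFS REST URL of the form
    http://<host>:<port>/webhdfs/v1/<path>?op=<OP>&<param>=<value>...
    (hypothetical helper, not the connector's API)."""
    if not path.startswith("/"):
        raise ValueError("HDFS path must be absolute, e.g. /data/file.csv")
    query = {"op": op.upper()}
    if user is not None:
        # Assumption: simple (pseudo) authentication via user.name.
        query["user.name"] = user
    for key, value in params.items():
        # WebHDFS expects lowercase booleans such as overwrite=true.
        query[key] = str(value).lower() if isinstance(value, bool) else value
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{urlencode(query)}"


def read_file(host, port, path, user=None, timeout=30):
    """Read a whole remote file with op=OPEN. The NameNode answers with a
    307 redirect to a DataNode; urllib follows redirects for GET requests."""
    url = webhdfs_url(host, port, path, "OPEN", user=user)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def list_names(liststatus_json):
    """Extract entry names from the JSON body returned by op=LISTSTATUS."""
    statuses = json.loads(liststatus_json)["FileStatuses"]["FileStatus"]
    return [status["pathSuffix"] for status in statuses]
```

      For example, `webhdfs_url("nn.example.com", 9870, "/data/events.csv", "open", user="spark", offset=0, length=1024)` yields a ranged read URL. 9870 is the default NameNode HTTP port in Hadoop 3 (50070 in Hadoop 2). Writing is a two-step exchange. A `PUT ...?op=CREATE` to the NameNode returns a 307 redirect whose `Location` header names a DataNode, and the data is then sent to that location. `urllib` does not follow redirects for `PUT`, so a writer would have to handle that step explicitly.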

      Sub-Tasks

        1. WebHDFS: Initial Code Delivery | Sub-task | Open | Unassigned | 0% complete (Original Estimate: 504h, Remaining Estimate: 504h)
        2. Unit Testing | Sub-task | Closed | Unassigned
        3. WebHDFS: Integration Test | Sub-task | Open | Unassigned
        4. WebHDFS: Code clean-up | Sub-task | Open | Unassigned
        5. WebHDFS: Add support for JSON file format | Sub-task | Open | Unassigned
        6. WebHDFS: Add support for XML file format | Sub-task | Open | Unassigned
        7. WebHDFS: Add examples | Sub-task | Open | Unassigned


          People

            Assignee: Sourav Mazumder (smazumder)
            Reporter: Sourav Mazumder (smazumder)
            Votes: 0
            Watchers: 6

            Dates

              Created:
              Updated:

              Time Tracking

                Estimated: 840h (Original Estimate)
                Remaining: 840h (Remaining Estimate)
                Logged: Not Specified