  1. Apache Hudi
  2. HUDI-1307

Spark datasource load path format is confusing and inconsistent between snapshot and incremental read modes


Details

    Description

      When reading a Hudi table through the Spark datasource:

      1. Snapshot mode

       val readHudi = spark.read.format("org.apache.hudi").load(basePath + "/*")
      The "/*" suffix is required; without it the read fails, because
      org.apache.hudi.DefaultSource.createRelation() resolves the path with fs.globStatus(),
      and a glob on the bare basePath does not match the .hoodie and default directories beneath it:
      val globPaths = HoodieSparkUtils.checkAndGlobPathIfNecessary(allPaths, fs)
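The difference between globbing basePath and basePath + "/*" can be reproduced with Hadoop's FileSystem.globStatus alone. The following is a minimal sketch, assuming Hadoop is on the classpath and using a throwaway local directory in place of a real Hudi table; GlobDemo and globNames are illustrative names, not Hudi code.

```scala
import java.nio.file.Files
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object GlobDemo {
  // Names of the entries matched by a glob pattern.
  // globStatus may return null when nothing matches, hence the Option wrapper.
  def globNames(fs: FileSystem, pattern: Path): Seq[String] =
    Option(fs.globStatus(pattern))
      .map(_.toSeq)
      .getOrElse(Seq.empty)
      .map(_.getPath.getName)

  def main(args: Array[String]): Unit = {
    val fs = FileSystem.getLocal(new Configuration())

    // Stand-in for a table base path: a temp dir holding ".hoodie" and "default".
    val base = Files.createTempDirectory("hudi_table")
    Files.createDirectory(base.resolve(".hoodie"))
    Files.createDirectory(base.resolve("default"))
    val basePath = new Path(base.toUri)

    // No wildcard: the pattern is expected to match only the base directory itself.
    println(globNames(fs, basePath))

    // With "/*": the pattern is expected to match the children of the base directory.
    println(globNames(fs, new Path(basePath, "*")))
  }
}
```

This only illustrates the glob semantics described in the issue; how Hudi consumes the resulting paths is not shown here.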

       

      2. Incremental mode

      Both basePath and basePath + "/*" work, because org.apache.hudi.DefaultSource calls
      DataSourceUtils.getTablePath, which supports both formats.

       val incViewDF = spark.read.format("org.apache.hudi").
       option(QUERY_TYPE_OPT_KEY, QUERY_TYPE_INCREMENTAL_OPT_VAL).
       option(BEGIN_INSTANTTIME_OPT_KEY, beginTime).
       option(END_INSTANTTIME_OPT_KEY, endTime).
       load(basePath)

       

       val incViewDF = spark.read.format("org.apache.hudi").
       option(QUERY_TYPE_OPT_KEY, QUERY_TYPE_INCREMENTAL_OPT_VAL).
       option(BEGIN_INSTANTTIME_OPT_KEY, beginTime).
       option(END_INSTANTTIME_OPT_KEY, endTime).
       load(basePath + "/*")
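The issue does not show how DataSourceUtils.getTablePath resolves the table location, so the following is not Hudi's implementation. It is a hypothetical sketch of one way a resolver could accept both basePath and basePath + "/*": drop the glob components, then walk up the directory tree until a directory containing .hoodie is found. TablePathSketch, findTableRoot and resolveTablePath are made-up names, and the sketch assumes a local file system with "/" as the path separator.

```scala
import java.nio.file.{Files, Path, Paths}
import scala.annotation.tailrec

object TablePathSketch {
  // Walk up from `dir` until a directory containing ".hoodie" is found.
  @tailrec
  private def findTableRoot(dir: Path): Option[Path] =
    if (dir == null) None
    else if (Files.isDirectory(dir.resolve(".hoodie"))) Some(dir)
    else findTableRoot(dir.getParent)

  // Hypothetical resolver: accepts "base", "base/*", "base/*/*", and so on.
  def resolveTablePath(userPath: String): Option[Path] = {
    // Keep only the leading path segments that contain no wildcard.
    val concrete = userPath.split("/").takeWhile(!_.contains("*")).mkString("/")
    findTableRoot(Paths.get(concrete).toAbsolutePath)
  }
}
```

Under these assumptions, resolveTablePath(basePath) and resolveTablePath(basePath + "/*") return the same directory, which is the behaviour the issue asks snapshot reads to share.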
       

       

      Because incremental mode and snapshot mode behave differently, users will be confused. Having to call load with basePath + "/*" or "/*/*" is also confusing. I know this is there to support partitions,

      but I think the following API would be clearer for users:

       

       val partition = "year = '2019'"
       spark.read.format("hudi").load(path).where(partition)

       


            People

              309637554 liwei