Details
- Type: Improvement
- Status: Open
- Priority: Major
- Resolution: Unresolved
Description
When reading a Hudi table through the Spark datasource:
1. Snapshot mode
val readHudi = spark.read.format("org.apache.hudi").load(basePath + "/*") requires the trailing "/*"; without it the read fails. This is because org.apache.hudi.DefaultSource.createRelation() globs the supplied paths with fs.globStatus() (via val globPaths = HoodieSparkUtils.checkAndGlobPathIfNecessary(allPaths, fs)), and without "/*" the glob does not resolve the .hoodie and default directories.
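A minimal snapshot-read sketch (assuming spark and basePath are already defined, e.g. in spark-shell; the temp view name is illustrative):
```scala
// Snapshot query via the Spark datasource.
// Without the trailing glob the load fails, because createRelation()
// resolves the given path with fs.globStatus():
//   spark.read.format("org.apache.hudi").load(basePath)   // fails
// With the glob the table directories are matched:
val readHudi = spark.read.format("org.apache.hudi").load(basePath + "/*")
readHudi.createOrReplaceTempView("hudi_snapshot")
spark.sql("select count(*) from hudi_snapshot").show()
```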
2. Incremental mode
Both basePath and basePath + "/*" work. This is because org.apache.hudi.DefaultSource calls DataSourceUtils.getTablePath, which supports both forms:
```scala
val incViewDF = spark.read.format("org.apache.hudi").
  option(QUERY_TYPE_OPT_KEY, QUERY_TYPE_INCREMENTAL_OPT_VAL).
  option(BEGIN_INSTANTTIME_OPT_KEY, beginTime).
  option(END_INSTANTTIME_OPT_KEY, endTime).
  load(basePath)

// The glob form works just as well:
val incViewDFGlob = spark.read.format("org.apache.hudi").
  option(QUERY_TYPE_OPT_KEY, QUERY_TYPE_INCREMENTAL_OPT_VAL).
  option(BEGIN_INSTANTTIME_OPT_KEY, beginTime).
  option(END_INSTANTTIME_OPT_KEY, endTime).
  load(basePath + "/*")
```
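Why both forms work: getTablePath resolves the table's base path from whatever path it is given by locating the .hoodie metadata folder. A conceptual sketch of that idea (not Hudi's actual implementation; findTablePath is a hypothetical helper):
```scala
import org.apache.hadoop.fs.{FileSystem, Path}

// Hypothetical helper: walk up from the supplied path (which may be
// basePath or basePath + "/*") until a ".hoodie" folder is found, and
// treat that directory as the table base path.
def findTablePath(fs: FileSystem, start: Path): Option[Path] = {
  var current: Path = start
  while (current != null) {
    if (fs.exists(new Path(current, ".hoodie"))) return Some(current)
    current = current.getParent
  }
  None
}
```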
Because incremental mode and snapshot mode do not accept the same path forms, users get confused. Having load() take basePath + "/*" (or "/*/*" for deeper partitioning) is itself confusing. I understand the glob is there to support partitioned tables.
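For example, with today's API, reading just one partition means encoding it in the path glob; a sketch assuming a hypothetical year/month/day partition layout:
```scala
// Hypothetical: select only the 2019 partition by baking it into the glob.
val year2019 = spark.read.format("org.apache.hudi").load(basePath + "/2019/*/*")
```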
But I think an API like the following would be clearer for users:
```scala
val partition = "year = '2019'"
spark.read
  .format("hudi")
  .load(path)
  .where(partition)
```
Issue Links
- is related to HUDI-2493 Verify removing glob pattern works w/ all key generators (Closed)