Details
- New Feature
- Status: Closed
- Minor
- Resolution: Fixed
Description
For the use case where only the incremental part of certain partitions is needed, the incremental pull currently has to run against the entire dataset, with the partition filtering done afterwards in Spark.
If the folder partitions could be used directly as part of the input path, the query could run faster by loading only the relevant parquet files.
Example:
spark.read.format("org.apache.hudi")
  .option(DataSourceReadOptions.VIEW_TYPE_OPT_KEY, DataSourceReadOptions.VIEW_TYPE_INCREMENTAL_OPT_VAL)
  .option(DataSourceReadOptions.BEGIN_INSTANTTIME_OPT_KEY, "000")
  .option(DataSourceReadOptions.INCR_PATH_GLOB_OPT_KEY, "/year=2016/*/*/*")
  .load(path)
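To illustrate the intended effect of such a path glob, here is a minimal sketch in plain Python using `fnmatch`. This is not Hudi's actual implementation; the file paths and the `prune_by_glob` helper are hypothetical, and it only shows how a partition glob could prune the candidate parquet files before any data is read.

```python
from fnmatch import fnmatch

def prune_by_glob(files, glob):
    # Keep only files whose partition path matches the glob suffix,
    # so the incremental read would touch fewer parquet files.
    return [f for f in files if fnmatch(f, "*" + glob)]

# Hypothetical partitioned table layout (year=/month=/day=).
files = [
    "/data/hudi/year=2015/month=01/day=02/a.parquet",
    "/data/hudi/year=2016/month=03/day=04/b.parquet",
    "/data/hudi/year=2016/month=05/day=06/c.parquet",
]

# Only the year=2016 partitions survive the glob.
print(prune_by_glob(files, "/year=2016/*/*/*"))
```

With pruning applied up front, the incremental view would list and scan only the matching partitions instead of the whole dataset.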