Details
-
Task
-
Status: Closed
-
Blocker
-
Resolution: Pending Closed
Description
Users should be able to trigger index creation for one or more partitions with a CREATE INDEX statement or a CLI tool, either of which captures the options below.
CREATE [BLOOM | COL_STATS | SOME_INDEX_TYPE] INDEX ON TABLE [table_name] FOR COLUMNS (col1, col2, col3) WITH OPTION (<file_group_count>, <some_other_option>);
This maps to the following Hudi configs:
METADATA_PREFIX + ".index.bloom.filter.file.group.count"
METADATA_PREFIX + ".index.column.stats.file.group.count"
METADATA_PREFIX + ".index.bloom.filter.for.columns" -> comma-separated column names
METADATA_PREFIX + ".index.column.stats.for.columns" -> comma-separated column names
The CLI indexer tool will map user inputs to the same configs.
By default, the bloom filter index covers only the record key, and the column stats index covers all columns.
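The mapping from statement inputs to config keys could be sketched roughly as follows. This is only an illustration of the idea, not the actual Hudi implementation: the function name `index_configs`, the `INDEX_INFIX` lookup, and the value `"hoodie.metadata"` for METADATA_PREFIX are assumptions made for the example. Only the key suffixes come from this ticket.

```python
# Assumption: METADATA_PREFIX resolves to "hoodie.metadata"; the ticket does not spell it out.
METADATA_PREFIX = "hoodie.metadata"

# Hypothetical lookup from the index type named in the statement to the config infix.
INDEX_INFIX = {
    "BLOOM": "bloom.filter",
    "COL_STATS": "column.stats",
}


def index_configs(index_type, columns=None, file_group_count=None):
    """Translate CREATE INDEX inputs into the config keys listed in this ticket."""
    infix = INDEX_INFIX[index_type.upper()]
    base = f"{METADATA_PREFIX}.index.{infix}"
    configs = {}
    if file_group_count is not None:
        if file_group_count <= 0:
            raise ValueError("file_group_count must be positive")
        configs[f"{base}.file.group.count"] = str(file_group_count)
    if columns:
        # Column names are passed as a single comma-separated value.
        configs[f"{base}.for.columns"] = ",".join(c.strip() for c in columns)
    # If no columns are given, no column key is emitted and the defaults apply
    # (record key for bloom filters, all columns for column stats).
    return configs
```

For instance, `index_configs("BLOOM", ["col1", "col2"], 4)` would yield one file group count entry and one column list entry for the bloom filter index. A CLI indexer could reuse the same function so that both entry points produce identical configs.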
For v0.11.0, we assume:
- A static file group count for all columns.
- The set of columns that have already been indexed is inferred from the metadata table (MT) partition layout (see HUDI-3258).