[PARQUET-2256] Adding Compression for BloomFilter - ASF JIRA


Details

Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: format-2.9.0
Fix Version/s: None
Component/s: parquet-format
Labels: None

Description

In current Parquet implementations, if the NDV (number of distinct values) is not set for a BloomFilter, most implementations assume an NDV of 1M and size the filter from that NDV and the FPP (false positive probability). So with an FPP of 0.01, the BloomFilter can grow to 2 MB per column, which is very large. Should we support compression for BloomFilter, like:

```

/** The compression used in the Bloom filter. **/
struct Uncompressed {}

union BloomFilterCompression {
  1: Uncompressed UNCOMPRESSED;
  // Proposed addition: compress the Bloom filter with a general-purpose codec.
  2: CompressionCodec COMPRESSION;
}
```
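The following is a rough Python sketch of the sizing arithmetic above and of why compression could help when the real NDV is far below the guessed 1M. It makes several assumptions. It uses the split block sizing rule that, as far as we understand, writers such as arrow-rs follow: bits = -8 * NDV / ln(1 - FPP^(1/8)), rounded up to a power-of-two byte count. Exact rounding and caps differ between writers. It also substitutes BLAKE2b for the xxHash64 that Parquet specifies, and uses zlib (DEFLATE, the algorithm behind the GZIP codec) as a stand-in for a `CompressionCodec`. `optimal_num_bytes`, `SplitBlockBloomFilter`, and the 10,000-value workload are hypothetical illustrations, not parquet-format or library APIs.

```python
import hashlib
import math
import struct
import zlib

# Salt constants from the Parquet split block Bloom filter (SBBF) spec.
SALT = (0x47B6137B, 0x44974D91, 0x8824AD5B, 0xA2B7289D,
        0x705495C7, 0x2DF1424B, 0x9EFC4947, 0x5C6BFB31)
BYTES_PER_BLOCK = 32  # one block = eight 32-bit words = 256 bits


def optimal_num_bytes(ndv: int, fpp: float) -> int:
    """Hypothetical helper: size an SBBF from NDV and FPP.

    Assumption: rounds up to a power of two like arrow-rs; caps and
    rounding rules vary between implementations.
    """
    num_bits = -8.0 * ndv / math.log(1.0 - fpp ** (1.0 / 8.0))
    num_bytes = max(int(num_bits / 8), BYTES_PER_BLOCK)
    return 1 << (num_bytes - 1).bit_length()


def hash64(value: bytes) -> int:
    # Stand-in hash: Parquet specifies xxHash64, which is not in the Python
    # standard library, so BLAKE2b truncated to 64 bits is used instead.
    return int.from_bytes(hashlib.blake2b(value, digest_size=8).digest(), "little")


class SplitBlockBloomFilter:
    def __init__(self, num_bytes: int) -> None:
        self.bitset = bytearray(num_bytes)
        self.num_blocks = num_bytes // BYTES_PER_BLOCK

    def _locate(self, h: int):
        # Upper 32 bits pick the block; lower 32 bits pick one bit per word.
        block = ((h >> 32) * self.num_blocks) >> 32
        key = h & 0xFFFFFFFF
        masks = [1 << (((key * s) & 0xFFFFFFFF) >> 27) for s in SALT]
        return block * BYTES_PER_BLOCK, masks

    def insert(self, value: bytes) -> None:
        offset, masks = self._locate(hash64(value))
        for i, mask in enumerate(masks):
            pos = offset + 4 * i
            (word,) = struct.unpack_from("<I", self.bitset, pos)
            struct.pack_into("<I", self.bitset, pos, word | mask)

    def might_contain(self, value: bytes) -> bool:
        offset, masks = self._locate(hash64(value))
        for i, mask in enumerate(masks):
            (word,) = struct.unpack_from("<I", self.bitset, offset + 4 * i)
            if (word & mask) == 0:
                return False
        return True


# The writer assumed NDV = 1M with FPP = 0.01, but this (hypothetical) column
# chunk holds only 10,000 distinct values, so most bits stay zero.
bloom = SplitBlockBloomFilter(optimal_num_bytes(1_000_000, 0.01))
values = [i.to_bytes(8, "little") for i in range(10_000)]
for v in values:
    bloom.insert(v)

compressed = zlib.compress(bytes(bloom.bitset), level=9)
print(len(bloom.bitset), len(compressed))
```

Compression here is lossless but not random-access, so a reader would need to inflate the whole bitset before probing it.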


People

Assignee: Xuwei Fu

Reporter: Xuwei Fu

Votes: 0

Watchers: 5

Dates

Created: 13/Mar/23 14:30

Updated: 23/Jun/24 03:32