Details
- Type: New Feature
- Status: Closed
- Priority: Major
- Resolution: Done
Description
Provide file size stats for the latest updates that Hudi is consuming. By default these stats are reported at the table level; specifying --enable-partition-stats also reports them at the partition level. If a start date (--start-date) and/or an end date (--end-date) is specified, stats are based on files that were modified in the half-open interval [start date, end date). Alternatively, --num-days selects data files over the last --num-days days; if --start-date is specified, --num-days is ignored. If none of the date parameters are set, stats are computed over all data files of all partitions in the table. Note that date filtering is carried out only if the partition name has the format '[column name=]yyyy-M-d' or '[column name=]yyyy/M/d'.
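The date-filtering rules above can be sketched as follows. This is an illustrative Python sketch, not the utility's actual code: the function names (`parse_partition_date`, `in_window`, `resolve_start`) are hypothetical, only single-level partition names are handled, and two behaviours are assumptions rather than facts from this ticket: that partitions whose names do not match a date format are kept rather than filtered out, and that --num-days counts back from today's date.

```python
import re
from datetime import date, timedelta
from typing import Optional

# Optional "column=" prefix, then yyyy-M-d or yyyy/M/d.
# The backreference \2 forces the same separator in both positions.
_PARTITION_DATE = re.compile(r"^(?:[^=/]+=)?(\d{4})([-/])(\d{1,2})\2(\d{1,2})$")


def parse_partition_date(partition: str) -> Optional[date]:
    """Return the date encoded in a partition name, or None if it has no date format."""
    m = _PARTITION_DATE.match(partition)
    if m is None:
        return None
    year, _sep, month, day = m.groups()
    try:
        return date(int(year), int(month), int(day))
    except ValueError:  # e.g. month 13
        return None


def in_window(partition: str, start: Optional[date] = None, end: Optional[date] = None) -> bool:
    """Half-open check: start <= partition date < end. Either bound may be omitted."""
    d = parse_partition_date(partition)
    if d is None:
        return True  # assumption: non-date partitions are not filtered
    if start is not None and d < start:
        return False
    if end is not None and d >= end:
        return False
    return True


def resolve_start(start: Optional[date], num_days: Optional[int], today: date) -> Optional[date]:
    """--start-date takes precedence; otherwise derive a start from --num-days."""
    if start is not None:
        return start  # --num-days is ignored
    if num_days is not None:
        return today - timedelta(days=num_days)  # assumption: offset from today
    return None
```

For example, with start 2023-05-07 and end 2023-05-08, the partition `dt=2023-5-7` falls inside the window while `dt=2023-5-8` does not, because the end date is exclusive.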
The following stats are produced by this class:
* Number of files
* Total table size
* Minimum file size
* Maximum file size
* Average file size
* Median file size
* p50 file size
* p90 file size
* p95 file size
* p99 file size
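
As a rough illustration of how the stats listed above could be derived from a list of file sizes in bytes, here is a minimal Python sketch. The function names and dictionary keys are hypothetical, and the nearest-rank percentile method is an assumption; the ticket does not say which percentile estimator the class uses.

```python
from statistics import median
from typing import Dict, List, Union

Number = Union[int, float]


def nearest_rank(sorted_sizes: List[int], p: int) -> int:
    """Nearest-rank percentile for an integer p in 1..100 (assumed method)."""
    n = len(sorted_sizes)
    rank = max(1, (p * n + 99) // 100)  # integer ceil(p * n / 100)
    return sorted_sizes[rank - 1]


def file_size_stats(sizes: List[int]) -> Dict[str, Number]:
    """Summarise file sizes (bytes) for one table or one partition."""
    if not sizes:
        raise ValueError("no files to summarise")
    s = sorted(sizes)
    return {
        "num_files": len(s),
        "total_size": sum(s),
        "min": s[0],
        "max": s[-1],
        "avg": sum(s) / len(s),
        "median": median(s),  # averages the two middle values when len is even
        "p50": nearest_rank(s, 50),
        "p90": nearest_rank(s, 90),
        "p95": nearest_rank(s, 95),
        "p99": nearest_rank(s, 99),
    }
```

Partition-level stats would apply the same function to each partition's files separately. Note that in this sketch the median and p50 can differ slightly for an even number of files, because they use different estimators; whether the class reports them identically is not stated here.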
Issue Links
- duplicates HUDI-6193 Add file size stats utility (Open)