  1. Apache Hudi
  2. HUDI-6203

Add support to standalone utility tool to fetch file size stats for a given table w/ optional partition filters


Details

    • New Feature
    • Status: Closed
    • Major
    • Resolution: Done

    Description

      Provide file size stats for the latest updates that Hudi is consuming. Stats are reported at the table level by default; specifying --enable-partition-stats also reports them at the partition level. If a start date (--start-date) and/or an end date (--end-date) is specified, stats are based on files that were modified in the half-open interval [start date, end date), so the start date is included and the end date is excluded. Alternatively, --num-days selects data files from the last --num-days days; it is ignored if --start-date is specified. If none of the date parameters is set, stats are computed over all data files in all partitions of the table. Note that date filtering is applied only if the partition name has the format '[column name=]yyyy-M-d' or '[column name=]yyyy/M/d'.
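
The date filtering rule can be illustrated with a minimal Java sketch. It is not taken from the Hudi code base: the class `PartitionDateFilter` and its methods are hypothetical names, and keeping partitions whose names do not match either date format is an assumption about how "filtering is carried out only if the partition name has the format" should be read.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical helper, not part of Hudi: decides whether a partition falls
// inside the half-open interval [start, end).
final class PartitionDateFilter {
  // The two partition date layouts named in the description.
  private static final List<DateTimeFormatter> FORMATS = Arrays.asList(
      DateTimeFormatter.ofPattern("yyyy-M-d"),
      DateTimeFormatter.ofPattern("yyyy/M/d"));

  // Strips an optional "column name=" prefix and tries both date layouts.
  static Optional<LocalDate> parsePartitionDate(String partitionName) {
    int eq = partitionName.indexOf('=');
    String value = eq >= 0 ? partitionName.substring(eq + 1) : partitionName;
    for (DateTimeFormatter format : FORMATS) {
      try {
        return Optional.of(LocalDate.parse(value, format));
      } catch (DateTimeParseException e) {
        // Not in this layout; try the next one.
      }
    }
    return Optional.empty();
  }

  // start is inclusive, end is exclusive; a null bound means "unbounded".
  static boolean accept(String partitionName, LocalDate start, LocalDate end) {
    Optional<LocalDate> parsed = parsePartitionDate(partitionName);
    if (!parsed.isPresent()) {
      return true; // Assumption: partitions without a date in their name are not filtered.
    }
    LocalDate date = parsed.get();
    boolean onOrAfterStart = start == null || !date.isBefore(start);
    boolean beforeEnd = end == null || date.isBefore(end);
    return onOrAfterStart && beforeEnd;
  }
}
```

With a start date of 2023-05-01 and an end date of 2023-05-03, this sketch accepts `dt=2023-05-01` and `2023/5/2` but rejects `dt=2023-05-03`, because the end date is exclusive.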
      The tool produces the following stats:
       * Number of files
       * Total table size
       * Minimum file size
       * Maximum file size
       * Average file size
       * Median file size
       * p50 file size
       * p90 file size
       * p95 file size
       * p99 file size
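
As a rough illustration of how these figures relate to one another, the sketch below computes them from an array of file sizes in bytes. The class name `FileSizeSummary` is hypothetical, and the nearest-rank percentile definition is an assumption made for this example; the utility itself may use a different estimator.

```java
import java.util.Arrays;

// Hypothetical summary class, not part of Hudi.
final class FileSizeSummary {
  final long count;
  final long total;
  final long min;
  final long max;
  final double average;
  private final long[] sorted;

  FileSizeSummary(long[] sizes) {
    if (sizes.length == 0) {
      throw new IllegalArgumentException("at least one file size is required");
    }
    sorted = sizes.clone();
    Arrays.sort(sorted);
    count = sorted.length;
    long sum = 0;
    for (long size : sorted) {
      sum += size;
    }
    total = sum;
    min = sorted[0];
    max = sorted[sorted.length - 1];
    average = (double) total / count;
  }

  // Nearest-rank percentile (assumed definition): the smallest value such
  // that at least p percent of the files are no larger than it.
  long percentile(double p) {
    int rank = (int) Math.ceil(p * sorted.length / 100.0);
    return sorted[Math.max(rank, 1) - 1];
  }

  // In this sketch the median is simply the 50th percentile.
  long median() {
    return percentile(50);
  }
}
```

Under this definition the median and p50 are the same number, so a tool reporting both would print identical values for them.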

            People

              Assignee: Unassigned
              Reporter: Amrish Lal (amrishlal)