
Statistics per-column family per-region

Details

    • New Feature
    • Status: Closed
    • Major
    • Resolution: Won't Fix
    • 0.95.2
    • None
    • None
    • None

    Description

      Originating from this discussion on the dev list: http://search-hadoop.com/m/coDKU1urovS/Simple+stastics+per+region/v=plain

      In short, HBase should have built-in statistics gathering for tables. This would give clients a better understanding of how keys are distributed within a table and within a given region. We could also surface this information via the UI.

      The email thread contains a couple of different proposals. The overview is this:
      We add a hook on compactions that gathers statistics about the keys being written, and then we write those statistics to a table.

      The possible proposals include:

      How to implement it?

      1. Coprocessors -
        • advantage - it easily plugs in and people could pretty easily add their own statistics.
        • disadvantage - UI elements would also require the coprocessor to be loaded, so we get into dependent loading, which leads down the OSGi path. Also, these CPs need to be installed after all the other CPs on compaction to ensure they see exactly what gets written (doable, but a pain).
      2. Built into HBase as a custom scanner
        • advantage - always goes in the right place and no need to muck about with loading CPs etc.
        • disadvantage - less pluggable, at least for the initial cut
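
      Either option comes down to observing every cell that a compaction writes. The following is a minimal, dependency-free sketch of that idea, not the code from the attached patches. The names Cell, FamilyStats, and StatsIterator are hypothetical stand-ins (in HBase the wrapped object would be a compaction scanner), and row keys are compared as Java strings here, whereas HBase compares raw bytes.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class CompactionStatsSketch {

    /** Hypothetical stand-in for an HBase cell: just a row key and a column family. */
    static final class Cell {
        final String row;
        final String family;

        Cell(String row, String family) {
            this.row = row;
            this.family = family;
        }
    }

    /** Statistics gathered for one column family in one region. */
    static final class FamilyStats {
        long cellCount;
        String minRow;
        String maxRow;

        void update(String row) {
            cellCount++;
            if (minRow == null || row.compareTo(minRow) < 0) minRow = row;
            if (maxRow == null || row.compareTo(maxRow) > 0) maxRow = row;
        }
    }

    /** Wraps the stream of cells a compaction writes and records each one as it passes. */
    static final class StatsIterator implements Iterator<Cell> {
        final Iterator<Cell> delegate;
        final Map<String, FamilyStats> stats = new TreeMap<>();

        StatsIterator(Iterator<Cell> delegate) {
            this.delegate = delegate;
        }

        @Override
        public boolean hasNext() {
            return delegate.hasNext();
        }

        @Override
        public Cell next() {
            Cell cell = delegate.next();
            // Observe the cell without altering what the compaction writes.
            stats.computeIfAbsent(cell.family, f -> new FamilyStats()).update(cell.row);
            return cell;
        }

        Map<String, FamilyStats> stats() {
            return stats;
        }
    }

    public static void main(String[] args) {
        List<Cell> written = Arrays.asList(
            new Cell("row-a", "cf1"), new Cell("row-b", "cf1"), new Cell("row-c", "cf2"));
        StatsIterator it = new StatsIterator(written.iterator());
        while (it.hasNext()) {
            it.next(); // the real compaction would write the cell to the new store file here
        }
        it.stats().forEach((family, s) ->
            System.out.println(family + ": cells=" + s.cellCount
                + " min=" + s.minRow + " max=" + s.maxRow));
    }
}
```

      Because the wrapper only reads cells as they pass through, it would have to sit last in the chain to see exactly what gets written, which is the ordering concern raised for the coprocessor option.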

      Where do we store data?

      1. .META.
        • advantage - it's an existing table, so we can jam it into another CF there
        • disadvantage - this would make META much larger, possibly leading to splits AND will make it much harder for other processes to read the info
      2. A new stats table
        • advantage - cleanly separates out the information from META
        • disadvantage - needs a 'system table' concept to prevent accidental deletion or manipulation by arbitrary clients while still allowing clients to read it.
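
      To make the separate stats table option concrete, here is one possible row key layout, offered purely as an illustration. The issue does not specify a schema, so the class StatsRowKeySketch, the table/region/family ordering, and the NUL separator are all assumptions. Putting the table and region first keeps all statistics for a region adjacent, so a client could read them with a single prefix scan.

```java
import java.nio.charset.StandardCharsets;

final class StatsRowKeySketch {

    // Assumed separator: a NUL character, which is unlikely to appear in table or family names.
    static final char SEP = '\0';

    /** Row key for the statistics of one column family in one region (hypothetical layout). */
    static byte[] rowKey(String table, String encodedRegion, String family) {
        return (table + SEP + encodedRegion + SEP + family).getBytes(StandardCharsets.UTF_8);
    }

    /** Prefix shared by every statistics row of a region; usable as a scan prefix. */
    static byte[] regionPrefix(String table, String encodedRegion) {
        return (table + SEP + encodedRegion + SEP).getBytes(StandardCharsets.UTF_8);
    }

    /** Returns true if key begins with prefix, byte for byte. */
    static boolean startsWith(byte[] key, byte[] prefix) {
        if (key.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (key[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // "1a2b3c" stands in for a region's encoded name; the value is made up.
        byte[] key = rowKey("users", "1a2b3c", "cf1");
        byte[] prefix = regionPrefix("users", "1a2b3c");
        System.out.println("key matches region prefix: " + startsWith(key, prefix));
    }
}
```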

      Once we have this framework, we can then move to an actual implementation of various statistics.

      Attachments

        1. hbase-7958-v0-parent.patch
          40 kB
          Jesse Yates
        2. hbase-7958-v0.patch
          133 kB
          Jesse Yates
        3. hbase-7958_rough-cut-v0.patch
          52 kB
          Jesse Yates

            People

              Assignee: Unassigned
              Reporter: Jesse Yates (jesse_yates)
              Votes: 0
              Watchers: 22