HBASE-24528

Improve balancer decision observability

Details

    • Reviewed
      Retrieve latest balancer decisions made by LoadBalancers.

      Examples:
        hbase> get_balancer_decisions
      Retrieve recent balancer decisions with region plans

        hbase> get_balancer_decisions LIMIT => 10
      Retrieve 10 most recent balancer decisions with region plans


      Config change:

      hbase.master.balancer.decision.buffer.enabled:

            Indicates whether the active HMaster runs a ring buffer that stores
            balancer decisions in FIFO order with a limited number of entries.
            The size of the ring buffer is set by the config:
            hbase.master.balancer.decision.queue.size
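
      The FIFO behaviour with limited entries described above can be illustrated with a minimal sketch. This is not the HBase implementation: the class name BoundedDecisionBuffer and its methods are hypothetical, the capacity argument stands in for the queue size setting, and the limit argument stands in for the shell's LIMIT option.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

/**
 * Hypothetical bounded FIFO buffer, for illustration only.
 * Once the capacity is reached, each new entry evicts the oldest one.
 */
final class BoundedDecisionBuffer<T> {
    // Assumption: plays the role of hbase.master.balancer.decision.queue.size.
    private final int capacity;
    private final Deque<T> entries = new ArrayDeque<>();

    BoundedDecisionBuffer(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
    }

    /** Appends a decision, dropping the oldest entry if the buffer is full. */
    synchronized void add(T decision) {
        if (entries.size() == capacity) {
            entries.pollFirst();
        }
        entries.addLast(decision);
    }

    /** Returns at most {@code limit} entries, newest first. */
    synchronized List<T> latest(int limit) {
        List<T> out = new ArrayList<>();
        Iterator<T> it = entries.descendingIterator();
        while (it.hasNext() && out.size() < limit) {
            out.add(it.next());
        }
        return out;
    }

    synchronized int size() {
        return entries.size();
    }
}
```

      With a capacity of 3, adding five decisions leaves only the three newest in the buffer, and latest(2) returns the two most recent of those.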


    Description

      We provide detailed INFO and DEBUG level logging of balancer decision factors, outcomes, and reassignment planning, as well as similarly detailed logging of the resulting assignment manager activity. However, an operator may need to observe, debug, or analyze the performance of current balancer activity online and interactively. Scraping and correlating the many log lines produced by a balancer execution is labor intensive and slow (on the order of minutes to acquire and index, and again on the order of minutes to correlate).

      The balancer should maintain a rolling window of history, e.g. the last 100 or the last 1000 region move plans submitted to the assignment manager. This history should include decision factor details, weights, and costs. The rsgroups balancer may be able to provide fairly simple decision factors, for example "this table was reassigned to that regionserver group". The underlying (vanilla) stochastic balancer, on the other hand, after a walk over random assignment plans, will have considered a number of cost functions with various inputs (locality, load, etc.) and multipliers, including custom cost functions. We can devise an extensible class structure that represents explanations for balancer decisions. For each region move plan that is actually submitted to the assignment manager, we can keep the explanations of all relevant decision factors alongside the other details of the plan, such as the region name and the source and destination regionservers.

      This history should be available via API for use by new shell commands and admin UI widgets.

      The new shell commands and UI widgets can unpack the representation of balancer decision components into human readable output.
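
      One possible shape for the extensible explanation structure and its human readable rendering is sketched below. All type names here (DecisionFactor, CostFactor, GroupFactor, ExplainedMove) are hypothetical and chosen only for illustration; they are not existing HBase classes, and the output format is an assumption.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

/** Hypothetical: one factor that contributed to a balancer decision. */
interface DecisionFactor {
    String describe();
}

/** Hypothetical: a weighted cost function result, as a stochastic balancer might report. */
final class CostFactor implements DecisionFactor {
    private final String name;
    private final double multiplier;
    private final double cost;

    CostFactor(String name, double multiplier, double cost) {
        this.name = name;
        this.multiplier = multiplier;
        this.cost = cost;
    }

    @Override
    public String describe() {
        return String.format(Locale.ROOT, "%s: multiplier=%.1f, cost=%.3f", name, multiplier, cost);
    }
}

/** Hypothetical: a simple group reassignment, as an rsgroups balancer might report. */
final class GroupFactor implements DecisionFactor {
    private final String table;
    private final String group;

    GroupFactor(String table, String group) {
        this.table = table;
        this.group = group;
    }

    @Override
    public String describe() {
        return "table " + table + " reassigned to group " + group;
    }
}

/** Hypothetical: a region move plan kept together with the factors that explain it. */
final class ExplainedMove {
    private final String region;
    private final String source;
    private final String destination;
    private final List<DecisionFactor> factors;

    ExplainedMove(String region, String source, String destination, List<DecisionFactor> factors) {
        this.region = region;
        this.source = source;
        this.destination = destination;
        // Defensive copy so the recorded explanation cannot change later.
        this.factors = Collections.unmodifiableList(new ArrayList<>(factors));
    }

    /** Unpacks the plan and its factors into text a shell command or UI could print. */
    String render() {
        StringBuilder sb = new StringBuilder();
        sb.append(region).append(": ").append(source).append(" -> ").append(destination);
        for (DecisionFactor factor : factors) {
            sb.append("\n  ").append(factor.describe());
        }
        return sb.toString();
    }
}
```

      Because each balancer contributes its own DecisionFactor implementations, a custom cost function could add a new factor type without changing the rendering code.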

            People

              vjasani Viraj Jasani
              apurtell Andrew Kyle Purtell