HBASE-9778: Add hint to ExplicitColumnTracker to avoid seeking

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.96.2, 0.98.1, 0.99.0, 0.94.18
    • Component/s: None
    • Labels: None
    • Release Note:
      Introduces a new scan attribute that lets a scan with explicit columns (Scan.addColumn) opportunistically look ahead a few KeyValues (columns/versions) before scheduling a seek to the next column.

      A seek pays off only when it can skip past 5-10 KeyValues (columns) or 512-1024 bytes. With small rows and few versions, looking ahead is typically more efficient.

      API:
      {code}
          Scan s = new Scan(...);
          s.addColumn(...);
          // instructs the RegionServer to attempt two iterations of next before scheduling a seek
          s.setAttribute(Scan.HINT_LOOKAHEAD, Bytes.toBytes(2));
          table.getScanner(s);
      {code}
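
The trade-off described in this note can be illustrated with a small, self-contained sketch. It is not the HBase implementation: the class, the method names, and the sorted array of unique qualifiers standing in for a row are all hypothetical. It tries up to `lookahead` single steps toward a target column and only falls back to a "seek" (modelled here as a binary search) when stepping did not get there.

```java
import java.util.Arrays;

// Illustrative stand-in only; this is not ExplicitColumnTracker or StoreScanner code.
final class LookaheadSketch {
    /** Counts the two kinds of work so they can be compared. */
    static final class Cost {
        int nexts;
        int seeks;
    }

    /**
     * Moves from index pos to the first index whose qualifier is >= target.
     * Assumes sortedQualifiers is sorted and holds unique values (one version per column).
     * Tries up to `lookahead` single steps, then falls back to one binary-search "seek".
     */
    static int advance(String[] sortedQualifiers, int pos, String target,
                       int lookahead, Cost cost) {
        for (int i = 0; i < lookahead && pos < sortedQualifiers.length; i++) {
            if (sortedQualifiers[pos].compareTo(target) >= 0) {
                return pos; // reached the target by stepping, no seek needed
            }
            pos++;
            cost.nexts++;
        }
        if (pos < sortedQualifiers.length && sortedQualifiers[pos].compareTo(target) < 0) {
            // Look-ahead budget exhausted and still short of the target: seek.
            cost.seeks++;
            int idx = Arrays.binarySearch(sortedQualifiers, pos, sortedQualifiers.length, target);
            pos = idx >= 0 ? idx : -idx - 1;
        }
        return pos;
    }

    public static void main(String[] args) {
        String[] row = {"c1", "c2", "c3", "c4", "c5"};
        Cost cost = new Cost();
        int pos = advance(row, 0, "c5", 2, cost);
        System.out.println("pos=" + pos + " nexts=" + cost.nexts + " seeks=" + cost.seeks);
    }
}
```

With a look-ahead of 2, a nearby target such as "c2" is reached with one step and no seek, while a distant target such as "c5" costs two steps plus one seek. That extra stepping is the price of the optimistic attempt, which is why the hint suits small rows with few versions.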

    Description

      The issue of slow seeking in ExplicitColumnTracker was brought up by vrodionov on the dev list.

      My idea here is to avoid seeking when we know that there aren't many versions to skip.
      How do we know? We'll use the column family's VERSIONS setting as a hint. If VERSIONS is set to 1 (or maybe some value < 10), we'll avoid the seek and return SKIP repeatedly instead.
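
A minimal sketch of that decision, under stated assumptions: the enum, the method name, and the cutoff of 10 are hypothetical stand-ins taken from the "some value < 10" suggestion above, not names or values from the HBase source or the attached patches.

```java
// Hypothetical stand-ins; none of these names come from the HBase code base.
final class VersionsHintSketch {
    /** The two ways of moving past the remaining versions of the current column. */
    enum Step { SKIP, SEEK_NEXT_COLUMN }

    // Assumed cutoff, following the "some value < 10" suggestion in the description.
    static final int SKIP_THRESHOLD = 10;

    /**
     * Picks a step based on the column family's VERSIONS setting:
     * with few versions, stepping over them is assumed cheaper than a seek.
     */
    static Step stepPastColumn(int maxVersions) {
        return maxVersions < SKIP_THRESHOLD ? Step.SKIP : Step.SEEK_NEXT_COLUMN;
    }

    public static void main(String[] args) {
        System.out.println("VERSIONS=1   -> " + stepPastColumn(1));
        System.out.println("VERSIONS=100 -> " + stepPastColumn(100));
    }
}
```

The point of the design is that the decision needs no per-scan input: it relies only on a setting the column family already declares.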

      HBASE-9769 has some initial numbers for this approach.
      Interestingly, the result depends on which column(s) are selected.

      Some numbers: 4 million rows, 5 columns each, 1 column family, 10-byte values, VERSIONS=1, everything filtered at the server with a ValueFilter. All times are in seconds.

      Without patch:

      ||Wildcard||Col 1||Col 2||Col 4||Col 5||Col 2+4||
      |6.4|8.5|14.3|14.6|11.1|20.3|

      With patch:

      ||Wildcard||Col 1||Col 2||Col 4||Col 5||Col 2+4||
      |6.4|8.4|8.9|9.9|6.4|10.0|

      Run-to-run variation was ±0.2 s.

      So with this patch, scanning is up to 2x faster than without it in some cases, and never slower. No special hint is needed beyond declaring VERSIONS correctly.
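
To make the "2x faster, never slower" claim easy to check, the per-column speedups can be computed directly from the two sets of timings above. The class below is only a convenience sketch; the numbers are copied from this description and nothing is re-measured.

```java
import java.util.Locale;

// Convenience arithmetic over the timings quoted in this issue; no new measurements.
final class SpeedupTable {
    static final String[] COLUMNS = {"Wildcard", "Col 1", "Col 2", "Col 4", "Col 5", "Col 2+4"};
    static final double[] WITHOUT_PATCH = {6.4, 8.5, 14.3, 14.6, 11.1, 20.3}; // seconds
    static final double[] WITH_PATCH = {6.4, 8.4, 8.9, 9.9, 6.4, 10.0};       // seconds

    /** Speedup factor for column i: time without the patch divided by time with it. */
    static double speedup(int i) {
        return WITHOUT_PATCH[i] / WITH_PATCH[i];
    }

    public static void main(String[] args) {
        for (int i = 0; i < COLUMNS.length; i++) {
            System.out.printf(Locale.ROOT, "%-8s %.2fx%n", COLUMNS[i], speedup(i));
        }
    }
}
```

The largest gain is for Col 2+4, at roughly 2.03x. The Wildcard and Col 1 cases stay within the stated ±0.2 s variation, so they are effectively unchanged.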

      Attachments

        1. 9778-trunk-v9.txt
          14 kB
          Lars Hofhansl
        2. 9778-trunk-v8.txt
          14 kB
          Lars Hofhansl
        3. 9778-trunk-v7.txt
          14 kB
          Lars Hofhansl
        4. 9778-trunk-v6.txt
          12 kB
          Lars Hofhansl
        5. 9778-trunk-v3.txt
          11 kB
          Lars Hofhansl
        6. 9778-trunk-v2.txt
          6 kB
          Lars Hofhansl
        7. 9778-trunk.txt
          0.9 kB
          Lars Hofhansl
        8. 9778-0.94-v9.txt
          14 kB
          Lars Hofhansl
        9. 9778-0.94-v8.txt
          14 kB
          Lars Hofhansl
        10. 9778-0.94-v7.txt
          13 kB
          Lars Hofhansl
        11. 9778-0.94-v6.txt
          12 kB
          Lars Hofhansl
        12. 9778-0.94-v5.txt
          6 kB
          Lars Hofhansl
        13. 9778-0.94-v4.txt
          9 kB
          Lars Hofhansl
        14. 9778-0.94-v3.txt
          11 kB
          Lars Hofhansl
        15. 9778-0.94-v2.txt
          6 kB
          Lars Hofhansl
        16. 9778-0.94.txt
          0.9 kB
          Lars Hofhansl

            People

              Assignee: Lars Hofhansl (larsh)
              Reporter: Lars Hofhansl (larsh)