Uploaded image for project: 'HBase'
  1. HBase
  2. HBASE-2000

Coprocessors

    XMLWordPrintableJSON

Details

    • New Feature
    • Status: Closed
    • Major
    • Resolution: Fixed
    • None
    • 0.92.0
    • Coprocessors
    • None
    • Reviewed

    Description

      From Google's Jeff Dean, in a keynote to LADIS 2009 (http://www.scribd.com/doc/21631448/Dean-Keynote-Ladis2009, slides 66 - 67):

      BigTable Coprocessors (New Since OSDI'06)

      • Arbitrary code that runs run next to each tablet in table
        • As tablets split and move, coprocessor code automatically splits/moves too
      • High-level call interface for clients
        • Unlike RPC, calls addressed to rows or ranges of rows
      • coprocessor client library resolves to actual locations
        • Calls across multiple rows automatically split into multiple parallelized RPCs
      • Very flexible model for building distributed services
        • Automatic scaling, load balancing, request routing for apps

      Example Coprocessor Uses

      • Scalable metadata management for Colossus (next gen GFS-like file system)
      • Distributed language model serving for machine translation system
      • Distributed query processing for full-text indexing support
      • Regular expression search support for code repository

      For HBase, adding a coprocessor framework will allow for pluggable incremental addition of functionality. No more need to subclass the regionserver interface and implementation classes and set hbase.regionserver.class and hbase.regionserver.impl in hbase-site.xml. That mechanism allows for extension but at the exclusion of all others.

      Also in HBASE-2001 currently there is a in-process map reduce framework for the regionservers. Coprocessors can optionally implement a 'MapReduce' interface which clients will be able to invoke concurrently on all regions of the table. Note this is not MapReduce on the table; this is MapReduce on each region, concurrently. One can implement MapReduce in a manner very similar to Hadoop's MR framework, or use shared variables to avoid the overhead of generating (and processing) a lot of intermediates. An initial application of this could be support for rapid calculation of aggregates over data stored in HBase.

      Attachments

        Issue Links

          There are no Sub-Tasks for this issue.

          Activity

            People

              Unassigned Unassigned
              apurtell Andrew Kyle Purtell
              Votes:
              6 Vote for this issue
              Watchers:
              39 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: