Uploaded image for project: 'Kylin'
  1. Kylin
  2. KYLIN-607

More efficient cube building

    XMLWordPrintableJSON

Details

    • New Feature
    • Status: Closed
    • Major
    • Resolution: Fixed
    • None
    • v1.4.0
    • Job Engine
    • None
    • Sprint 42, Sprint 43, Sprint 44, Sprint 45, Sprint 46

    Description

      Right now cube building is by layer of spanning trees. The algorithm results a total shuffle size around [Avg Cardinality] * [Total Cube Size]. This is the current biggest bottleneck of cube building in eBay deployment.

      Propose a different algorithm:
      1. Each mapper builds a cube segment independent, and output.
      2. One round of shuffle merge sorts the segments.
      3. Reducer outputs the final merged cube.

      This could achieve 1 * [Total Cube Size] shuffling when there's a mandatory dimension and each mapper takes a different piece on the dimension. E.g. month is mandatory and each mapper is assign a different month data.

      This algorithm is also more friendly to streaming.

      Attachments

        Issue Links

          There are no Sub-Tasks for this issue.

          Activity

            People

              liyang.gmt8@gmail.com liyang
              liyang.gmt8@gmail.com liyang
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: