Apache Storm / STORM-1014

Use Hive Streaming API bucket info to bucket correctly


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Critical
    • Resolution: Implemented
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: storm-hive
    • Labels: None

    Description

      The Storm Hive bolt currently picks a random bucket and writes data to it. Hive expects rows (tuples in Storm) to be distributed across buckets according to Hive's hash distribution. Because Storm writes to a random bucket, Hive optimizations that rely on bucketing can return incorrect results.

      The solution is for the Storm Hive bolt to use Hive's bucket distribution information and put each row/tuple in the correct bucket. This relies on HIVE-11672.

      This might require a shuffle within Storm.
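
      To make the bucketing requirement concrete, here is a minimal sketch of hash-based bucket assignment: a key's hash is mapped to a non-negative value and reduced modulo the bucket count, so the same key always lands in the same bucket. The class and method names (BucketSketch, hashKey, bucketFor, partition) are hypothetical, and Java's hashCode() is only a stand-in; the real bolt would have to obtain the hash from Hive's own bucketing logic (the information HIVE-11672 is expected to expose) so that its placement matches Hive's.

```java
import java.util.ArrayList;
import java.util.List;

public class BucketSketch {
    // Stand-in hash for illustration only. Hive computes its own hash over the
    // bucketing columns; Object.hashCode() is an assumption made for this sketch.
    static int hashKey(Object key) {
        return key == null ? 0 : key.hashCode();
    }

    // Map a hash code to a bucket index in [0, numBuckets).
    // Masking with Integer.MAX_VALUE clears the sign bit so the result is never negative.
    static int bucketFor(int hashCode, int numBuckets) {
        return (hashCode & Integer.MAX_VALUE) % numBuckets;
    }

    // Group keys by bucket, the way a bolt would need to route tuples before writing.
    static List<List<Object>> partition(List<Object> keys, int numBuckets) {
        List<List<Object>> buckets = new ArrayList<>();
        for (int i = 0; i < numBuckets; i++) {
            buckets.add(new ArrayList<>());
        }
        for (Object key : keys) {
            buckets.get(bucketFor(hashKey(key), numBuckets)).add(key);
        }
        return buckets;
    }

    public static void main(String[] args) {
        List<Object> keys = List.of(7, 8, 11, 12);
        List<List<Object>> buckets = partition(keys, 4);
        for (int i = 0; i < buckets.size(); i++) {
            System.out.println("bucket " + i + ": " + buckets.get(i));
        }
    }
}
```

      In Storm, routing like this usually implies that tuples with the same bucketing key must reach the same writer task, which is why a shuffle (for example, a grouping on the bucketing fields) may be needed upstream of the bolt.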

            People

              Assignee: sriharsha Harsha
              Reporter: rbains Raj Bains