  1. Hive
  2. HIVE-14919

Improve the performance of Hive on Spark 2.0.0

Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved

    Description

      In HIVE-14029, we updated the Spark dependency to 2.0.0. We used Intel BigBench [1] to benchmark Spark 2.0 against Spark 1.6 on a 1 TB data set. Performance improved by about 5.4% overall and by 45% in the best case. However, some queries show no significant performance improvement. This JIRA is the umbrella ticket for addressing those performance issues.

      [1] https://github.com/intel-hadoop/Big-Data-Benchmark-for-Big-Bench

            People

              Ferd Ferdinand Xu
              Ferd Ferdinand Xu
              Votes: 0
              Watchers: 16

              Dates

                Created:
                Updated: