  1. Hive
  2. HIVE-3290

BucketizedHiveInputFormat should support combining files having same bucket number

Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 0.10.0
    • Fix Version/s: None
    • Component/s: Query Processor
    • Labels: None

    Description

      BucketizedHiveInputFormat currently creates one split per input file, which can result in too many map tasks. If the input files are small (below some threshold, which could perhaps be made configurable), combining files that have the same bucket number and the same input format into one split could help reduce total execution time.
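
The proposal above can be illustrated with a minimal sketch. It is not Hive code and not the actual patch for this issue: `BucketSplitCombiner`, `InputFile`, `combine`, and the `maxSplitBytes` threshold are hypothetical names introduced only for illustration, and the file paths are made up. The sketch groups files by (bucket number, input format) and greedily packs each group into combined splits that stay at or below the threshold.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class BucketSplitCombiner {

    /** Hypothetical stand-in for one input file; not a Hive or Hadoop class. */
    static final class InputFile {
        final String path;
        final int bucket;          // bucket number the file belongs to
        final String inputFormat;  // name of the input format used to read it
        final long length;         // file size in bytes

        InputFile(String path, int bucket, String inputFormat, long length) {
            this.path = path;
            this.bucket = bucket;
            this.inputFormat = inputFormat;
            this.length = length;
        }
    }

    /**
     * Groups files by (bucket number, input format), then packs each group
     * into combined splits whose total size stays at or below maxSplitBytes.
     * A file larger than the threshold ends up in a split of its own.
     */
    static List<List<InputFile>> combine(List<InputFile> files, long maxSplitBytes) {
        // LinkedHashMap keeps groups in first-seen order, so output is deterministic.
        Map<String, List<InputFile>> groups = new LinkedHashMap<>();
        for (InputFile f : files) {
            String key = f.bucket + "|" + f.inputFormat;
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(f);
        }

        List<List<InputFile>> splits = new ArrayList<>();
        for (List<InputFile> group : groups.values()) {
            List<InputFile> current = new ArrayList<>();
            long currentBytes = 0;
            for (InputFile f : group) {
                // Close the current split if adding this file would exceed the threshold.
                if (!current.isEmpty() && currentBytes + f.length > maxSplitBytes) {
                    splits.add(current);
                    current = new ArrayList<>();
                    currentBytes = 0;
                }
                current.add(f);
                currentBytes += f.length;
            }
            if (!current.isEmpty()) {
                splits.add(current);
            }
        }
        return splits;
    }

    public static void main(String[] args) {
        // Illustrative data only: five small files spread over two buckets.
        List<InputFile> files = Arrays.asList(
            new InputFile("part=a/000000_0", 0, "text", 40),
            new InputFile("part=b/000000_0", 0, "text", 50),
            new InputFile("part=c/000000_0", 0, "text", 30),
            new InputFile("part=a/000001_0", 1, "text", 20),
            new InputFile("part=b/000001_0", 1, "text", 20));

        for (List<InputFile> split : combine(files, 100L)) {
            StringBuilder sb = new StringBuilder("split:");
            for (InputFile f : split) {
                sb.append(' ').append(f.path);
            }
            System.out.println(sb);
        }
    }
}
```

With a threshold of 100 bytes, the five sample files are packed into three splits instead of five. Files from different buckets are never mixed, which is the property a bucketized input format has to preserve.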

            People

              Assignee: Navis Ryu
              Reporter: Navis Ryu
              Votes: 0
              Watchers: 3

              Dates

                Created:
                Updated: