Details
- Type: Improvement
- Status: Closed
- Priority: Major
- Resolution: Fixed
- Hadoop Flags: Reviewed
Description
There are cases when the input to a Hive job consists of thousands of small files. Currently, a separate mapper is spawned for each file. Most of the overhead of spawning all these mappers can be avoided if Hive uses the CombineFileInputFormat introduced in HADOOP-4565.
Options to control this behavior:
- hive.input.format (org.apache.hadoop.hive.ql.io.CombineHiveInputFormat, which is the default if empty, or org.apache.hadoop.hive.ql.io.HiveInputFormat)
- mapred.min.split.size.per.node (the minimum number of bytes needed to create a node-local split; below this, the data is combined at the rack level. Default: 0)
- mapred.min.split.size.per.rack (the minimum number of bytes needed to create a rack-local split; below this, the data is combined at the global level. Default: 0)
- mapred.max.split.size (the maximum size of each split; this limit can be exceeded slightly because a split stops accumulating data only *after* reaching it, not before)
The three size values above must be in non-descending order (see the example settings below).
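For illustration, a minimal sketch of setting these properties in a Hive session. The property names and the combining input format class are taken from the description above; the numeric values are arbitrary assumptions, not recommendations:

    -- Use the combining input format (also the default when hive.input.format is empty):
    SET hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;

    -- Split sizing thresholds, in bytes, kept in non-descending order.
    -- Below the per-node minimum, data is combined up to the rack level:
    SET mapred.min.split.size.per.node=134217728;
    -- Below the per-rack minimum, data is combined up to the global level:
    SET mapred.min.split.size.per.rack=134217728;
    -- A split stops accumulating only after reaching this maximum, so it can be slightly exceeded:
    SET mapred.max.split.size=268435456;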
Attachments
Issue Links
- blocks
  - HIVE-826 cleanup HiveInputFormat.getRecordReader() (Open)
- is blocked by
  - HADOOP-4565 MultiFileInputSplit can use data locality information to create splits (Closed)
- is related to
  - HIVE-826 cleanup HiveInputFormat.getRecordReader() (Open)
  - HIVE-824 use same mapper for multiple directories (Open)
- relates to
  - HIVE-824 use same mapper for multiple directories (Open)