[HIVE-19480] Implement and Incorporate MAPREDUCE-207 - ASF JIRA

Details

Type: New Feature
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 1.2.3
Fix Version/s: None
Component/s: HiveServer2
Labels: None

Description

HiveServer2 can run many MapReduce jobs in parallel.
Each MapReduce job calculates its input file splits on the client side, which in this case is HiveServer2 itself.
As a result, HiveServer2 may hold the file splits for many jobs in memory at the same time, which puts pressure on its memory.

"The client running the job calculates the splits for the job by calling getSplits(), then sends them to the application master, which uses their storage locations to schedule map tasks that will process them on the cluster."

"Hadoop: The Definitive Guide"
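The client-side calculation described in the quote can be sketched roughly as follows. This is a simplified Python model, not Hadoop's Java API: `FileSplit`, `compute_split_size`, and `get_splits` are hypothetical names chosen for illustration. The split-size rule and the 1.1 slop factor are meant to mirror the behaviour of Hadoop's `FileInputFormat`, while block locations, compression, and empty files are ignored.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumption: mirrors FileInputFormat's slop factor, which lets the final
# split of a file be up to 10% larger than the target split size.
SPLIT_SLOP = 1.1


@dataclass(frozen=True)
class FileSplit:
    """Hypothetical stand-in for a MapReduce input split (no host info)."""
    path: str
    start: int
    length: int


def compute_split_size(block_size: int, min_size: int, max_size: int) -> int:
    # Target split size: the block size clamped to [min_size, max_size].
    return max(min_size, min(max_size, block_size))


def get_splits(
    files: List[Tuple[str, int]],
    block_size: int = 128 * 1024 * 1024,
    min_size: int = 1,
    max_size: int = 2**63 - 1,
) -> List[FileSplit]:
    """Build the full split list in the caller's memory.

    `files` is a list of (path, length_in_bytes) pairs. Simplification:
    empty files are skipped and storage locations are not tracked.
    """
    split_size = compute_split_size(block_size, min_size, max_size)
    splits: List[FileSplit] = []
    for path, length in files:
        remaining = length
        # Emit full-size splits while more than 1.1x the split size remains.
        while remaining / split_size > SPLIT_SLOP:
            splits.append(FileSplit(path, length - remaining, split_size))
            remaining -= split_size
        # Whatever is left becomes the final (possibly slightly larger) split.
        if remaining > 0:
            splits.append(FileSplit(path, length - remaining, remaining))
    return splits
```

In this model every call to `get_splits` returns a list that lives in the calling process. When HiveServer2 is that process and many jobs are submitted concurrently, one such list per job is held at the same time, which is the memory pressure this issue describes.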

MAPREDUCE-207 should address this memory pressure by moving split calculation into the ApplicationMaster. Spark and Tez already take this approach.

Once MAPREDUCE-207 is completed, HiveServer2 should be updated to use this capability.

Issue Links

depends upon

MAPREDUCE-207 Computing Input Splits on the MR Cluster (Status: Open)

People

Assignee: Unassigned

Reporter: David Mollitor

Votes: 0

Watchers: 2

Dates

Created: 09/May/18 20:55

Updated: 09/May/18 21:28