Hadoop Common / HADOOP-5170

Set max map/reduce tasks on a per-job basis, either per-node or cluster-wide

Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: 0.21.0
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed
    • Release Note: Job tracker parameters permit setting limits on the number of maps (or reduces) per job and/or per node.

    Description

      There are a number of use cases for being able to do this. The focus of this JIRA should be on finding the simplest implementation that satisfies the most use cases.

      This could be implemented as either a per-node maximum or a cluster-wide maximum. For most uses the former seems preferable; however, either would fulfill the requirements of this JIRA.

      Some of the reasons for allowing this feature (mine and from others on the list):

      • I have some very large CPU-bound jobs. I am forced to keep the max map/node limit at 2 or 3 (on a 4-core node) so that I do not starve the DataNode and RegionServer. I have other jobs that are network-latency-bound, and I would like to be able to run high numbers of them concurrently on each node. Though I can thread some jobs, some use cases are difficult to thread (scanning from HBase), and threading adds significant complexity to the job compared with letting Hadoop handle the concurrency.
      • Poor assignment of tasks to nodes creates situations where multiple reducers run on a single node while other nodes receive none. A limit of 1 reducer per node for that job would prevent this. (Only works with a per-node limit.)
      • Poor man's MR job virtualization. Since we can limit a job's resources, this gives much more control in allocating and dividing up the resources of a large cluster. (Makes most sense with a cluster-wide limit.)
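
      To make the two options concrete, here is a minimal sketch of how a scheduler might enforce a per-job cap on concurrently running tasks, either per node or cluster-wide. It is not taken from any of the attached patches: the class TaskLimitSketch, its methods, and the UNLIMITED sentinel are hypothetical names chosen only for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical per-job task limiter; an illustration, not the patch's code. */
final class TaskLimitSketch {
    /** Sentinel meaning "no limit configured" (an assumption of this sketch). */
    static final int UNLIMITED = -1;

    private final int maxPerNode;      // cap on this job's concurrent tasks on one node
    private final int maxClusterWide;  // cap on this job's concurrent tasks overall
    private final Map<String, Integer> runningByNode = new HashMap<>();
    private int runningTotal = 0;

    TaskLimitSketch(int maxPerNode, int maxClusterWide) {
        this.maxPerNode = maxPerNode;
        this.maxClusterWide = maxClusterWide;
    }

    /** True if one more task of this job may start on the given node. */
    boolean canAssign(String node) {
        int onNode = runningByNode.getOrDefault(node, 0);
        boolean nodeOk = maxPerNode == UNLIMITED || onNode < maxPerNode;
        boolean clusterOk = maxClusterWide == UNLIMITED || runningTotal < maxClusterWide;
        return nodeOk && clusterOk;
    }

    /** Records a task start if both limits allow it; returns whether it started. */
    boolean tryAssign(String node) {
        if (!canAssign(node)) {
            return false;
        }
        runningByNode.merge(node, 1, Integer::sum);
        runningTotal++;
        return true;
    }

    /** Releases one slot on the node; ignored if nothing is recorded there. */
    void taskFinished(String node) {
        Integer onNode = runningByNode.get(node);
        if (onNode == null) {
            return;
        }
        if (onNode == 1) {
            runningByNode.remove(node);
        } else {
            runningByNode.put(node, onNode - 1);
        }
        runningTotal--;
    }
}
```

      With new TaskLimitSketch(1, TaskLimitSketch.UNLIMITED), a second tryAssign("node-a") is refused while the first task is still running, which corresponds to the one-reducer-per-node case above. With new TaskLimitSketch(TaskLimitSketch.UNLIMITED, 2), the job is held to two concurrent tasks regardless of which nodes they land on.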

      Attachments

        1. h5170.patch (16 kB, Owen O'Malley)
        2. tasklimits-v4-20.patch (15 kB, rahul k singh)
        3. tasklimits-v4.patch (15 kB, Matei Alexandru Zaharia)
        4. HADOOP-5170-tasklimits-v3-0.18.3.patch (22 kB, Todd Lipcon)
        5. tasklimits-v3-0.19.patch (6 kB, Jonathan Gray)
        6. tasklimits-v3.patch (16 kB, Matei Alexandru Zaharia)
        7. tasklimits-v2.patch (6 kB, Matei Alexandru Zaharia)
        8. tasklimits.patch (3 kB, Matei Alexandru Zaharia)

            People

              Assignee: Matei Alexandru Zaharia
              Reporter: Jonathan Gray
              Votes: 9
              Watchers: 28