  1. Slider
  2. SLIDER-313

Slider AM fails to restart on kill, post newly created application

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: Slider 0.40
    • Fix Version/s: Slider 0.50
    • Component/s: agent-provider
    • Labels: None

    Description

      The Slider AM fails to restart after it is killed on a newly created application. On AM restart, the containers carry the unexpectedly high priority values 1073741825 and 1073741826.

      Steps to reproduce:

      1. Create a new package
      slider create cl1 ...

      2. Kill AM

      3. Wait for RM to create new AM

      4. Check the AM logs; they show a NullPointerException like the one below:

      Exception: java.lang.NullPointerException
      14/08/13 00:42:59 ERROR main.ServiceLauncher: Exception: java.lang.NullPointerException
      java.lang.NullPointerException
      	at org.apache.slider.providers.agent.AgentProviderService.rebuildContainerDetails(AgentProviderService.java:344)
      	at org.apache.slider.server.appmaster.SliderAppMaster.createAndRunCluster(SliderAppMaster.java:722)
      	at org.apache.slider.server.appmaster.SliderAppMaster.runService(SliderAppMaster.java:454)
      	at org.apache.slider.core.main.ServiceLauncher.launchService(ServiceLauncher.java:186)
      	at org.apache.slider.core.main.ServiceLauncher.launchServiceRobustly(ServiceLauncher.java:471)
      	at org.apache.slider.core.main.ServiceLauncher.launchServiceAndExit(ServiceLauncher.java:401)
      	at org.apache.slider.core.main.ServiceLauncher.serviceMain(ServiceLauncher.java:626)
      	at org.apache.slider.server.appmaster.SliderAppMaster.main(SliderAppMaster.java:1735)
      14/08/13 00:42:59 INFO util.ExitUtil: Exiting with status 32
      

      5. Check the RM logs for the container assignment entries, such as the one below:

      2014-08-13 02:04:36,991 INFO  capacity.LeafQueue (LeafQueue.java:assignContainer(1352)) - 
      assignedContainer application attempt=appattempt_1407891977820_0005_000001 
      container=Container: [ContainerId: container_1407891977820_0005_01_000002, 
      NodeId: c6409.ambari.apache.org:45454, NodeHttpAddress: c6409.ambari.apache.org:8042, 
      Resource: <memory:256, vCores:1>, Priority: 1073741825, Token: null, ] queue=default: capacity=1.0, 
      absoluteCapacity=1.0, usedResources=<memory:768, vCores:3>, usedCapacity=0.375, 
      absoluteUsedCapacity=0.375, numApps=1, numContainers=3 clusterResource=<memory:2048, vCores:8>
      

      Note that Priority is set to 1073741825.
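
      When scanning many RM log lines for this value, the check can be automated. The following is a minimal sketch, not part of Slider or YARN: the class name RmLogPriority and the method extractPriority are hypothetical, and the regular expression assumes the "Priority: <number>" form seen in the log excerpt above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not a Slider or YARN class.
public class RmLogPriority {
    // Assumes the priority appears as "Priority: <digits>", as in the RM log excerpt.
    private static final Pattern PRIORITY = Pattern.compile("Priority: (\\d+)");

    // Returns the first priority found in the text, or -1 if there is none.
    static int extractPriority(String logText) {
        Matcher m = PRIORITY.matcher(logText);
        return m.find() ? Integer.parseInt(m.group(1)) : -1;
    }

    public static void main(String[] args) {
        // Fragment copied from the RM log line in this report.
        String fragment =
            "Resource: <memory:256, vCores:1>, Priority: 1073741825, Token: null, ]";
        System.out.println(extractPriority(fragment));
    }
}
```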

      For an HBase application, the Slider AM expects the priority of its agent containers to be either 1 or 2. The ContainerPriority class defines the following constant, which appears to be used by a locality feature:

      NOLOCATION = 1 << 30
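
      The observed values are consistent with this constant being combined with the expected priorities: 1 << 30 is 1073741824, so 1073741825 and 1073741826 are 1 and 2 with bit 30 set. The sketch below only illustrates that arithmetic. The class PriorityDecode and its methods are hypothetical and are not the ContainerPriority implementation; only the value of NOLOCATION comes from this report.

```java
// Hypothetical helper, not Slider's ContainerPriority class: it only
// illustrates the bit arithmetic implied by NOLOCATION = 1 << 30.
public class PriorityDecode {
    // Value quoted in the issue description.
    static final int NOLOCATION = 1 << 30;

    // Assumed decoding: clear bit 30 to recover the underlying priority.
    static int roleOf(int priority) {
        return priority & ~NOLOCATION;
    }

    // True when bit 30 is set in the priority value.
    static boolean hasNoLocationFlag(int priority) {
        return (priority & NOLOCATION) != 0;
    }

    public static void main(String[] args) {
        // The two priority values reported in this issue.
        for (int p : new int[] {1073741825, 1073741826}) {
            System.out.println(p + " -> " + roleOf(p)
                + ", flag set: " + hasNoLocationFlag(p));
        }
    }
}
```

      Under this assumption, an AM that compares the raw priority against 1 or 2 without clearing the flag would not recognize these containers.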
      

      After a subsequent freeze and thaw, killing the AM no longer triggers this issue.

      Note: The NPE in AgentProviderService has been fixed, but the fix is not merged yet. However, a fix for the priority issue is still required for the SLIDER-285 feature to work when the AM is killed on a newly created application.

            People

              Assignee: Gour Saha (gsaha)
              Reporter: Gour Saha (gsaha)