  Airavata / AIRAVATA-2386 Fix issues with email monitoring / AIRAVATA-2378

Jobs failing at execution of squeue command due to response of 'Invalid job ID'


Details

    Description

      When a job is submitted and the cluster returns a job ID, gfac runs the squeue command for that ID. Once this command returns the queued job's details, gfac sends the gateway user details to the XSEDE machines and adds the job ID to the monitoring map.
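      The flow described above can be sketched roughly as follows. This is a minimal, hypothetical illustration, not gfac's actual code: `SubmissionFlow`, the `GatewayReporter` callback and the `runCommand` function are assumed names, and the monitoring map is modeled as a plain `Map` from job ID to process ID.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch of the post-submission flow described in this issue.
// SubmissionFlow, GatewayReporter and runCommand are illustrative names, not gfac's API.
final class SubmissionFlow {

    /** Stand-in for sending the gateway user details to the XSEDE machine. */
    interface GatewayReporter {
        void report(String jobId);
    }

    /** Jobs registered for monitoring: job ID -> process ID (assumed layout). */
    final Map<String, String> monitoringMap = new ConcurrentHashMap<>();

    /** Runs a command on the cluster and returns its stdout; throws if the command fails. */
    private final Function<String, String> runCommand;
    private final GatewayReporter reporter;

    SubmissionFlow(Function<String, String> runCommand, GatewayReporter reporter) {
        this.runCommand = runCommand;
        this.reporter = reporter;
    }

    /** Checks the job with squeue; if it is listed, reports it and registers it for monitoring. */
    boolean afterSubmit(String processId, String jobId) {
        // One-shot check: if the job has already left the queue, runCommand throws here,
        // which is the failure reported in this issue.
        String stdout = runCommand.apply("squeue -j " + jobId);
        for (String line : stdout.split("\\R")) {
            String[] fields = line.trim().split("\\s+");
            if (fields[0].equals(jobId)) { // first column of squeue's default output is JOBID
                reporter.report(jobId);
                monitoringMap.put(jobId, processId);
                return true;
            }
        }
        return false;
    }
}
```

      Because the squeue check is a single query with no tolerance for a job that has already finished, any delay between submission and this call widens the window for the error described next.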

      Intermittently, SSH session validation takes longer after job submission, so by the time the squeue command runs, the job has already completed and is no longer in the queue. squeue then returns the error shown in [1].

      [1]
      2017-05-02 06:27:48,047 [pool-7-thread-15] ERROR o.a.a.g.i.t.DefaultJobSubmissionTask process_id=PROCESS_c7e404ed-0822-404a-8f04-6b09e9ba8ece, token_id=75918c63-30fd-4548-a8d3-7f3a41b185ae, experiment_id=US3-AIRA_740b0ad6-62c4-42dc-9eed-f12b92a6b98b, gateway_id=Ultrascan_Production - Error occurred while submitting the job
      org.apache.airavata.gfac.core.GFacException: Error running command squeue -j 9119082 on remote cluster. StandardError: slurm_load_jobs error: Invalid job id specified

      at org.apache.airavata.gfac.impl.HPCRemoteCluster.throwExceptionOnError(HPCRemoteCluster.java:298)
      at org.apache.airavata.gfac.impl.HPCRemoteCluster.getJobStatus(HPCRemoteCluster.java:233)
      at org.apache.airavata.gfac.impl.task.DefaultJobSubmissionTask.verifyJobSubmissionByJobId(DefaultJobSubmissionTask.java:302)
      at org.apache.airavata.gfac.impl.task.DefaultJobSubmissionTask.execute(DefaultJobSubmissionTask.java:157)
      at org.apache.airavata.gfac.impl.GFacEngineImpl.executeTask(GFacEngineImpl.java:814)
      at org.apache.airavata.gfac.impl.GFacEngineImpl.executeJobSubmission(GFacEngineImpl.java:510)
      at org.apache.airavata.gfac.impl.GFacEngineImpl.executeTaskListFrom(GFacEngineImpl.java:386)
      at org.apache.airavata.gfac.impl.GFacEngineImpl.executeProcess(GFacEngineImpl.java:286)
      at org.apache.airavata.gfac.impl.GFacWorker.executeProcess(GFacWorker.java:227)
      at org.apache.airavata.gfac.impl.GFacWorker.run(GFacWorker.java:86)
      at org.apache.airavata.common.logging.MDCUtil.lambda$wrapWithMDC$0(MDCUtil.java:40)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      at java.lang.Thread.run(Thread.java:745)
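      One possible way to avoid failing the submission in this case is to distinguish this particular squeue error from other errors. The sketch below is hypothetical: `SqueueCheck`, `Outcome` and `classify` are illustrative names, and the matched text is taken from the StandardError in the log above.

```java
import java.util.Locale;

// Hypothetical sketch: classify the result of "squeue -j <id>" when it runs shortly after
// submission. SqueueCheck, Outcome and classify are illustrative names, not gfac's API.
final class SqueueCheck {

    enum Outcome { IN_QUEUE, LEFT_QUEUE, ERROR }

    // Matched case-insensitively against the StandardError text seen in the log above.
    private static final String INVALID_JOB_ID = "invalid job id specified";

    static Outcome classify(String jobId, String stdout, String stderr) {
        String err = stderr == null ? "" : stderr.trim();
        if (err.toLowerCase(Locale.ROOT).contains(INVALID_JOB_ID)) {
            // We already hold a job ID from the scheduler, so this most likely means the
            // job finished and left the queue before squeue ran: not a submission failure.
            return Outcome.LEFT_QUEUE;
        }
        if (!err.isEmpty()) {
            return Outcome.ERROR; // any other error is still treated as fatal
        }
        String out = stdout == null ? "" : stdout;
        for (String line : out.split("\\R")) {
            // squeue's default output puts JOBID in the first column
            String[] fields = line.trim().split("\\s+");
            if (fields[0].equals(jobId)) {
                return Outcome.IN_QUEUE;
            }
        }
        // squeue succeeded but did not list the job: treat it as already gone from the queue
        return Outcome.LEFT_QUEUE;
    }
}
```

      With this split, only `ERROR` would fail the submission task. A `LEFT_QUEUE` job would still be added to the monitoring map so that its final state is picked up by monitoring (or, as an assumption, confirmed with a Slurm accounting query such as `sacct -j <id>`) instead of being marked as failed.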

          People

            smarru Suresh Marru
            eroma_a Eroma
            Votes: 0
            Watchers: 1