Oozie / OOZIE-1984

SLACalculator in HA mode performs duplicate operations on records with completed jobs


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: trunk
    • Fix Version/s: 4.1.0
    • Component/s: None
    • Labels: None

    Description

      Scenario:

      The periodic SLA run has already processed the start, duration, and end events for a job's SLA entry. The job's completion notification arrives after this and triggers the SLA listener.

      Buggy part:

      SLACalculatorMemory.java
      
          else if (Services.get().get(JobsConcurrencyService.class).isHighlyAvailableMode()) {
              // jobId might not exist in slaMap in an HA setting
              SLARegistrationBean slaRegBean = SLARegistrationQueryExecutor.getInstance().get(
                      SLARegQuery.GET_SLA_REG_ALL, jobId);
              if (slaRegBean != null) { // filter out jobs picked by the SLA job event listener
                                        // but not actually configured for SLA
                  SLASummaryBean slaSummaryBean = SLASummaryQueryExecutor.getInstance().get(
                          SLASummaryQuery.GET_SLA_SUMMARY, jobId);
                  slaCalc = new SLACalcStatus(slaSummaryBean, slaRegBean);
                  if (slaCalc.getEventProcessed() < 7) {
                      slaMap.put(jobId, slaCalc);
                  }
                  // slaCalc is assigned even when eventProcessed is 8 (all events done),
                  // so the job-end handling below still runs for this record
              }
          }
      }
      if (slaCalc != null) {
          ..
          Object eventProcObj = ((SLASummaryQueryExecutor) SLASummaryQueryExecutor.getInstance())
                  .getSingleValue(SLASummaryQuery.GET_SLA_SUMMARY_EVENTPROCESSED, jobId);
          byte eventProc = ((Byte) eventProcObj).byteValue();
          ..
          // reached even when eventProc is 8 (binary 1000)
          processJobEndSuccessSLA(slaCalc, startTime, endTime);
      

      The method processJobEndSuccessSLA then checks the second least-significant bit (LSB) of eventProc and, finding it unset, sends the duration event again. The bug is therefore two-fold:

      • even when all events have already been processed, the listener still invokes this method;
      • in that case eventProc is 8 (binary 1000), so the second LSB is unset and the duration event is processed again.
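To make the bit arithmetic concrete, the following minimal Java sketch models eventProc as a bit field. The encoding is an assumption for illustration (start = 1, duration = 2, end = 4, and 8 marking a fully processed record, consistent with the `< 7` filter and the value 8 mentioned above); the class SlaEventBits and its methods are hypothetical and are not Oozie code.

```java
// Hypothetical, simplified model of the SLA "eventProcessed" bit field.
// Assumed encoding (illustration only): bit 0 = start, bit 1 = duration,
// bit 2 = end, and the value 8 (binary 1000) marks "all events done".
final class SlaEventBits {
    static final byte START = 1;     // 0001
    static final byte DURATION = 2;  // 0010
    static final byte END = 4;       // 0100
    static final byte ALL_DONE = 8;  // 1000

    // True if the given event's bit is set in eventProc.
    static boolean isProcessed(byte eventProc, byte eventBit) {
        return (eventProc & eventBit) != 0;
    }

    // Mirrors the check described in the report: the duration event is
    // (re)sent whenever the second LSB of eventProc is unset.
    static boolean wouldSendDuration(byte eventProc) {
        return !isProcessed(eventProc, DURATION);
    }

    public static void main(String[] args) {
        byte[] samples = {START, (byte) (START | DURATION | END), ALL_DONE};
        for (byte eventProc : samples) {
            System.out.println("eventProc=" + eventProc
                    + " duration resent? " + wouldSendDuration(eventProc));
        }
        // eventProc=1 duration resent? true
        // eventProc=7 duration resent? false
        // eventProc=8 duration resent? true   <- the duplicate described above
    }
}
```

The value 8 clears every individual event bit, so any per-bit check treats a finished record as if nothing had been processed; that is why the value has to be tested as a whole before dispatching.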

      Fix: do not invoke processJobEndSuccessSLA when eventProc is 8 (binary 1000).
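A minimal sketch of the guard, assuming bit 1 (value 2) marks "duration processed" and 8 marks a fully processed record. The hypothetical method durationEventsEmitted stands in for the job-end handler and counts how many duration events a late job notification would trigger with and without the eventProc == 8 check; it illustrates the idea and is not the committed patch.

```java
// Hypothetical stand-in for the job-end SLA path; not Oozie's actual code.
final class SlaEndGuard {
    static final byte DURATION = 2; // assumed bit for "duration processed"
    static final byte ALL_DONE = 8; // assumed marker: every event already handled

    // Returns how many duration events the handler would emit for the
    // stored eventProc value; 'guarded' toggles the fix.
    static int durationEventsEmitted(byte eventProc, boolean guarded) {
        if (guarded && eventProc == ALL_DONE) {
            return 0; // fix: record already finalized, skip the handler entirely
        }
        // bit check from the description: resend duration if its bit is unset
        return (eventProc & DURATION) == 0 ? 1 : 0;
    }

    public static void main(String[] args) {
        byte lateNotification = ALL_DONE; // periodic run already finished the record
        System.out.println("unguarded: " + durationEventsEmitted(lateNotification, false));
        System.out.println("guarded:   " + durationEventsEmitted(lateNotification, true));
    }
}
```

Checking the whole value before the per-bit logic keeps the existing bit checks unchanged for records that are still in progress.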

      Attachments

        1. OOZIE-1984.patch
          1 kB
          Mona Chitnis
        2. OOZIE-1984-1.patch
          1.0 kB
          Mona Chitnis


            People

              Assignee: Unassigned
              Reporter: Mona Chitnis
              Votes: 0
              Watchers: 1
