Oozie / OOZIE-3132

Instrument SLACalculatorMemory

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 4.3.0
    • Fix Version/s: 5.0.0b1
    • Component/s: core
    • Labels: None

    Description

      When many WorkflowJobBean and CoordinatorJobBean instances have to be followed up by creating SLASummaryBean instances, the following can occur:

      • we set oozie.sla.service.SLAService.capacity to a sane value such as 10000 to limit heap consumption
      • SLACalculatorMemory#addRegistration() and SLACalculatorMemory#updateRegistration() then either:
        • emit TRACE level logs like "SLA Registration Event - Job:", showing that the add / update of the SLARegistrationBean was successful
        • or emit ERROR level logs like "SLACalculator memory capacity reached. Cannot add or update new SLA Registration entry for job", showing that the add / update of the SLARegistrationBean was not successful
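
A minimal sketch of the capacity behaviour listed above, using plain JDK classes only. BoundedSlaMap, addOrUpdate() and the String value type are hypothetical names for illustration and are not part of Oozie. The sketch also assumes that any add or update is rejected once the map holds capacity entries; the real SLACalculatorMemory may decide differently.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a capacity-bounded SLA map; not Oozie's actual code.
class BoundedSlaMap {
    private final Map<String, String> slaMap = new HashMap<>();
    private final int capacity;

    BoundedSlaMap(int capacity) {
        this.capacity = capacity;
    }

    // Returns true if the entry was stored, false if the map was already full.
    // Assumption: both adds and updates are refused once capacity is reached.
    synchronized boolean addOrUpdate(String jobId, String registration) {
        if (slaMap.size() >= capacity) {
            // Corresponds to the ERROR level log described in the issue.
            System.err.println("capacity reached, cannot add or update entry for job " + jobId);
            return false;
        }
        slaMap.put(jobId, registration);
        return true;
    }

    synchronized int size() {
        return slaMap.size();
    }
}
```

From the outside, a caller only sees success or failure per call, which is why the current fill level is hard to judge from the logs alone.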

      Since stale or already processed SLAEvent entries are sometimes removed from SLACalculatorMemory#slaMap, it's pretty hard to tell what its actual size is, that is, whether the next add or update command will succeed.

      We need an Instrumentation.Counter instance that is incremented when SLACalculatorMemory#slaMap#put() adds a new entry, and decremented when SLACalculatorMemory#slaMap#remove() removes an existing entry. This counter will automatically be available through the REST interface and the Oozie client.
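
The counting rule can be sketched as follows. This is an illustration under assumptions, not the attached patch: CountedSlaMap is a hypothetical class, and a java.util.concurrent.atomic.AtomicLong stands in for Instrumentation.Counter, whose actual API is not shown here.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical wrapper that keeps a counter in step with the map's entry count.
class CountedSlaMap {
    private final Map<String, String> slaMap = new ConcurrentHashMap<>();
    // Assumption: AtomicLong stands in for Oozie's Instrumentation.Counter.
    private final AtomicLong entryCount = new AtomicLong();

    void put(String jobId, String registration) {
        // ConcurrentHashMap never stores null values, so a null return
        // means the key was absent and a new entry was added.
        if (slaMap.put(jobId, registration) == null) {
            entryCount.incrementAndGet();
        }
    }

    void remove(String jobId) {
        // Decrement only when an existing entry was actually removed.
        if (slaMap.remove(jobId) != null) {
            entryCount.decrementAndGet();
        }
    }

    long count() {
        return entryCount.get();
    }
}
```

Updating an existing key or removing a missing key leaves the counter unchanged, so its value tracks the number of entries in the map and can be compared against the configured capacity.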

      Attachments

        1. OOZIE-3132.003.patch
          17 kB
          Andras Piros
        2. OOZIE-3132.002.patch
          17 kB
          Andras Piros
        3. OOZIE-3132.001.patch
          15 kB
          Andras Piros

          People

            Assignee: Andras Piros
            Reporter: Andras Piros