  1. Hadoop YARN
  2. YARN-2928 YARN Timeline Service v.2: alpha 1
  3. YARN-3901

Populate flow run data in the flow_run & flow activity tables

Details

    • Sub-task
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • None
    • 2.9.0, 3.0.0-alpha1
    • timelineserver
    • None
    • Reviewed

    Description

      As per the schema proposed in YARN-3815 (https://issues.apache.org/jira/secure/attachment/12743391/hbase-schema-proposal-for-aggregation.pdf), this JIRA tracks the creation and population of data in the flow run table.

      Some points that are being considered:

      • Stores per-flow-run information aggregated across applications, and the flow version.
      • The RM’s collector writes to it on app creation and app completion.
      • The per-app collector writes to it for metric updates, at a lower frequency than the metric updates to the application table.
      • Primary key: cluster ! user ! flow ! flow run id
      • Only the latest version of flow-level aggregated metrics will be kept, even if the entity and application level keep a timeseries.
      • The running_apps column will be incremented on app creation, and decremented on app completion.
      • For min_start_time, the RM writer will simply write a value tagged with the applicationId. A coprocessor will return the min of all written values.
      • Upon flush and compaction, the min value among all the cells of this column will be written to a cell without any tag (empty tag), and all the other cells will be discarded.
      • Ditto for max_end_time, except that the max is kept.
      • Tags are represented as #type:value. The type can be not set (0), or can indicate running (1) or complete (2). For metrics, only the cells of completed apps are collapsed on compaction.
      • The m! values are aggregated (summed) upon read. Only when applications are completed (indicated by tag type 2) can the values be collapsed.
      • The application ids that have completed and been aggregated into the flow numbers are retained in a separate column for historical tracking: we don’t want to re-aggregate for those upon replay.
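
      The read and compaction rules above can be illustrated with a small in-memory sketch. This is not the HBase coprocessor from the patches: the Cell class, the function names, and the assumption that each application contributes a single cell holding its latest value are all hypothetical and chosen only to make the rules concrete.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple

# Tag types from the description: not set (0), running (1), complete (2).
TAG_NONE, TAG_RUNNING, TAG_COMPLETE = 0, 1, 2


@dataclass(frozen=True)
class Cell:
    """Hypothetical in-memory stand-in for one cell of a flow run column."""
    value: int
    app_id: Optional[str] = None  # None models the untagged (empty tag) cell
    tag_type: int = TAG_NONE


def flow_run_row_key(cluster: str, user: str, flow: str, flow_run_id: int) -> str:
    """Join the key parts with '!' (cluster ! user ! flow ! flow run id)."""
    return "!".join([cluster, user, flow, str(flow_run_id)])


def read_min(cells: List[Cell]) -> int:
    """Read path for min_start_time: the min of all written values."""
    return min(c.value for c in cells)


def read_max(cells: List[Cell]) -> int:
    """Read path for max_end_time: the max of all written values."""
    return max(c.value for c in cells)


def read_metric(cells: List[Cell]) -> int:
    """Read path for an m! metric: values are summed upon read."""
    return sum(c.value for c in cells)


def compact_min(cells: List[Cell]) -> List[Cell]:
    """Flush/compaction: keep one untagged cell holding the min, drop the rest."""
    return [Cell(read_min(cells))]


def compact_max(cells: List[Cell]) -> List[Cell]:
    """Flush/compaction: keep one untagged cell holding the max, drop the rest."""
    return [Cell(read_max(cells))]


def compact_metric(
    cells: List[Cell], aggregated_apps: Set[str]
) -> Tuple[List[Cell], Set[str]]:
    """Collapse untagged and completed-app cells into one untagged cell.

    Cells of apps that are not complete are kept as they are. The ids of
    the apps that were collapsed are added to the returned set, which
    models the separate column used for historical tracking.
    """
    done = [c for c in cells if c.app_id is None or c.tag_type == TAG_COMPLETE]
    pending = [c for c in cells if c.app_id is not None and c.tag_type != TAG_COMPLETE]
    if not done:
        return pending, set(aggregated_apps)
    collapsed = Cell(sum(c.value for c in done))
    tracked = set(aggregated_apps) | {c.app_id for c in done if c.app_id is not None}
    return [collapsed] + pending, tracked
```

      In this sketch a read returns the same result before and after compaction: collapsing only changes how many cells are stored, not the min, max, or sum a reader sees.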

      Attachments

        1. YARN-3901-YARN-2928.9.patch
          173 kB
          Vrushali C
        2. YARN-3901-YARN-2928.8.patch
          163 kB
          Vrushali C
        3. YARN-3901-YARN-2928.7.patch
          162 kB
          Vrushali C
        4. YARN-3901-YARN-2928.6.patch
          158 kB
          Vrushali C
        5. YARN-3901-YARN-2928.5.patch
          150 kB
          Vrushali C
        6. YARN-3901-YARN-2928.4.patch
          148 kB
          Vrushali C
        7. YARN-3901-YARN-2928.3.patch
          135 kB
          Vrushali C
        8. YARN-3901-YARN-2928.2.patch
          132 kB
          Vrushali C
        9. YARN-3901-YARN-2928.10.patch
          174 kB
          Vrushali C
        10. YARN-3901-YARN-2928.1.patch
          129 kB
          Vrushali C

            People

              vrushalic Vrushali C
              vrushalic Vrushali C