Apache Gobblin / GOBBLIN-70

[Help Needed] Print out some stats from Gobblin


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved

    Description

      Hi,

      I am running Gobblin on Hadoop with my own writer and publisher classes. Each writer collects some stats (e.g. the number of processed records). I would just like to ask whether the publisher can get these stats from each writer.

      Also, is there any internal Gobblin property that records the execution time of each task unit and the total execution time of a Gobblin job?

      Thanks

      Github Url : https://github.com/linkedin/gobblin/issues/1453
      Github Reporter : jeffwang66
      Github Created At : 2016-12-09T03:55:52Z
      Github Updated At : 2017-02-01T11:30:29Z

      Comments


      wosiu wrote on 2017-02-01T11:30:29Z : To record the execution time of each task, you can extend:
      https://github.com/linkedin/gobblin/blob/7141ec88c255c8c3cbc7054fb8146eebe77fc09d/gobblin-metrics-libs/gobblin-metrics-base/src/main/java/gobblin/metrics/reporter/ScheduledReporter.java
      to create a custom reporter (https://gobblin.readthedocs.io/en/latest/metrics/Metrics-for-Gobblin-ETL/#custom-reporters). Each mapper gets one of these, so you can take a timestamp (e.g. with System.currentTimeMillis()) in startImpl() and compute the difference in stopImpl(), which gives you the time per task. You also have access to useful metrics there.
      However, I'm also interested in how to measure the whole job time within Gobblin. For now I'm just taking the Hadoop job time from the YARN client, outside Gobblin.

      Github Url : https://github.com/linkedin/gobblin/issues/1453#issuecomment-276635272
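      The per-task timing idea from the comment can be sketched in plain Java. This is a minimal, self-contained sketch under stated assumptions, not Gobblin code: `TaskTimingReporter` is a hypothetical class that does not extend Gobblin's `ScheduledReporter`, and its `startImpl()`/`stopImpl()` methods only mirror the hook names mentioned in the comment. It records a start timestamp when started and stores the elapsed time when stopped.

```java
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical per-task timer (assumption: this is NOT Gobblin's ScheduledReporter API).
 * startImpl()/stopImpl() only mirror the hook names mentioned in the comment.
 */
public class TaskTimingReporter {

    private final String taskId;
    private long startNanos;          // System.nanoTime() may be negative, so use a flag
    private boolean started = false;  // true once startImpl() has run
    private long elapsedNanos = -1L;  // -1 means stopImpl() has not run yet

    public TaskTimingReporter(String taskId) {
        this.taskId = taskId;
    }

    /** Remember a monotonic start timestamp. */
    public void startImpl() {
        startNanos = System.nanoTime();
        started = true;
    }

    /** Store the time elapsed since startImpl(). */
    public void stopImpl() {
        if (!started) {
            throw new IllegalStateException("stopImpl() called before startImpl()");
        }
        elapsedNanos = System.nanoTime() - startNanos;
    }

    /** Elapsed time in milliseconds, or -1 if the timer has not been stopped. */
    public long elapsedMillis() {
        return elapsedNanos < 0 ? -1L : TimeUnit.NANOSECONDS.toMillis(elapsedNanos);
    }

    public String getTaskId() {
        return taskId;
    }

    public static void main(String[] args) throws InterruptedException {
        TaskTimingReporter reporter = new TaskTimingReporter("task-0");
        reporter.startImpl();
        Thread.sleep(50); // stand-in for the task's actual work
        reporter.stopImpl();
        System.out.println(reporter.getTaskId() + " took " + reporter.elapsedMillis() + " ms");
    }
}
```

      The sketch uses System.nanoTime() rather than System.currentTimeMillis() because it is monotonic and is not affected by wall-clock adjustments. Because each Hadoop map task normally runs in its own JVM, a value measured this way stays local to that task; it would still have to be logged or emitted as a metric to be collected in one place.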


          People

            Assignee: Unassigned
            Reporter: Abhishek Tiwari (abti)
            Votes: 0
            Watchers: 1
