Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Description
Hi,
I am running Gobblin on Hadoop with my own writer and publisher classes. Each writer collects some stats (e.g. the number of processed records). Can the publisher get these stats from each writer?
Also, is there any internal Gobblin property that records the execution time of each task unit and the total execution time of a Gobblin job?
Thanks
Github Url : https://github.com/linkedin/gobblin/issues/1453
Github Reporter : jeffwang66
Github Created At : 2016-12-09T03:55:52Z
Github Updated At : 2017-02-01T11:30:29Z
Comments
wosiu wrote on 2017-02-01T11:30:29Z : To record the execution time of each task, you can create a custom reporter (https://gobblin.readthedocs.io/en/latest/metrics/Metrics-for-Gobblin-ETL/#custom-reporters) by extending ScheduledReporter:
https://github.com/linkedin/gobblin/blob/7141ec88c255c8c3cbc7054fb8146eebe77fc09d/gobblin-metrics-libs/gobblin-metrics-base/src/main/java/gobblin/metrics/reporter/ScheduledReporter.java
Each mapper gets one of these, so you can record the system time (e.g. System.currentTimeMillis()) in startImpl() and compute the difference in stopImpl() to get the time per task. The reporter also gives you access to some nice metrics.
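The start/stop timing idea described here could look roughly like the sketch below. To stay self-contained it does not depend on Gobblin: LifecycleReporter is a hypothetical stand-in base class whose startImpl()/stopImpl() hooks only mirror the method names mentioned in this comment, and TaskTimingReporter and elapsedMillis() are illustrative names as well. In a real job the same two overrides would presumably go into a class extending Gobblin's ScheduledReporter, whose exact constructor and abstract methods are not shown in this thread and should be checked against the linked source.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a reporter base class with start/stop hooks.
// This is NOT Gobblin's ScheduledReporter; only the hook names follow the comment.
abstract class LifecycleReporter {
    public final void start() { startImpl(); }
    public final void stop() { stopImpl(); }
    protected abstract void startImpl();
    protected abstract void stopImpl();
}

// Measures the wall-clock time between start() and stop().
class TaskTimingReporter extends LifecycleReporter {
    private boolean started = false;
    private boolean stopped = false;
    private long startNanos;
    private long elapsedNanos;

    @Override
    protected void startImpl() {
        // nanoTime() is monotonic, so it suits elapsed-time measurement
        // better than wall-clock timestamps.
        startNanos = System.nanoTime();
        started = true;
    }

    @Override
    protected void stopImpl() {
        if (!started) {
            throw new IllegalStateException("stop() called before start()");
        }
        elapsedNanos = System.nanoTime() - startNanos;
        stopped = true;
    }

    // Elapsed time in milliseconds; only valid after stop().
    public long elapsedMillis() {
        if (!stopped) {
            throw new IllegalStateException("reporter has not been stopped yet");
        }
        return TimeUnit.NANOSECONDS.toMillis(elapsedNanos);
    }
}

class TaskTimingDemo {
    public static void main(String[] args) throws InterruptedException {
        TaskTimingReporter reporter = new TaskTimingReporter();
        reporter.start();
        Thread.sleep(50); // stands in for the task's actual work
        reporter.stop();
        System.out.println("Task took about " + reporter.elapsedMillis() + " ms");
    }
}
```

Since each mapper would get its own reporter instance, the measured value is per task; how to ship it out (log line, metric, or job state) is left open here.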
However, I'm also interested in how to measure the whole job time within Gobblin. For now, I just take the Hadoop job time using the YARN client, outside Gobblin.
Github Url : https://github.com/linkedin/gobblin/issues/1453#issuecomment-276635272