Details
-
Task
-
Status: Resolved
-
Major
-
Resolution: Fixed
-
None
-
None
Description
Fix for #882
Github Url : https://github.com/linkedin/gobblin/pull/965
Github Reporter : jbaranick
Github Assignee : jbaranick
Github Created At : 2016-05-05T22:04:54Z
Github Updated At : 2017-04-22T18:44:42Z
Comments
jbaranick wrote on 2016-05-05T22:06:22Z : @sahilTakiar @zliu41: Can you review this?
Github Url : https://github.com/linkedin/gobblin/pull/965#issuecomment-217293611
coveralls wrote on 2016-05-05T22:21:08Z : [![Coverage Status](https://coveralls.io/builds/6068577/badge)](https://coveralls.io/builds/6068577)
Coverage increased (+0.5%) to 45.026% when pulling *193dddb831475f931999a9aca54c5c00e2d082d3 on kadaan:FixFor882* into *41963701538ae90ed8042c8d34a2ed7211a9af42 on linkedin:master*.
Github Url : https://github.com/linkedin/gobblin/pull/965#issuecomment-217296762
zliu41 wrote on 2016-05-06T16:07:44Z : @kadaan could you please give a brief description of your approach? It seems you are still using `current.jst`, which is a different approach than #882.
Github Url : https://github.com/linkedin/gobblin/pull/965#issuecomment-217485933
jbaranick wrote on 2016-05-06T16:35:48Z : `current.jst` is not used. There is a compromise here so that users of the API aren't broken. New callers can call `getCurrent` or `getAllCurrent` to get the latest state. If they want a specific state they can continue to call `get` or `getAll`. If `current` or `current.jst` is requested when calling `get` or `getAll` it will return the latest state just like `getCurrent` and `getAllCurrent`. A precondition will ensure that users of the API are not able to write a file named `current` or `current.jst`.
Github Url : https://github.com/linkedin/gobblin/pull/965#issuecomment-217492590
coveralls wrote on 2016-05-06T16:57:15Z : [![Coverage Status](https://coveralls.io/builds/6078675/badge)](https://coveralls.io/builds/6078675)
Coverage increased (+0.1%) to 45.096% when pulling *e5f0498095f1f6bcbf25e3ed0316ffd772275d73 on kadaan:FixFor882* into *588d8c77fe3c84c752fd410f916868419c178465 on linkedin:master*.
Github Url : https://github.com/linkedin/gobblin/pull/965#issuecomment-217497736
coveralls wrote on 2016-05-12T07:47:06Z : [![Coverage Status](https://coveralls.io/builds/6149251/badge)](https://coveralls.io/builds/6149251)
Coverage increased (+0.1%) to 46.765% when pulling *67e66222cd441d903b4197d380df8041cce2cc9d on kadaan:FixFor882* into *5cd9d969f73456e46847c9d9e7ef33ad5376617c on linkedin:master*.
Github Url : https://github.com/linkedin/gobblin/pull/965#issuecomment-218684160
jbaranick wrote on 2016-05-12T20:52:47Z : @zliu41 @sahilTakiar Can you guys finish this review?
Github Url : https://github.com/linkedin/gobblin/pull/965#issuecomment-218882376
jbaranick wrote on 2016-06-01T15:05:13Z : @zliu41 @sahilTakiar Can you guys finish this review?
Github Url : https://github.com/linkedin/gobblin/pull/965#issuecomment-223022088
coveralls wrote on 2016-06-01T15:33:05Z : [![Coverage Status](https://coveralls.io/builds/6419146/badge)](https://coveralls.io/builds/6419146)
Coverage increased (+0.07%) to 46.308% when pulling *668262a91516d9919a1cd30c141b058514890c8e on kadaan:FixFor882* into *fe7dc7c35eebc3a4faee9987ecccaae3511118c5 on linkedin:master*.
Github Url : https://github.com/linkedin/gobblin/pull/965#issuecomment-223031389
zliu41 wrote on 2016-06-01T19:09:53Z : @pcadabam @ibuenros @ydai1124 can you review this PR? Thanks
Github Url : https://github.com/linkedin/gobblin/pull/965#issuecomment-223094308
coveralls wrote on 2016-06-01T20:12:06Z : [![Coverage Status](https://coveralls.io/builds/6423539/badge)](https://coveralls.io/builds/6423539)
Coverage increased (+0.2%) to 46.398% when pulling *668262a91516d9919a1cd30c141b058514890c8e on kadaan:FixFor882* into *fe7dc7c35eebc3a4faee9987ecccaae3511118c5 on linkedin:master*.
Github Url : https://github.com/linkedin/gobblin/pull/965#issuecomment-223110224
jbaranick wrote on 2016-06-21T20:49:48Z : @pcadabam @ibuenros @ydai1124 Were any of you able to review this?
Github Url : https://github.com/linkedin/gobblin/pull/965#issuecomment-227567417
jbaranick wrote on 2017-01-17T16:37:42Z : @abti @pcadabam @ibuenros @ydai1124 Can any of you check this out now?
Github Url : https://github.com/linkedin/gobblin/pull/965#issuecomment-273222674
jbaranick wrote on 2017-01-24T17:45:02Z : @ydai1124 Can you please review again?
Github Url : https://github.com/linkedin/gobblin/pull/965#issuecomment-274879786
jbaranick wrote on 2017-02-06T20:23:17Z : @htran1 In response to your general comment:
> General comment is that StateStore should not know about 'current'...
> The StateStore should exposed APIs used at the DatasetStateStore level to access the last modified state.
The StateStore does expose an API to access current: `getCurrent` and `getAllCurrent`. Additionally, the `createAlias` API has been removed. In order to support a seamless upgrade, StateStore, also supports the prior `get*` calls being called with a filename like `current`. In this case it will perform the same as the `getCurrent`/`getAllCurrent` calls. The logic behind this is that I cannot be sure that there are no dependencies from third-party libraries on the existing `get*` calls. By making the change in this way, we can get rid of the `current` aliased files, while still ensuring existing code will work. We can publish in release note that callers need to upgrade to the latest API.
> ... DatasetStateStore should not query directly.
If desired, the logic of `MysqlDatasetStateStore.getLatestDatasetStatesByUrns` can be pushed down into `MysqlStateStore`, if you feel strongly about that.
Github Url : https://github.com/linkedin/gobblin/pull/965#issuecomment-277801997
jbaranick wrote on 2017-02-06T22:43:21Z : @htran1 Can you review this again?
Github Url : https://github.com/linkedin/gobblin/pull/965#issuecomment-277838696
jbaranick wrote on 2017-02-07T00:38:05Z : @htran1 I don't think that test failure is due to my PR. It seems to be a problem with line 197 in `GobblinClusterKillTest`: ` _clusterWorkers[0].disconnectHelixManager();`.
Github Url : https://github.com/linkedin/gobblin/pull/965#issuecomment-277861767
jbaranick wrote on 2017-02-09T17:01:39Z : @htran1 Can you review this again?
Github Url : https://github.com/linkedin/gobblin/pull/965#issuecomment-278704409
jbaranick wrote on 2017-02-14T06:44:03Z : @htran1 Can you please review this again?
Github Url : https://github.com/linkedin/gobblin/pull/965#issuecomment-279621932
jbaranick wrote on 2017-02-16T17:32:22Z : @htran1 Any more feedback?
Github Url : https://github.com/linkedin/gobblin/pull/965#issuecomment-280400680
jbaranick wrote on 2017-02-16T23:00:49Z : @ibuenros Any feedback?
Github Url : https://github.com/linkedin/gobblin/pull/965#issuecomment-280491562
jbaranick wrote on 2017-02-26T06:33:06Z : @htran1 I understand what you are saying, but I believe that the current.jst is the cause for many problems. It was discussed a year ago and recommended to be removed. One big issue is that depending on the backing FS, the update of the current file contents is eventual. This can lead to incorrect state being returned.
Regarding `#1` above, do you have any numbers that show what the performance is for listing the state files? Also, what state store implementation are you using. If it is the FS version, what backing FS is it?
For `#2` I would recommend that we have a different mechanism for moving to a prior state. This is a good feature, but rewriting the current file seems to be the wrong was to go about this.
As for `#3`, it seems that the need to write a current pointer or not is totally dependent on the filesystem which is used to store the state. Why not get rid of the concept of current outside of the state store and allow implementations to use it or not. At least that way, I can make an implementation of FsStateStore that doesn't use current and the builtin version can.
Github Url : https://github.com/linkedin/gobblin/pull/965#issuecomment-282536560
hutran wrote on 2017-02-26T08:58:35Z : For 1, we are using FS state store on HDFS. I'm less concerned about this case than case 3 on NFS. As long as we retain only a small number of jst files per dataset it will probably be okay since it will probably add at most a few seconds to the startup of a job.
For 2, I'll have to check with the team since this is the current method of temporarily moving state back.
3 is a concern for us since the NFS system we are on is less scalable than HDFS. I'm seeing delete of small files giving only 50 ops/s. Listing should be faster, but with many concurrent jobs starting it may saturate our NFS filer. We only have a single master node in this mode that does job planning, so saving CPU cycles is more essential.
The base state store interface doesn't know about current.jst (at least before your changes). We have other uses of StateStore that stores non .jst files. The current.jst is a DatasetStateStore concept. So maybe it makes more sense to add the functionality you need to a new variation of FsDatasetStateStore that doesn't make use of the current file. I think you can even add it as a config option to the current FsDatasetStateStore since it should only be a few blocks that are different. In either case, the mysql and zk dataset state stores won't need to be touched. This allows us to continue using current.jst while users of S3 can use the new approach.
Github Url : https://github.com/linkedin/gobblin/pull/965#issuecomment-282542372