Details
-
New Feature
-
Status: Done
-
Major
-
Resolution: Done
-
0.2.0
-
None
-
None
Description
In some situations, a user might want to set different behavior based on the `direction` of an edge.
Based on my experience deploying and operating S2Graph with users' news article click activity, it is extremely common for a few articles to get most of the clicks.
To describe the problem more formally, let's say we have a `user_article_click` label, and each edge consists of a `user_id` and an `article_id` as its source and target vertices.
In this case, 'out' direction edges are spread out evenly because we prepend a murmur hash to the beginning of the row key. We have very few edges per source vertex (`user_id`), since no individual can click a million articles.
However, the 'in' direction, which holds all edges connecting every `user_id` to each `article_id`, is a different scenario. Only a few `article_id`s get lots of clicks from millions of users, and each of these quickly becomes a `super node`. This yields excessive region server resource usage, and it is not reasonable to keep a million edges on one single source vertex anyway, because sending a million edges to the client would time out.
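The skew described above can be illustrated with a small sketch. The edge list is made-up toy data and `degree_by_direction` is a hypothetical helper, not part of S2Graph. It only counts how many edges land on each vertex per direction.

```python
from collections import Counter

# Hypothetical click edges as (user_id, article_id) pairs; toy data only.
clicks = [
    ("u1", "a1"), ("u2", "a1"), ("u3", "a1"), ("u4", "a1"),
    ("u5", "a1"), ("u1", "a2"), ("u6", "a1"), ("u2", "a3"),
]

def degree_by_direction(edges):
    """Count edges per vertex for the 'out' and 'in' directions."""
    # 'out' edges are grouped by the source vertex (user_id).
    out_degree = Counter(src for src, _ in edges)
    # 'in' edges are grouped by the target vertex (article_id).
    in_degree = Counter(tgt for _, tgt in edges)
    return out_degree, in_degree

out_degree, in_degree = degree_by_direction(clicks)
max_out = max(out_degree.values())  # largest number of edges under one user_id
max_in = max(in_degree.values())    # largest number of edges under one article_id
```

In this toy data, no `user_id` holds more than a couple of edges, while a single `article_id` collects most of them, which is the shape that produces a super node in the 'in' direction.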
Currently, there is no way to control how edges are processed per direction, but the above case can be avoided if we provide options.
I suggest a new feature that provides a separate index with write options for each `direction`.
Possible write options are as follows (based on our write transaction steps).
- `IndexEdge`: dropAll/sampling/storeAll(default)
- `SnapshotEdge`: drop/store(default)
- `Degree`: ignore/update(default)
By enabling or disabling each element of the write transaction, users can decide how edges are handled when they know in advance what their data will look like.
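One way to picture the proposed options is the sketch below. It is not S2Graph's actual API: the class names, `plan_write`, and the `sampling_rate` parameter are all assumptions made for illustration. Only the option values (`dropAll`/`sampling`/`storeAll`, `drop`/`store`, `ignore`/`update`) and their defaults come from the list above.

```python
import random
from dataclasses import dataclass
from enum import Enum

class IndexEdgeOption(Enum):
    DROP_ALL = "dropAll"
    SAMPLING = "sampling"
    STORE_ALL = "storeAll"  # default

class SnapshotEdgeOption(Enum):
    DROP = "drop"
    STORE = "store"  # default

class DegreeOption(Enum):
    IGNORE = "ignore"
    UPDATE = "update"  # default

@dataclass(frozen=True)
class WriteOption:
    index_edge: IndexEdgeOption = IndexEdgeOption.STORE_ALL
    snapshot_edge: SnapshotEdgeOption = SnapshotEdgeOption.STORE
    degree: DegreeOption = DegreeOption.UPDATE
    # Assumed parameter: fraction of index edges kept when SAMPLING is chosen.
    sampling_rate: float = 1.0

def plan_write(option, rng=random.random):
    """Return the set of write transaction steps to run for one edge."""
    steps = set()
    if option.index_edge is IndexEdgeOption.STORE_ALL:
        steps.add("IndexEdge")
    elif option.index_edge is IndexEdgeOption.SAMPLING and rng() < option.sampling_rate:
        steps.add("IndexEdge")
    if option.snapshot_edge is SnapshotEdgeOption.STORE:
        steps.add("SnapshotEdge")
    if option.degree is DegreeOption.UPDATE:
        steps.add("Degree")
    return steps

# Hypothetical per-direction settings for the `user_article_click` label:
# keep everything for 'out', but only sample index edges for 'in'.
options_by_direction = {
    "out": WriteOption(),
    "in": WriteOption(index_edge=IndexEdgeOption.SAMPLING, sampling_rate=0.01),
}
```

With these settings, an 'out' edge runs all three steps, while an 'in' edge always updates the snapshot edge and degree but writes its index edge only for the sampled fraction of writes. The `rng` argument exists so the sampling decision can be made deterministic in tests.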