[OAK-858] NodeBuilder.getChildNodeCount performance and scalability - ASF JIRA

XML

Word

Printable

JSON

Details

Type: Improvement
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 0.9
Component/s: core
Labels:
None

Description

The method NodeBuilder.getChildNodeCount() is currently supposed to return the exact number of child nodes of a node. If there are no or only few child nodes (which is the most common case), this isn't a problem.

However, if there are many nodes (thousands, maybe millions), then keeping an accurate and up-to-date count is tricky. It is specially tricky in a cluster, if you want to allow concurrent add node / delete node. This is for example needed for the UUID index currently. There would be a way to avoid concurrent add/remove: by using some hierarchy, that is, avoid using many child nodes. But efficient, scalable support for many child nodes is one of the goals of Oak in my view.

I think it's not worth the effort to support efficient, accurate child node counts if there are many child nodes. Instead, I suggest to change the contract, and possibly even change the method.

The current usages of the method NodeBuilder.getChildNodeCount(), excluding usage within getChildNodeCount itself, toString(), and tests:

AbstractNodeState.equals, where it's used to avoid iterating over all child nodes if possible. But it doesn't always avoid iterating over all child nodes, so this method anyway is problematic. I even suggest to remove it (or throw an exception) because of the potential performance problem if there are many child nodes.

Template constructor, where there are only 3 cases: 0, 1, and many child nodes.

EmptyNodeState.equals, where there are only 2 cases: 0 and non-0.

SecureNodeState.WrapChildEntryFunction.apply, where there are only 2 cases: 0 and non-0.

Because of that, in theory we could simply change the contract of the method to return only "0, 1, Long.MAX_VALUE". However this seems dangerous.

Instead, I see two options:

change the method to return an enum: NO, ONE, MANY.

change the method to NodeBuilder.getChildNodeCount(long max), where max is the maximum returned value. So that a typical method call would be getChildNodeCount(1) if you only care about 0 or non-0.

Attachments

Issue Links

is depended upon by

OAK-934 NodeIterator.getSize() can return Long.MAX_VALUE for the MongoMK

Resolved

relates to

OAK-1930 New method RangeIterator.getSize(int max)

Open

Activity

People

Assignee:: Thomas Mueller

Reporter:: Thomas Mueller

Votes:: 0 Vote for this issue

Watchers:: 5 Start watching this issue

Dates

Created:: 06/Jun/13 09:51

Updated:: 30/Jun/14 14:39

Resolved:: 06/Aug/13 10:23