Details
-
Improvement
-
Status: Resolved
-
Minor
-
Resolution: Fixed
-
2.6.0SDK
-
None
Description
int - int maps and int sets are implemented in UIMA as either red-black trees (size = 5 words (4 words for sets) + 1 bit per item, search time = log 2 size (binary search), insert /removal can cause rebalancing tree), or as intVectors (like ArrayList<Integers> but doesn't wrap ints as Integers).
For int - int maps, add a hash version (loses key "ordering"), which takes 3 - 6 words per item (avg 4.5 words - slightly smaller), and has O(1) performance (based on existing JCasHashMap impl, but without concurrency support).
For int sets, add 2 styles: one based on the same hash map (space = 1.5 to 3 words per item vs 4 words), the other based on BitSets. The BitSet one would be faster, would have a conventional key ordering, and would be smaller, if the max int stored < 128 * number of items stored. (This is often the case in uses where it is used to track Feature Structure addresses).
Finally, add one adaptable int set that is usually is implemented as a BitSet, but if the items being stored are such that the size would be significantly larger than the IntHashSet implementation, switch to that alternative representation.