Details
Type: Improvement
Status: Resolved
Priority: Minor
Resolution: Abandoned
2.2.2, 2.3
None
Description
Investigate and, where worthwhile, implement optimizations that trade time for space, with the user controlling when the optimization runs. One such optimization is string sharing. Detecting sharing opportunities requires additional computation and temporary storage, but the result is a space saving. For instance, a common annotation might assign short strings like "noun" to a "part-of-speech" feature. When processing a large document, there may be a large number of such string-valued features, all picked from a small pool of allowable values. In this case the CAS's string storage could be optimized to share string references, at the cost of temporarily building a hash table of the unique strings and using it to identify sharing possibilities. A new API call for this optimization would confine its time and space overhead to the users and moments where it makes sense.
An alternative would be to detect such opportunities automatically for some selected kinds of optimizations, but I'm not sure that could be done without negatively impacting finely tuned systems.
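
To make the string-sharing idea concrete, here is a minimal sketch in plain Java. It does not use any UIMA or CAS API: the class `StringSharingSketch`, the method `shareStrings`, and the use of a plain `String[]` as a stand-in for the CAS's string storage are all assumptions made for illustration. It shows only the core technique described above: build a temporary hash table of the unique strings, then point every duplicate at the first instance seen.

```java
import java.util.HashMap;
import java.util.Map;

public class StringSharingSketch {

    /**
     * Hypothetical helper: rewrites the array in place so that equal strings
     * refer to one shared instance. The map exists only for the duration of
     * the call, which is the temporary storage cost of the optimization.
     * Returns the number of distinct strings found.
     */
    public static int shareStrings(String[] values) {
        Map<String, String> canonical = new HashMap<>();
        for (int i = 0; i < values.length; i++) {
            String s = values[i];
            if (s == null) {
                continue; // unset feature values are left alone
            }
            // putIfAbsent returns the earlier equal string, or null if s is new
            String prior = canonical.putIfAbsent(s, s);
            if (prior != null) {
                values[i] = prior; // drop the duplicate, keep the shared one
            }
        }
        return canonical.size();
    }

    public static void main(String[] args) {
        // new String(...) forces separate objects, as if each value had been
        // created independently while processing a document
        String[] partOfSpeech = {
            new String("noun"), new String("verb"), new String("noun")
        };
        System.out.println(partOfSpeech[0] == partOfSpeech[2]); // false
        int distinct = shareStrings(partOfSpeech);
        System.out.println(partOfSpeech[0] == partOfSpeech[2]); // true
        System.out.println(distinct); // 2
    }
}
```

Because the caller decides when to invoke `shareStrings`, the extra time and the temporary map are paid only on request, which matches the intent of an explicit API call rather than an automatic optimization.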