  1. UIMA
  2. UIMA-1089

Space/Time tradeoffs in the CAS

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Abandoned
    • Affects Version/s: 2.2.2, 2.3
    • Fix Version/s: None
    • Component/s: Core Java Framework

    Description

      Investigate and implement optimizations that trade time for space, where the user controls when the optimization runs. One candidate is string sharing. Detecting sharing opportunities requires additional computation and temporary storage, but the result is a space saving. For instance, a common annotation might assign short strings such as "noun" to a "part-of-speech" feature. A large document may contain a large number of such string-valued features, all picked from a small pool of allowable values. The CAS's string storage might be optimized to share string references in this case, at the cost of temporarily building a hash table of the unique strings and using it to identify sharing possibilities. A new API call for this optimization would confine its time and space overhead to the users and moments for which it makes sense.
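
      The string-sharing idea can be sketched roughly as follows. This is only an illustration under stated assumptions: the class StringSharer and the method shareStrings are hypothetical names, not part of the UIMA API, and a plain String[] stands in for the CAS's string storage, whose real layout is not described in this issue.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper for illustration only; not part of the UIMA API. */
public class StringSharer {

    /**
     * Rewrites the array so that equal strings refer to one shared instance.
     * The array is an assumed stand-in for the CAS string storage.
     *
     * @return the number of slots redirected to an earlier, equal instance
     */
    public static int shareStrings(String[] values) {
        // Temporary table of unique strings; it becomes garbage once the
        // method returns, so the extra storage is only held during the pass.
        Map<String, String> unique = new HashMap<>();
        int shared = 0;
        for (int i = 0; i < values.length; i++) {
            String s = values[i];
            if (s == null) {
                continue; // unset feature values are left alone
            }
            // putIfAbsent returns the instance already stored for an equal
            // string, or null if this is the first time the value is seen.
            String canonical = unique.putIfAbsent(s, s);
            if (canonical != null && canonical != s) {
                values[i] = canonical; // drop the duplicate instance
                shared++;
            }
        }
        return shared;
    }
}
```

      Because the pass is an explicit call, its cost (one hash lookup per string plus the temporary table) is paid only when the caller asks for it, for example once after a large document has been fully annotated.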

      An alternative would be to apply some selected kinds of optimizations automatically, but I'm not sure that could be done without negatively impacting finely tuned systems.

          People

            Assignee: Unassigned
            Reporter: Marshall Schor (schor)
            Votes: 0
            Watchers: 1