Details
Type: Bug
Status: Resolved
Priority: Minor
Resolution: Duplicate
Affects Version/s: 2.4.0SDK
Fix Version/s: None
Component/s: None
Labels: None
Description
REPRO: Using a tokenizer that matches "[^ ]+" (runs of non-space characters) on the text "aaa bbb ccc ddd", I get four token annotations:
"aaa" 0-3
"bbb" 4-7
"ccc" 8-11
"ddd" 12-15
I then iterate over the token annotations, printing the covered text, begin and end of each one. For each token I create an unambiguous, non-strict subiterator and iterate over it, printing the covered text, begin and end of every annotation it returns, indented.
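Iterator<Annotation> iter = jcas.getAnnotationIndex(Token.type).iterator();
while (iter.hasNext()) {
    Annotation a = iter.next();
    System.out.println("\"" + a.getCoveredText() + "\"" + " [" + a.getBegin() + ", " + a.getEnd() + ")");
    // subiterator(annot, ambiguous, strict): unambiguous and non-strict
    Iterator<Annotation> featIter = jcas.getAnnotationIndex().subiterator(a, false, false);
    while (featIter.hasNext()) {
        Annotation sub = featIter.next();
        // Indented so that subiterator results stand out from the outer tokens
        System.out.println("    \"" + sub.getCoveredText() + "\"" + " [" + sub.getBegin() + ", " + sub.getEnd() + ")");
    }
}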
The output is
"aaa" [0, 3)
    "bbb" [4, 7)
"bbb" [4, 7)
    "ccc" [8, 11)
"ccc" [8, 11)
    "ddd" [12, 15)
"ddd" [12, 15)
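Each subiterator returns the next token, even though that token begins after the end of the annotation the subiterator was created for. I think this can be fixed by adding an extra check at Subiterator.java, line 127.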
NOW
while (it.isValid() && ((start > annot.getBegin()) || (strict && annot.getEnd() > end)))
POSSIBLE FIX
while (it.isValid() && ((start > annot.getBegin() && annot.getBegin() <= end) || (strict && annot.getEnd() > end))) { it.moveToNext(); }
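To make the expected behaviour concrete, here is a minimal plain-Java sketch of the bounds check a subiterator would be expected to apply. It is not UIMA code: the Span class, the covered method and the simplified ordering (sorted by begin only) are all hypothetical, and the rule that a non-strict subiterator should only return annotations whose begin is not past the container's end is an assumption based on the report above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SubiteratorBoundsSketch {

    /** Hypothetical stand-in for an annotation: only the begin and end offsets. */
    static final class Span {
        final int begin;
        final int end;

        Span(int begin, int end) {
            this.begin = begin;
            this.end = end;
        }

        @Override
        public String toString() {
            return "[" + begin + ", " + end + ")";
        }
    }

    /** The four tokens from the repro: "aaa", "bbb", "ccc", "ddd". */
    static List<Span> tokens() {
        return Arrays.asList(new Span(0, 3), new Span(4, 7), new Span(8, 11), new Span(12, 15));
    }

    /**
     * Returns the spans a subiterator over `container` is assumed to yield.
     * `sortedByBegin` must be sorted by ascending begin offset.
     */
    static List<Span> covered(List<Span> sortedByBegin, Span container, boolean strict) {
        List<Span> result = new ArrayList<>();
        for (Span s : sortedByBegin) {
            if (s == container) continue;                  // the container itself is not returned
            if (s.begin < container.begin) continue;       // starts before the container: skip
            if (s.begin > container.end) break;            // starts after the container: stop
            if (strict && s.end > container.end) continue; // strict mode: must also end inside
            result.add(s);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Span> tokens = tokens();
        for (Span t : tokens) {
            // Non-strict sub-iteration over each token, as in the repro
            System.out.println(t + " -> " + covered(tokens, t, false));
        }
    }
}
```

Under these assumptions, every token in the repro gets an empty sub-iteration, because the next token always begins after the current one ends. The strict flag only affects spans that start inside the container but end beyond it.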
Issue Links
- is duplicated by: UIMA-2808 JCasUtil Subiterator returns annotations which are not within borders of the container (parent) annotation if parameter "strict" is set to "false" (Resolved)