Details
Bug
Status: Closed
Major
Resolution: Fixed
1.8.3
None
Description
While pulling text from a PDF with a custom PDFTextStripper subclass that overrides writeString(String text, List<TextPosition> textPositions), I was getting textPositions that did not line up with the text. The text positions of every "word" in a line always come through as the text positions of the last word in that line. I traced the issue to the PDFTextStripper class.
Here is the code at issue:
/**
 * Used within {@link #normalize(List, boolean, boolean)} to handle a {@link TextPosition}.
 * @return The StringBuilder that must be used when calling this method.
 */
private StringBuilder normalizeAdd(LinkedList<WordWithTextPositions> normalized,
        StringBuilder lineBuilder, List<TextPosition> wordPositions, TextPosition text)
{
    if (text instanceof WordSeparator)
    {
        normalized.add(createWord(lineBuilder.toString(), wordPositions));
        lineBuilder = new StringBuilder();
        wordPositions.clear();
    }
    else
    {
        lineBuilder.append(text.getCharacter());
        wordPositions.add(text);
    }
    return lineBuilder;
}
In the normalizeAdd method, a new word is created by passing wordPositions to createWord. A reference to that same list is stored in the new WordWithTextPositions added to the normalized linked list, but shortly afterwards wordPositions.clear() is called. Because the list was passed by reference rather than copied, clearing it also clears the positions held by the WordWithTextPositions that was just created, and every word in the line ends up sharing one list.
So I would suggest the following:
/**
 * Used within {@link #normalize(List, boolean, boolean)} to handle a {@link TextPosition}.
 * @return The StringBuilder that must be used when calling this method.
 */
private StringBuilder normalizeAdd(LinkedList<WordWithTextPositions> normalized,
        StringBuilder lineBuilder, List<TextPosition> wordPositions, TextPosition text)
{
    if (text instanceof WordSeparator)
    {
        normalized.add(createWord(lineBuilder.toString(), new ArrayList<TextPosition>(wordPositions)));
        lineBuilder = new StringBuilder();
        wordPositions.clear();
    }
    else
    {
        lineBuilder.append(text.getCharacter());
        wordPositions.add(text);
    }
    return lineBuilder;
}
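To show the aliasing in isolation, here is a minimal standalone sketch that does not use PDFBox at all. The AliasingDemo class, its Word type, and the "pos(x)" strings are hypothetical stand-ins for WordWithTextPositions and TextPosition. The final flush after the loop, which stores the last word without clearing the list, is an assumption chosen to reproduce the symptom described above.

```java
import java.util.ArrayList;
import java.util.List;

public class AliasingDemo {
    // Hypothetical stand-in for WordWithTextPositions: it keeps the list it is handed.
    static final class Word {
        final String text;
        final List<String> positions;

        Word(String text, List<String> positions) {
            this.text = text;
            this.positions = positions;
        }
    }

    // Splits a line on spaces using one reusable positions list, as normalizeAdd does.
    // With copyPositions == false every Word stores the same list instance.
    static List<Word> split(String line, boolean copyPositions) {
        List<Word> words = new ArrayList<Word>();
        StringBuilder builder = new StringBuilder();
        List<String> positions = new ArrayList<String>();
        for (char c : line.toCharArray()) {
            if (c == ' ') {
                words.add(new Word(builder.toString(), store(positions, copyPositions)));
                builder = new StringBuilder();
                positions.clear(); // also empties any Word that holds this same list
            } else {
                builder.append(c);
                positions.add("pos(" + c + ")");
            }
        }
        // Assumed final flush: the last word is stored without clearing the list.
        if (builder.length() > 0) {
            words.add(new Word(builder.toString(), store(positions, copyPositions)));
        }
        return words;
    }

    // Returns either the shared list itself or an independent copy of it.
    private static List<String> store(List<String> positions, boolean copy) {
        return copy ? new ArrayList<String>(positions) : positions;
    }

    public static void main(String[] args) {
        for (Word w : split("ab cd", false)) {
            System.out.println("shared: " + w.text + " -> " + w.positions);
        }
        for (Word w : split("ab cd", true)) {
            System.out.println("copied: " + w.text + " -> " + w.positions);
        }
    }
}
```

With the shared list, both "ab" and "cd" report the positions of "cd"; with the copy, each word keeps its own positions. The copy costs one small allocation per word, which seems a reasonable trade for correct positions.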