Description
PDFBox's date parser (DateConverter) can throw a StringIndexOutOfBoundsException if the date string to be parsed is empty or contains only spaces. A few of my test PDFs have this "feature".
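The failure mode can be illustrated with a toy parser. The class below is a hypothetical, simplified stand-in that slices a PDF-style "D:YYYYMMDD" string by fixed character positions. It is not PDFBox's DateConverter, whose internals may differ. It only shows why index-based slicing breaks on blank input: trimming an all-space string leaves it empty, and the first substring call then throws.

```java
import java.util.Calendar;
import java.util.GregorianCalendar;

// Hypothetical, simplified stand-in for a fixed-position date parser.
// NOT PDFBox's DateConverter; used only to show the failure mode.
class NaivePdfDateDemo {

    // Parses "D:YYYYMMDD" or "YYYYMMDD" (an optional "D:" prefix is stripped).
    static Calendar parse(String s) {
        String d = s.trim();
        if (d.startsWith("D:")) {
            d = d.substring(2);
        }
        // Throws StringIndexOutOfBoundsException when d has fewer than 4 chars,
        // which is exactly what happens for "" and for "   " after trim().
        int year = Integer.parseInt(d.substring(0, 4));
        int month = Integer.parseInt(d.substring(4, 6));
        int day = Integer.parseInt(d.substring(6, 8));
        return new GregorianCalendar(year, month - 1, day); // Calendar months are 0-based
    }

    public static void main(String[] args) {
        System.out.println(parse("D:20131224").get(Calendar.YEAR)); // prints 2013
        for (String bad : new String[] {"", "   "}) {
            try {
                parse(bad);
            } catch (StringIndexOutOfBoundsException e) {
                System.out.println("StringIndexOutOfBoundsException for \"" + bad + "\"");
            }
        }
    }
}
```

A blank check before slicing would avoid the exception in a parser like this one. In Tika's case, though, the parsing happens inside info.getCreationDate() and info.getModificationDate(), so the practical workaround is to handle the exception at the call site.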
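Until PDFBOX-1803 is resolved, we can add an extra catch block around the creation-date and modification-date lookups so that this exception is ignored in the same way as an invalid date format (IOException), instead of causing problems in Tika: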
```diff
@@ -171,6 +171,9 @@
             addMetadata(metadata, TikaCoreProperties.CREATED, info.getCreationDate());
         } catch (IOException e) {
             // Invalid date format, just ignore
+        } catch (StringIndexOutOfBoundsException e) {
+            // remove after PDFBOX-1803 is fixed
+            // Invalid date format, just ignore
         }
         try {
             Calendar modified = info.getModificationDate();
@@ -178,6 +181,9 @@
             addMetadata(metadata, TikaCoreProperties.MODIFIED, modified);
         } catch (IOException e) {
             // Invalid date format, just ignore
+        } catch (StringIndexOutOfBoundsException e) {
+            // remove after PDFBOX-1803 is fixed
+            // Invalid date format, just ignore
         }
```
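The same "ignore both exceptions" pattern can be factored into a small helper. This is a minimal sketch, not Tika code: DateGetter and getDateOrNull are hypothetical names, and DateGetter merely stands in for calls such as info.getCreationDate(). Assuming the code base targets Java 7 or later, a multi-catch can replace the two separate catch blocks used in the patch.

```java
import java.io.IOException;
import java.util.Calendar;
import java.util.GregorianCalendar;

// Hypothetical helper; DateGetter stands in for calls like info.getCreationDate().
// It is not a PDFBox or Tika type.
class LenientDateDemo {

    @FunctionalInterface
    interface DateGetter {
        Calendar get() throws IOException;
    }

    // Returns the date, or null if the underlying parser rejected the string.
    static Calendar getDateOrNull(DateGetter getter) {
        try {
            return getter.get();
        } catch (IOException | StringIndexOutOfBoundsException e) {
            // Invalid date format, just ignore (Java 7+ multi-catch covers
            // both the checked and the unchecked failure in one clause).
            return null;
        }
    }

    public static void main(String[] args) {
        Calendar ok = getDateOrNull(() -> new GregorianCalendar(2013, Calendar.DECEMBER, 24));
        Calendar badFormat = getDateOrNull(() -> { throw new IOException("bad date"); });
        // Simulates a parser slicing an empty date string.
        Calendar blank = getDateOrNull(() -> { "".substring(0, 4); return null; });

        System.out.println(ok.get(Calendar.YEAR)); // prints 2013
        System.out.println(badFormat);             // prints null
        System.out.println(blank);                 // prints null
    }
}
```

Returning null lets the caller skip the metadata entry, which is comparable to the patch, where the exception is swallowed and the addMetadata call is never reached.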
Issue Links
- depends upon
  - PDFBOX-2823 StringIndexOutOfBoundsException when doing DateConverter.parseDate() (Closed)
  - PDFBOX-1803 StringIndexOutOfBound on DateConverter.toCalendar (Closed)