[TIKA-1376] Improve embedded file name extraction in PDFParser - ASF JIRA

Details

Type: Improvement
Status: Closed
Priority: Trivial
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 1.6
Component/s: parser
Labels: None

Description

When we extract embedded files from PDFs, we currently use the key in the PDEmbeddedFilesNameTreeNode as the file name, and we store that key as the value of Metadata.RESOURCE_NAME_KEY in the embedded document's metadata.

I think we should try to get the file name from PDComplexFileSpecification's getFilename() first. If that is null, then we should fall back to the key value.
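
The proposed fallback can be sketched as follows. This is a minimal illustration, not the committed patch: the class name EmbeddedNameResolver and its resolve method are hypothetical, and the two string parameters stand in for the result of PDComplexFileSpecification's getFilename() and the name tree key, so the sketch compiles without PDFBox or Tika on the classpath.

```java
// Hypothetical helper illustrating the fallback described in this issue.
// It is not the actual Tika change; names here are assumptions.
final class EmbeddedNameResolver {

    private EmbeddedNameResolver() {
        // static utility, no instances
    }

    /**
     * Picks the name to store for an embedded file.
     *
     * @param specFilename value assumed to come from
     *                     PDComplexFileSpecification.getFilename(); may be null
     * @param nameTreeKey  key of the entry in the embedded files name tree
     * @return specFilename when it is non-null, otherwise nameTreeKey
     */
    static String resolve(String specFilename, String nameTreeKey) {
        // Prefer the file specification's own name; fall back to the key.
        return specFilename != null ? specFilename : nameTreeKey;
    }
}
```

In the parser, the returned value would presumably be what gets stored under Metadata.RESOURCE_NAME_KEY for the embedded document. Only a null check is shown because that is the condition the description names; whether an empty file name should also trigger the fallback is left open here.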

People

Assignee: Tim Allison

Reporter: Tim Allison

Votes: 0

Watchers: 2

Dates

Created: 24/Jul/14 15:16

Updated: 26/Jul/14 08:29

Resolved: 25/Jul/14 15:02