[TIKA-1297] Images not being extracted from PDFs - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 1.5
Fix Version/s: 1.6
Component/s: parser
Labels: None

Description

Images embedded within PDF documents are not being extracted by Tika. I have tested this via the command line (where the -z option fails to extract any images), and by inspecting the XHTML version of the PDF produced by Tika (where the image tags are not included in the output).

The images are extractable by PDFBox, so Tika should be able to extract them and include them in the XHTML output.
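The XHTML check described above can be approximated with a small standalone program. This is only a sketch using the JDK's built-in XML parser, not Tika itself. The class name `XhtmlImageCheck`, the helper `countImages`, and the sample markup are hypothetical and are not taken from this issue. It assumes the XHTML produced by Tika has already been captured as a well-formed string.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class XhtmlImageCheck {

    // Hypothetical helper: counts <img> elements in a well-formed XHTML string.
    // A result of 0 for a PDF known to contain images would match the
    // behaviour reported in this issue.
    static int countImages(String xhtml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Namespace awareness lets us match img elements in the XHTML namespace.
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xhtml)));
        // "*" matches any namespace, so the check does not depend on the exact URI.
        return doc.getElementsByTagNameNS("*", "img").getLength();
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample markup standing in for captured parser output.
        String sample = "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body>"
                + "<p>Page text</p><img src=\"sample.png\"/></body></html>";
        System.out.println("img elements found: " + countImages(sample));
    }
}
```

Running the same count on output from an affected and a fixed version would show whether image tags appear in the XHTML at all.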

People

Assignee: Unassigned

Reporter: James Baker

Votes: 0

Watchers: 1

Dates

Created: 13/May/14 08:35

Updated: 24/Sep/14 11:46

Resolved: 24/Sep/14 11:46