Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 1.23
None
None
Docker image: apache/tika:1.23
Description
Both /rmeta/text and /unpack/all return metadata values that contain null characters (\u0000).
Note the trailing \u0000 in "pdf:docinfo:producer": "Adobe PSL 1.2e for Canon\u0000"
$ curl -T Technical_manual.pdf http://localhost:9998/rmeta/text
[{
  "Content-Type": "application/pdf",
  "Creation-Date": "2018-08-21T09:40:33Z",
  "X-Parsed-By": [ "org.apache.tika.parser.DefaultParser", "org.apache.tika.parser.pdf.PDFParser" ],
  "X-TIKA:embedded_depth": "0",
  "X-TIKA:parse_time_millis": "42",
  "access_permission:assemble_document": "true",
  "access_permission:can_modify": "true",
  "access_permission:can_print": "true",
  "access_permission:can_print_degraded": "true",
  "access_permission:extract_content": "true",
  "access_permission:extract_for_accessibility": "true",
  "access_permission:fill_in_form": "true",
  "access_permission:modify_annotations": "true",
  "dc:format": "application/pdf; version\u003d1.4",
  "dcterms:created": "2018-08-21T09:40:33Z",
  "meta:creation-date": "2018-08-21T09:40:33Z",
  "pdf:PDFVersion": "1.4",
  "pdf:charsPerPage": [ "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0" ],
  "pdf:docinfo:created": "2018-08-21T09:40:33Z",
  "pdf:docinfo:creator_tool": "Canon iR-ADV C5235 PDF",
  "pdf:docinfo:producer": "Adobe PSL 1.2e for Canon\u0000",
  "pdf:encrypted": "false",
  "pdf:hasXFA": "false",
  "pdf:hasXMP": "true",
  "pdf:unmappedUnicodeCharsPerPage": [ "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0" ],
  "xmp:CreatorTool": "Canon iR-ADV C5235 PDF",
  "xmpMM:DocumentID": "uuid:03e07b5b-0000-f481-39c4-e94700000000",
  "xmpTPg:NPages": "31"
}]
Another example, from a different PDF.
Note the trailing \u0000 in the fields "pdf:docinfo:creator_tool": "DigiPath\u0000", "pdf:docinfo:producer": "DigiPath\u0000" and "xmp:CreatorTool": "DigiPath\u0000"
[{
  "Content-Type": "application/pdf",
  "Last-Modified": "2006-03-02T08:53:15Z",
  "Last-Save-Date": "2006-03-02T08:53:15Z",
  "X-Parsed-By": [ "org.apache.tika.parser.DefaultParser", "org.apache.tika.parser.pdf.PDFParser" ],
  "X-TIKA:embedded_depth": "0",
  "X-TIKA:parse_time_millis": "96",
  "access_permission:assemble_document": "true",
  "access_permission:can_modify": "true",
  "access_permission:can_print": "true",
  "access_permission:can_print_degraded": "true",
  "access_permission:extract_content": "true",
  "access_permission:extract_for_accessibility": "true",
  "access_permission:fill_in_form": "true",
  "access_permission:modify_annotations": "true",
  "date": "2006-03-02T08:53:15Z",
  "dc:format": "application/pdf; version\u003d1.3",
  "dcterms:modified": "2006-03-02T08:53:15Z",
  "meta:save-date": "2006-03-02T08:53:15Z",
  "modified": "2006-03-02T08:53:15Z",
  "pdf:PDFVersion": "1.3",
  "pdf:charsPerPage": [ "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0" ],
  "pdf:docinfo:creator_tool": "DigiPath\u0000",
  "pdf:docinfo:modified": "2006-03-02T08:53:15Z",
  "pdf:docinfo:producer": "DigiPath\u0000",
  "pdf:encrypted": "false",
  "pdf:hasXFA": "false",
  "pdf:hasXMP": "false",
  "pdf:unmappedUnicodeCharsPerPage": [ "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0" ],
  "xmp:CreatorTool": "DigiPath\u0000",
  "xmpTPg:NPages": "14"
}]
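Until this is addressed in Tika itself, a client could detect and strip the null characters after parsing the /rmeta response. The following is only a rough client-side sketch in Python, not a fix for the parser: the helper names `has_null`, `fields_with_nulls` and `strip_nulls` are made up for illustration, and the sample record is an abbreviated copy of the second output above.

```python
import json

NUL = "\u0000"


def has_null(value):
    """Return True if a parsed JSON value contains U+0000 anywhere in a string."""
    if isinstance(value, str):
        return NUL in value
    if isinstance(value, list):
        return any(has_null(v) for v in value)
    if isinstance(value, dict):
        return any(has_null(v) for v in value.values())
    return False


def fields_with_nulls(record):
    """List (sorted) the metadata field names whose values contain U+0000."""
    return sorted(key for key, value in record.items() if has_null(value))


def strip_nulls(value):
    """Recursively remove U+0000 from every string in a parsed JSON value."""
    if isinstance(value, str):
        return value.replace(NUL, "")
    if isinstance(value, list):
        return [strip_nulls(v) for v in value]
    if isinstance(value, dict):
        return {k: strip_nulls(v) for k, v in value.items()}
    return value


# Abbreviated sample taken from the second /rmeta/text output in this report.
# The raw string keeps the \u0000 escape for json.loads to decode.
sample = r'''[{
  "pdf:docinfo:creator_tool": "DigiPath\u0000",
  "pdf:docinfo:producer": "DigiPath\u0000",
  "pdf:charsPerPage": ["0", "0"],
  "xmp:CreatorTool": "DigiPath\u0000",
  "xmpTPg:NPages": "14"
}]'''

records = json.loads(sample)
affected = fields_with_nulls(records[0])
cleaned = strip_nulls(records)

print(affected)
print(cleaned[0]["pdf:docinfo:producer"])
```

For the sample above this reports the three fields named in the note and leaves the other values unchanged. Stripping on the client hides the symptom only; the null characters are still present in whatever the server returns.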