Description
I am scanning a PPTX file with Tika parser/core 2.6.0 and using an EmbeddedDocumentExtractor to check whether the file contains embedded images. The extractor is also called for the document thumbnail, whose metadata has the MIME type "image/jpeg" and the key "dc:title" set to "/docProps/thumbnail.jpeg". As a result, even when the PPTX file has no embedded image, my check always reports "Embedded image present" because of the thumbnail. Is there a way to add a parameter to officeParserConfig that skips thumbnails during parsing? Thanks.
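To illustrate the behaviour I am after, here is a minimal sketch of the filtering I would otherwise have to do myself, for example from the extractor's shouldParseEmbedded(Metadata) callback. It is not a Tika API: ThumbnailFilter and its methods are hypothetical names, a plain Map stands in for Tika's Metadata so the snippet runs without Tika on the classpath, the "Content-Type" key is an assumption about where the MIME type is stored, and "image1.jpeg" is a made-up title for a real picture. Only the "dc:title" value "/docProps/thumbnail.jpeg" comes from what I actually observed.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of Tika: decides whether the metadata of an
// embedded resource describes the package thumbnail rather than a real image.
class ThumbnailFilter {

    // Key observed in the thumbnail metadata described above.
    static final String TITLE_KEY = "dc:title";
    // Lower-cased path prefix of the observed value "/docProps/thumbnail.jpeg".
    static final String THUMBNAIL_PREFIX = "/docprops/thumbnail";
    // Assumption: the MIME type is stored under this key.
    static final String CONTENT_TYPE_KEY = "Content-Type";

    // True when the title points at the thumbnail part of the package.
    static boolean isThumbnail(Map<String, String> metadata) {
        String title = metadata.get(TITLE_KEY);
        if (title == null) {
            return false;
        }
        return title.toLowerCase(Locale.ROOT).startsWith(THUMBNAIL_PREFIX);
    }

    // True for image resources that are not the thumbnail.
    static boolean isEmbeddedImage(Map<String, String> metadata) {
        String type = metadata.get(CONTENT_TYPE_KEY);
        return type != null && type.startsWith("image/") && !isThumbnail(metadata);
    }

    public static void main(String[] args) {
        Map<String, String> thumbnail = Map.of(
                CONTENT_TYPE_KEY, "image/jpeg",
                TITLE_KEY, "/docProps/thumbnail.jpeg");
        // Hypothetical metadata for a genuine embedded picture.
        Map<String, String> picture = Map.of(
                CONTENT_TYPE_KEY, "image/jpeg",
                TITLE_KEY, "image1.jpeg");

        System.out.println("thumbnail counted as image: " + isEmbeddedImage(thumbnail));
        System.out.println("picture counted as image: " + isEmbeddedImage(picture));
    }
}
```

Matching on the title path is fragile, which is why a parser-level option to skip thumbnails would be preferable.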