Description
Hello,
I am having a strange problem dealing with embedded images.
This is my code:
public void getImages() throws IOException, TikaException, SAXException {
    try (InputStream stream = new FileInputStream(this.fileName)) {
        RecursiveParserWrapper p = new RecursiveParserWrapper(
            new AutoDetectParser(),
            new BasicContentHandlerFactory(BasicContentHandlerFactory.HANDLER_TYPE.IGNORE, -1)
        );
        ParseContext context = new ParseContext();
        PDFParserConfig config = new PDFParserConfig();
        config.setExtractInlineImages(true);
        config.setExtractUniqueInlineImagesOnly(true);
        context.set(org.apache.tika.parser.pdf.PDFParserConfig.class, config);
        context.set(org.apache.tika.parser.Parser.class, p);
        p.parse(stream, new BodyContentHandler(-1), new Metadata(), context);
        List<Metadata> metadatas = p.getMetadata();
        FileInputStream f = new FileInputStream("/tmp/" + metadatas.get(1).get("File Name"));
        //FileInputStream f = new FileInputStream(metadatas.get(1).get("File Name"));
        System.out.println(f.available());
    }
}
I can get the name of each embedded image with get("File Name"), but the resulting path seems to be invalid.
I need to save all the embedded images (inline images) to another location.
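To make the goal concrete, here is a minimal sketch of the saving step I have in mind. It uses only the JDK and assumes that the bytes of each embedded image are available as an InputStream, for example inside the parseEmbedded method of a custom EmbeddedDocumentExtractor registered in the ParseContext (that wiring is my assumption and is not shown here). The class name EmbeddedImageSaver and the method save are hypothetical names used only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class EmbeddedImageSaver {

    // Copies one embedded resource stream into targetDir and returns the path written.
    // The reported name is treated as a label only, not as a path on disk:
    // every character outside [A-Za-z0-9._-] is replaced with '_' so the
    // resulting file always lands directly inside targetDir.
    public static Path save(InputStream in, String resourceName, Path targetDir) throws IOException {
        Files.createDirectories(targetDir);
        String name = resourceName == null ? "" : resourceName.replaceAll("[^A-Za-z0-9._-]", "_");
        if (name.isEmpty() || name.equals(".") || name.equals("..")) {
            name = "embedded.bin"; // fallback when no usable name was reported
        }
        Path out = targetDir.resolve(name);
        Files.copy(in, out, StandardCopyOption.REPLACE_EXISTING);
        return out;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for an embedded image stream; real data would come from the parser.
        Path dir = Files.createTempDirectory("embedded-images");
        Path written = save(new ByteArrayInputStream(new byte[] {1, 2, 3}), "image0.jpg", dir);
        System.out.println("Saved to " + written);
    }
}
```

If the parser does hand over one stream per inline image, calling save once per stream with the desired target directory would be enough for my use case.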
Thank you in advance!