Description
Tilman's observation on 'Microsoft' below revealed two things: 1) we should use our BodyContentHandler so that title metadata doesn't slip into the body content, and 2) the title and all metadata values from PDDocumentInformation are null for at least NZ/NZAZKTQYKDD2HSBCSJJN6XSEA4KJEONU:
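import java.nio.file.*;
import org.apache.pdfbox.pdmodel.PDDocument;
import static org.junit.Assert.*;
Path p = Paths.get("..NZAZKTQYKDD2HSBCSJJN6XSEA4KJEONU");
try (PDDocument d = PDDocument.load(p.toFile())) { // try-with-resources closes the document
    assertNull(d.getDocumentInformation().getTitle()); // no title comes back...
    assertEquals(8, d.getDocumentInformation().getMetadataKeys().size()); // ...although the Info dictionary has 8 keys
}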
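One way to check whether the values are truly missing or merely not stored as PDF strings is to dump the raw Info dictionary. The sketch below assumes PDFBox 2.x on the classpath (where PDDocument.load(File) is still available); the class InfoDump and its describe method are hypothetical names. As far as I can tell from the 2.x sources, getTitle() returns null whenever the /Title value is not a COSString, so an entry type other than COSString would point in that direction. That is a guess about this file, not a confirmed cause.

```java
import java.io.File;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

import org.apache.pdfbox.cos.COSBase;
import org.apache.pdfbox.cos.COSDictionary;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.PDDocument;

class InfoDump {

    // Maps each Info dictionary key to the COS type of its value,
    // e.g. "Title" -> "COSString" or "Title" -> "COSName".
    static Map<String, String> describe(PDDocument doc) {
        COSDictionary info = doc.getDocumentInformation().getCOSObject();
        Map<String, String> types = new LinkedHashMap<>();
        for (COSName key : info.keySet()) {
            // getDictionaryObject resolves indirect references and maps COSNull to null
            COSBase value = info.getDictionaryObject(key);
            types.put(key.getName(), value == null ? "null" : value.getClass().getSimpleName());
        }
        return types;
    }

    public static void main(String[] args) throws IOException {
        try (PDDocument doc = PDDocument.load(new File(args[0]))) {
            describe(doc).forEach((key, type) -> System.out.println(key + " -> " + type));
        }
    }
}
```

With PDFBox on the classpath, main prints one "key -> type" line per Info entry for the file passed as its first argument.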
I manually reviewed a handful of documents listed in the metadata/metadata_value_count_diffs.csv file here, and this looks to be quite pervasive... unless I'm loading the documents and metadata the wrong way.
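To get a rough count instead of a hand-picked sample, one could scan a directory of PDFs and flag files whose Info dictionary has keys but yields no string value for any of them. This is a sketch under the same PDFBox 2.x assumption; the class InfoScan and its methods are hypothetical, and it does not reproduce how metadata_value_count_diffs.csv was generated.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDDocumentInformation;

class InfoScan {

    // True if the Info dictionary has at least one key but no key
    // yields a string value through getCustomMetadataValue.
    static boolean keysButNoValues(PDDocumentInformation info) {
        Set<String> keys = info.getMetadataKeys();
        if (keys.isEmpty()) {
            return false;
        }
        for (String key : keys) {
            // getCustomMetadataValue returns null when the value is not a PDF string
            if (info.getCustomMetadataValue(key) != null) {
                return false;
            }
        }
        return true;
    }

    // Returns the regular files under dir that show the symptom.
    // Files PDFBox cannot load are skipped, so this is only a rough count.
    static List<Path> scan(Path dir) throws IOException {
        List<Path> files;
        try (Stream<Path> walk = Files.walk(dir)) {
            files = walk.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        List<Path> hits = new ArrayList<>();
        for (Path p : files) {
            try (PDDocument d = PDDocument.load(p.toFile())) {
                if (keysButNoValues(d.getDocumentInformation())) {
                    hits.add(p);
                }
            } catch (IOException e) {
                // not a PDF, encrypted, or damaged: leave it out of the count
            }
        }
        return hits;
    }
}
```

Comparing the size of the returned list with the number of PDFs in the corpus would give a rough rate to set against the CSV.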