FOP-2886

FOP 2.3 Generates Truncated/Corrupted PDF with Mathematical Unicode Characters

Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 2.3
    • Fix Version/s: None
    • Component/s: renderer/pdf
    • Labels: None
    • Environment: Reproduced on Ubuntu 18.04.3 LTS

    Description

      Overview:

      We use FOP 2.3 to generate PDFs from HTML. When the input contains a large number of certain Mathematical Unicode characters (such as U+1D538, see https://www.compart.com/en/unicode/U+1D538 ), FOP creates the PDF without reporting any error, but the resulting file cannot be opened by any PDF viewer.
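
      To see which characters of this kind an input contains, something like the following sketch may help. U+1D538 lies outside the Basic Multilingual Plane, so Java stores it as a surrogate pair. The class and method names here are hypothetical and not part of our code or of FOP, and the sketch does not by itself show that these characters cause the truncation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic helper (Java 8+); not part of FOP or of the reporter's code.
final class SupplementaryCodePoints {

    private SupplementaryCodePoints() {
    }

    /** Returns a "U+XXXXX" label for every code point above U+FFFF, in input order. */
    static List<String> find(final String text) {
        final List<String> found = new ArrayList<>();
        text.codePoints()
                .filter(Character::isSupplementaryCodePoint)
                .forEach(cp -> found.add(String.format("U+%X", cp)));
        return found;
    }
}
```

      Running it over the HTML string before conversion gives a list of the supplementary-plane characters present, which can then be compared between inputs that render and inputs that do not.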

      We also use Lowagie PdfReader to validate that each PDF we generate is well-formed. For this file, the PdfReader threw the following exception:
       com.itextpdf.text.exceptions.InvalidPdfException: Rebuild failed: trailer not found.; Original message: PDF startxref not found.

      Manual inspection confirmed that the trailer is indeed missing from the file. We know this can happen when the input and output streams are not closed or flushed properly. In our case, we use the Java try-with-resources pattern to invoke close() automatically, so I don't believe this is the cause. I have also tried, in vain, closing our streams manually and changing the order in which the close() calls happen.
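
      The manual check can be approximated in code. This is a minimal sketch, not our actual validation: it only looks for the startxref keyword followed by the %%EOF marker near the end of the bytes, which is where a complete PDF file places them. The class name and the 1024-byte window are assumptions chosen for illustration, and passing this check does not prove the PDF is otherwise valid.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper; a coarse truncation check, not a full PDF validator.
final class PdfTrailerCheck {

    // Assumed window size: only the tail of the file is inspected.
    private static final int TAIL_BYTES = 1024;

    private PdfTrailerCheck() {
    }

    /** Returns true if "startxref" followed by "%%EOF" appears in the last bytes of the PDF. */
    static boolean hasTrailer(final byte[] pdf) {
        final int from = Math.max(0, pdf.length - TAIL_BYTES);
        // ISO-8859-1 maps each byte to exactly one char, so binary content cannot break decoding.
        final String tail = new String(pdf, from, pdf.length - from, StandardCharsets.ISO_8859_1);
        final int startxref = tail.lastIndexOf("startxref");
        final int eof = tail.lastIndexOf("%%EOF");
        return startxref >= 0 && eof > startxref;
    }
}
```

      Calling hasTrailer(byteArrayOutputStream.toByteArray()) right after the conversion returns would show whether the truncation already exists in memory, before anything is written to disk.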

      Steps to Reproduce:

      Unfortunately, I have not been able to reproduce the problem outside of our software, but I've attached the HTML that causes it (reproHtml.txt) and the .xsl files we use. This is the code we use to convert the input HTML into a PDF; the OutputStream we pass in is a ByteArrayOutputStream:

      public void generatePdfWithCssToXslFo(
              final String htmlString,
              final OutputStream outputStream
      ) throws CSSToXSLFOException, SAXException, IOException {
          try (final Reader htmlReader = new StringReader(htmlString)) {
              final InputSource source = new InputSource(htmlReader);
              final boolean isValidatingParser = false;
              final boolean cssToXslFoDebugEnabled = System.getProperty("be.re.css.debug") != null;
      
              // Set up FOP to take the XSL-FO and turn it into a PDF
              final Fop fop;
              final FOUserAgent userAgent;
      
              FopFactoryBuilder builder = new FopFactoryBuilder(URI.create(resourceLoader.getResource(resourceBasePath).getURI().toString()), new ClasspathResolverURIAdapter());
              builder.setConfiguration(configuration);
      
              FopFactory factory = builder.build();
              userAgent = factory.newFOUserAgent();
              userAgent.setAuthor("Indeed");
              userAgent.setCreator("Indeed Resume");
              userAgent.setTitle("Indeed Resume");
              userAgent.setKeywords("Indeed Resume");
      
              fop = factory.newFop(MimeConstants.MIME_PDF, userAgent, outputStream);
      
              // Set up CSSToXSLFO to transform the XHTML input into XSL-FO
              final URL baseUrl = resourceLoader.getResource(resourceBasePath).getURL();
              Loggers.debug(LOGGER, "Parsing HTML response using base URL '%s'", baseUrl);
              final XMLReader xmlParser = Util.getParser(null, isValidatingParser);
              final ProtectEventHandlerFilter eventHandlerFilter = new ProtectEventHandlerFilter(true, true, xmlParser);
      
              final XMLReader filter =
                      new CSSToXSLFOFilter(
                              baseUrl,
                              null,
                              Collections.EMPTY_MAP,
                              eventHandlerFilter,
                              cssToXslFoDebugEnabled);
      
              filter.setEntityResolver(classPathEntityResolver);
              filter.setContentHandler(fop.getDefaultHandler());
              filter.parse(source);
          }
      }

      Actual Results:

      The attached PDF is created (ActualResult.pdf)

      Expected Results:

      An intact PDF should be created. For example, I've attached ApproximateExpectedResult.pdf, in which I replaced the first letter with my name; with that change, the PDF renders.

      Build Date & Hardware:

      FOP version 2.3, Build 2014-07-15 on Ubuntu 18.04.3 LTS

      Additional Builds and Platforms:

      (Unable to test on other platforms.)

      Additional Information: 

      As ApproximateExpectedResult.pdf shows, the text mixes these Mathematical characters with ordinary Latin letters. Adding more Latin characters or removing any of the Mathematical characters sometimes allows the PDF to render, but the effect is hard to predict. I was not able to link it to any particular character or word.
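
      One way to narrow this down would be to shrink the failing text automatically. The sketch below is a simple one-pass reduction that works on whole code points, so surrogate pairs are never split. It assumes a caller-supplied predicate that reports whether a candidate text still produces a broken PDF. InputReducer and stillFails are hypothetical names. The result is not guaranteed to be minimal, and it should be applied to text content rather than raw markup, since removing arbitrary characters from HTML would break its structure.

```java
import java.util.function.Predicate;

// Hypothetical helper: one-pass greedy reduction of a failing input, one code point at a time.
final class InputReducer {

    private InputReducer() {
    }

    /** Drops each code point whose removal keeps stillFails true; returns what remains. */
    static String reduce(final String input, final Predicate<String> stillFails) {
        int[] codePoints = input.codePoints().toArray();
        int i = 0;
        while (i < codePoints.length) {
            // Build a candidate with the code point at index i removed.
            final int[] candidate = new int[codePoints.length - 1];
            System.arraycopy(codePoints, 0, candidate, 0, i);
            System.arraycopy(codePoints, i + 1, candidate, i, codePoints.length - i - 1);
            if (stillFails.test(new String(candidate, 0, candidate.length))) {
                codePoints = candidate; // still failing: keep the removal and retry the same index
            } else {
                i++; // this code point is needed to trigger the failure: keep it
            }
        }
        return new String(codePoints, 0, codePoints.length);
    }
}
```

      In our setup the predicate could wrap the conversion and return true when the generated bytes lack a trailer, but each call would run a full conversion, so this is only practical on a short text fragment.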

      Attachments

        1. ActualResult.pdf
          29 kB
          Lawrence Thibodeaux
        2. ApproximateExpectedResult.pdf
          31 kB
          Lawrence Thibodeaux
        3. fo_setup.xsl
          13 kB
          Lawrence Thibodeaux
        4. name2fo.xsl
          13 kB
          Lawrence Thibodeaux
        5. reproHtml.txt
          12 kB
          Lawrence Thibodeaux
        6. xhtml2fo.xsl
          62 kB
          Lawrence Thibodeaux

          People

            Assignee: Unassigned
            Reporter: Lawrence Thibodeaux (thibodeaux)
            Votes: 0
            Watchers: 1