PDFBox / PDFBOX-4508

Unexpected slowness filling forms with CJK

Details

    Description

      As reported by William Pietri on the users mailing list:

      If I go through this and fill every field with roman text using the default font, it takes circa 2 seconds, which is fine. If I fill it with an added Arabic font, it takes circa 7 seconds. And if I use a CJK font, it takes circa 140 seconds, which seems like a lot. This is with PDFBox 2.0.14 and the Oracle 1.8.201 JDK on Linux.

      And the message "OpenType Layout tables used in font NotoSansCJKsc-Medium are not implemented in PDFBox and will be ignored" comes up for every field, which suggests that the font is opened each time.

      I can confirm this, also with a different font (ArialUni).

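      import java.io.File;
      import java.io.FileInputStream;
      import java.io.InputStream;
      import java.util.Iterator;
      import org.apache.pdfbox.pdmodel.PDDocument;
      import org.apache.pdfbox.pdmodel.PDResources;
      import org.apache.pdfbox.pdmodel.font.PDFont;
      import org.apache.pdfbox.pdmodel.font.PDType0Font;
      import org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm;
      import org.apache.pdfbox.pdmodel.interactive.form.PDField;
      import org.apache.pdfbox.pdmodel.interactive.form.PDTextField;

      try (PDDocument doc = PDDocument.load(new File("i-130.pdf"));
           InputStream fontStream = new FileInputStream(new File("NotoSansCJKsc-Medium.ttf")))
      {
          PDAcroForm acroForm = doc.getDocumentCatalog().getAcroForm();
          PDResources res = acroForm.getDefaultResources();
          // false = embed the full font, not a subset
          PDFont font = PDType0Font.load(doc, fontStream, false);
          // the font is added to /DR as a direct object, so it is not cached between fields
          String fontName = res.add(font).getName();
          long start = System.currentTimeMillis();
          for (Iterator<PDField> it = acroForm.getFieldIterator(); it.hasNext();)
          {
              PDField field = it.next();
              if (field instanceof PDTextField)
              {
                  PDTextField textField = (PDTextField) field;
                  // font size 0 = auto-size; "0 g" = black fill
                  textField.setDefaultAppearance("/" + fontName + " 0 Tf 0 g");
                  textField.setValue("中国");
                  long end = System.currentTimeMillis();
                  System.out.println("Filled " + textField.getFullyQualifiedName() + " in " + (end - start) + "ms");
                  start = end;
              }
          }
          doc.save(new File("i-130-filled.pdf"));
      }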
      

      Attachments

        1. i-130.pdf (1.75 MB, Tilman Hausherr)
        2. NotoSansCJKsc-Medium.ttf (18.95 MB, Tilman Hausherr)

        Activity

          Tilman Hausherr added a comment (edited):

          I looked at it with the profiler, which showed that the font is created each time. Then I looked at it with the debugger: PDFormXObject is initialized without a resource cache, because PDAppearanceStream doesn't pass one (it can't).

          The (new) font in /DR doesn't get cached because it is not an indirect object. So the way to make it faster is to save the file and reload it, so that the new font becomes an indirect object in /DR.
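          The caching behaviour described above can be modelled in a few lines. This is a hypothetical, self-contained Java sketch, not PDFBox code: `ResourceCacheSketch`, `FontEntry`, `parseFont()` and the object number 42 are invented for illustration. The only assumption carried over from the comment is that the cache needs an indirect object (an object number) to key on, so a font added directly to /DR is parsed again for every field, while the same font saved and reloaded as an indirect object is parsed once.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical model, not PDFBox code: a resource cache that can only key
 * fonts stored as indirect objects, so a direct font is re-parsed on every lookup.
 */
public class ResourceCacheSketch {
    /** Stand-in for a /DR font entry; objectNumber is null for a direct object. */
    static final class FontEntry {
        final Integer objectNumber;
        FontEntry(Integer objectNumber) { this.objectNumber = objectNumber; }
    }

    private final Map<Integer, String> cache = new HashMap<>();
    int parseCount = 0; // how many times the expensive font parsing ran

    String getFont(FontEntry entry) {
        if (entry.objectNumber == null) {
            // Direct object: there is no object number to use as a cache key.
            return parseFont();
        }
        // Indirect object: parse once, then reuse the result for every later field.
        return cache.computeIfAbsent(entry.objectNumber, n -> parseFont());
    }

    private String parseFont() {
        parseCount++; // stands in for reading a large font file such as an 18.95 MB CJK font
        return "parsed font";
    }

    public static void main(String[] args) {
        int fields = 100; // hypothetical number of text fields in the form

        ResourceCacheSketch direct = new ResourceCacheSketch();
        FontEntry added = new FontEntry(null); // font just added with PDResources.add()
        for (int i = 0; i < fields; i++) {
            direct.getFont(added);
        }

        ResourceCacheSketch indirect = new ResourceCacheSketch();
        FontEntry reloaded = new FontEntry(42); // hypothetical "42 0 obj" after save and reload
        for (int i = 0; i < fields; i++) {
            indirect.getFont(reloaded);
        }

        System.out.println("direct: " + direct.parseCount + " parses, indirect: " + indirect.parseCount);
    }
}
```

          Running `main` prints `direct: 100 parses, indirect: 1`. In this model the cost of the direct case grows with the number of fields, one full font parse per field.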

          One could also improve the method PDResources.put(COSName kind, COSName name, COSObjectable object) by changing its last line to

          dict.setItem(name, new COSObject(object.getCOSObject()));
          

          However, this creates a COSObject with the key (0, 0), and I haven't found a way to allocate a new, unique COSObject. Still, it does speed up the code from the description.
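          To illustrate why a shared key (0, 0) is a concern, here is a hypothetical Java sketch, not PDFBox's object model: `Key`, `KeyAllocator` and the assumed highest existing object number 500 are invented for illustration. If every new wrapper gets (0, 0), any structure keyed by object number and generation cannot tell two objects apart; handing out numbers above the highest one already in use keeps them distinct. This only addresses key uniqueness and is not presented as a fix for this issue.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical model, not PDFBox code: why every new wrapper sharing key (0, 0) is a problem. */
public class ObjectKeySketch {
    /** An indirect-object key: object number plus generation number, as in "12 0 obj". */
    static final class Key {
        final long number;
        final int generation;

        Key(long number, int generation) {
            this.number = number;
            this.generation = generation;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Key)) {
                return false;
            }
            Key k = (Key) o;
            return number == k.number && generation == k.generation;
        }

        @Override
        public int hashCode() {
            return Objects.hash(number, generation);
        }
    }

    /** Hands out fresh keys above the highest object number already in use (hypothetical). */
    static final class KeyAllocator {
        private long next;

        KeyAllocator(long highestExisting) {
            next = highestExisting + 1;
        }

        Key allocate() {
            return new Key(next++, 0);
        }
    }

    public static void main(String[] args) {
        Map<Key, String> byKey = new HashMap<>();
        // Every wrapper gets (0, 0): the second entry silently replaces the first.
        byKey.put(new Key(0, 0), "NotoSansCJKsc-Medium");
        byKey.put(new Key(0, 0), "ArialUni");
        System.out.println("fixed (0, 0) keys: " + byKey.size() + " entry");

        byKey.clear();
        KeyAllocator alloc = new KeyAllocator(500); // assume objects 1..500 already exist
        byKey.put(alloc.allocate(), "NotoSansCJKsc-Medium");
        byKey.put(alloc.allocate(), "ArialUni");
        System.out.println("allocated keys: " + byKey.size() + " entries");
    }
}
```

          Running `main` prints `fixed (0, 0) keys: 1 entry` followed by `allocated keys: 2 entries`.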


          The trick with the COSObject doesn't work (build tests fail), because now identical objects are added twice or more. I'll do nothing further; I'll improve the javadoc and mention it in the two Stack Overflow questions that explain how to change fonts.


          Commit 1857072 from Tilman Hausherr in branch 'pdfbox/branches/2.0'
          [ https://svn.apache.org/r1857072 ]

          PDFBOX-4508: improve javadoc


          Commit 1857073 from Tilman Hausherr in branch 'pdfbox/trunk'
          [ https://svn.apache.org/r1857073 ]

          PDFBOX-4508: improve javadoc


          Commit 1857074 from Tilman Hausherr in branch 'pdfbox/trunk'
          [ https://svn.apache.org/r1857074 ]

          PDFBOX-4508: add comment


          Commit 1857075 from Tilman Hausherr in branch 'pdfbox/branches/2.0'
          [ https://svn.apache.org/r1857075 ]

          PDFBOX-4508: add comment


          People

            Assignee: Tilman Hausherr
            Reporter: Tilman Hausherr
            Votes: 1
            Watchers: 3
