  1. PDFBox
  2. PDFBOX-1429

Add color information to TextPosition

Details

    • Type: New Feature
    • Status: Closed
    • Priority: Minor
    • Resolution: Duplicate
    • Affects Version/s: 1.7.1, 2.0.0
    • Fix Version/s: None
    • Component/s: Text extraction
    • Labels: None

    Description

      The class org.apache.pdfbox.util.TextPosition offers only the position of text on a page and limited font information. (Many Chinese characters have no FontDescriptor, so fontName and other style attributes cannot be retrieved.)
      I think many people use PDFBox to build a client utility that extracts text and images,
      and then reorganize the text and images into a new article or book to be read on an iPad or mobile phone, with manual work to fix the layout.
      But many books have complex layouts and colors and so many pages that this work takes a lot of human effort. If more of it could be done automatically, it would be more efficient.

      So it would help to have a class named Text from which the precise position, font size, font style, color, and other attributes such as background color can easily be retrieved.
      Text extraction tasks, including excluding unnecessary text and making the text more colorful, would then be easier.
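
To make the request more concrete, here is a minimal sketch of what such a class might look like. Everything in it is hypothetical: the names Text, TextFilter, and visibleOnly are not part of PDFBox, colors are assumed to be packed 0xRRGGBB integers, and "unnecessary text" is assumed here to mean text drawn in the same color as its background.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical value class, NOT a PDFBox API. It only models the
// attributes listed in the request: position, font size, font style,
// text color, and background color.
final class Text {
    final String content;
    final float x;            // horizontal position on the page
    final float y;            // vertical position on the page
    final float fontSize;
    final String fontStyle;   // assumed free-form, e.g. "bold" or "regular"
    final int color;          // assumed packed 0xRRGGBB
    final int backgroundColor; // assumed packed 0xRRGGBB

    Text(String content, float x, float y, float fontSize,
         String fontStyle, int color, int backgroundColor) {
        this.content = content;
        this.x = x;
        this.y = y;
        this.fontSize = fontSize;
        this.fontStyle = fontStyle;
        this.color = color;
        this.backgroundColor = backgroundColor;
    }
}

// Hypothetical helper showing how color information could be used
// to exclude unnecessary text during extraction.
final class TextFilter {
    // Keeps only runs whose text color differs from their background color.
    static List<Text> visibleOnly(List<Text> runs) {
        List<Text> kept = new ArrayList<>();
        for (Text run : runs) {
            if (run.color != run.backgroundColor) {
                kept.add(run);
            }
        }
        return kept;
    }
}
```

With a structure like this, a client tool could drop invisible runs or regroup text by style before rebuilding the layout, instead of doing that work by hand.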

            People

              Assignee: Unassigned
              Reporter: PanQuanyi (panquanyi)
              Votes: 0
              Watchers: 1

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated: 4h
                  Remaining: 4h
                  Logged: Not Specified