ManifoldCF / CONNECTORS-1681

TikaServiceRmeta: recordActivity can cause Database exception


Details

    Description

      Some files containing characters that the database does not accept as valid UTF-8 (for example the null byte 0x00 in the log below) can cause Tika to throw an exception describing the parsing problem.
      The TikaServiceRmeta connector creates an activity record for every Tika exception and includes the exception description in it. In these cases the description contains the offending character, so MCF gets an SQL exception when it tries to insert the activity record into the database:

      ERROR 2021-11-24T13:37:00,121 (Worker thread '41') - MCF|MCF-agent|apache.manifoldcf.crawlerthreads|Worker thread aborting and restarting due to database connection reset: Database exception: SQLException doing query (22021): ERROR: invalid byte sequence for encoding "UTF8": 0x00
      org.apache.manifoldcf.core.interfaces.ManifoldCFException: Database exception: SQLException doing query (22021): ERROR: invalid byte sequence for encoding "UTF8": 0x00 

      To avoid this, the problematic characters need to be removed from the exception description before the activity is recorded.
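One way such a cleanup could look is sketched below. This is only an illustration of the idea, not the fix that was committed for this issue: the class name `ActivityTextSanitizer` and the method `sanitize` are hypothetical, and the choice of which characters to drop (NUL and other control characters except tab and line breaks, plus unpaired UTF-16 surrogates, which cannot be encoded as UTF-8) is an assumption.

```java
// Hypothetical helper, not part of ManifoldCF: strips characters that are
// likely to be rejected when a string is stored in a UTF-8 text column.
final class ActivityTextSanitizer {

    private ActivityTextSanitizer() {
        // static utility, no instances
    }

    /**
     * Returns a copy of {@code text} without NUL and other ISO control
     * characters (tab, LF and CR are kept) and without unpaired surrogates.
     * Returns null when the input is null.
     */
    static String sanitize(String text) {
        if (text == null) {
            return null;
        }
        StringBuilder out = new StringBuilder(text.length());
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);

            // Keep well-formed surrogate pairs (characters outside the BMP).
            if (Character.isHighSurrogate(c)
                    && i + 1 < text.length()
                    && Character.isLowSurrogate(text.charAt(i + 1))) {
                out.append(c).append(text.charAt(i + 1));
                i += 2;
                continue;
            }

            boolean keptWhitespace = c == '\t' || c == '\n' || c == '\r';
            boolean unpairedSurrogate = Character.isSurrogate(c);
            boolean control = Character.isISOControl(c); // includes 0x00

            if (!unpairedSurrogate && (keptWhitespace || !control)) {
                out.append(c);
            }
            i++;
        }
        return out.toString();
    }
}
```

Under these assumptions, the connector would pass the exception description through `ActivityTextSanitizer.sanitize(...)` before handing it to the activity recording call, so that a description containing 0x00 no longer reaches the database insert.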

       

          People

            julienFL Julien Massiera