  1. ManifoldCF
  2. CONNECTORS-1494

Error crawling file system with file names having special characters.

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Won't Fix
    • Affects Version/s: ManifoldCF 2.9.1
    • Fix Version/s: ManifoldCF 2.10
    • Component/s: File system connector
    • Labels: None

    Description

      I am crawling a file system mounted on a Linux machine, so the repository connection is of type "File System". ManifoldCF does not pick up files whose names contain certain special characters.

      File ex: a_XY-SMnA_ABC_Uuޓࠚϯmӣܼ˵Ҫȳ_֚3ҿؖúشԃԫхրҠë.pdf

      exception: java.lang.NumberFormatException: For input string: ""
          at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) ~[?:1.8.0_151]
          at java.lang.Long.parseLong(Long.java:601) ~[?:1.8.0_151]
          at java.lang.Long.<init>(Long.java:965) ~[?:1.8.0_151]
          at org.apache.manifoldcf.agents.transformation.documentfilter.DocumentFilter$SpecPacker.<init>(DocumentFilter.java:513) ~[?:?]
          at org.apache.manifoldcf.agents.transformation.documentfilter.DocumentFilter.getPipelineDescription(DocumentFilter.java:76) ~[?:?]
          at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.getTransformationDescription(IncrementalIngester.java:503) ~[mcf-agents.jar:?]
          at org.apache.manifoldcf.crawler.system.PipelineSpecification.<init>(PipelineSpecification.java:47) ~[mcf-pull-agent.jar:?]
          at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:308) [mcf-pull-agent.jar:?]
      FATAL 2018-02-07T23:47:15,927 (Worker thread '2') - Error tossed: For input string: ""
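
      The trace above shows the exception being raised inside the constructor of DocumentFilter$SpecPacker, where a Long is constructed from an empty string while the pipeline description is being built. The following is a minimal sketch of that failure mode only, not ManifoldCF code: it reproduces the message from the log and shows one way a blank numeric value could be treated as "not set". The class EmptyLongParse and both helper methods are hypothetical names introduced for illustration, and which specification field was actually empty is not stated in this report.

```java
import java.util.OptionalLong;

// Hypothetical illustration class; it is not part of ManifoldCF.
final class EmptyLongParse {

    // Hypothetical helper: treats a null or blank value as "not set"
    // instead of handing it to Long.parseLong.
    static OptionalLong parseOptionalLong(String raw) {
        if (raw == null || raw.trim().isEmpty()) {
            return OptionalLong.empty();
        }
        return OptionalLong.of(Long.parseLong(raw.trim()));
    }

    // Parses the raw value directly and returns the exception message,
    // or null when parsing succeeds.
    static String messageFromDirectParse(String raw) {
        try {
            Long.parseLong(raw);
            return null;
        } catch (NumberFormatException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Same message as the "Error tossed" line in the log.
        System.out.println(messageFromDirectParse(""));
        // A blank value is reported as absent rather than throwing.
        System.out.println(parseOptionalLong("").isPresent());
        System.out.println(parseOptionalLong("1048576").getAsLong());
    }
}
```

      Returning an OptionalLong keeps the "no value configured" case explicit for the caller; whether a blank value should instead be rejected when the job is saved is a separate design decision that this sketch does not address.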

          People

            Assignee: kwright@metacarta.com Karl Wright
            Reporter: vinaybs.20@gmail.com Vinay
            Votes: 0
            Watchers: 2
