Details
- Bug
- Status: Resolved
- Critical
- Resolution: Fixed
- ManifoldCF 1.1
- None
Description
If a document's name contains a hash symbol (#) and you try to ingest it into Solr through the Solr connector, SolrJ throws an IllegalArgumentException and the worker thread goes into an infinite loop.
FATAL 2013-01-30 17:46:13,664 (Worker thread '20') - Error tossed: Illegal character in query at index 537: http://localhost:8080/solr/Lisa/update/extract?literal.id=https%3A%2F%2Fopentextdev2.llan.ll.mit.edu%2Fcs%2Fllisapi.dll%3Ffunc%3Dll%26objID%3D1016599%26objAction%3Ddownload&literal.allow_token_document=LISA-Authority-DEV%3A1005367&literal.allow_token_document=LISA-Authority-DEV%3A68276&literal.allow_token_document=LISA-Authority-DEV%3A796642&literal.allow_token_document=LISA-Authority-DEV%3AGUEST&literal.allow_token_document=LISA-Authority-DEV%3ASYSTEM&literal.deny_token_document=LISA-Authority-DEV%3ADEAD_AUTHORITY&literal.Document Info:Keyword / Phrase=%3F&literal.general_creator=th23825&literal.Document Info:Performing Organization=%3F&literal.general_description=&literal.general_modifier=th23825&literal.general_creationdate=Wed+Nov+14+09%3A28%3A16+EST+2012&literal.Document Info:Document Date=%3F&literal.Document Info:Document Author(s)=%3F&literal.general_name=%23raodoc4.txt%3E&literal.ll_filename=%23raodoc4.txt%3E&literal.general_owner=th23825&literal.Document Info:Document Revision Notes=%3F&literal.Document Info:Data Classification=For+Laboratory+Use+Only+%28FLUO%29&literal.general_modifydate=Wed+Nov+14+09%3A28%3A16+EST+2012&literal.Document Info:Document Description=%3F&commitWithin=4000&wt=xml&version=2.2
java.lang.IllegalArgumentException: Illegal character in query at index 537: http://localhost:8080/solr/Lisa/update/extract?literal.id=https%3A%2F%2Fopentextdev2.llan.ll.mit.edu%2Fcs%2Fllisapi.dll%3Ffunc%3Dll%26objID%3D1016599%26objAction%3Ddownload&literal.allow_token_document=LISA-Authority-DEV%3A1005367&literal.allow_token_document=LISA-Authority-DEV%3A68276&literal.allow_token_document=LISA-Authority-DEV%3A796642&literal.allow_token_document=LISA-Authority-DEV%3AGUEST&literal.allow_token_document=LISA-Authority-DEV%3ASYSTEM&literal.deny_token_document=LISA-Authority-DEV%3ADEAD_AUTHORITY&literal.Document Info:Keyword / Phrase=%3F&literal.general_creator=th23825&literal.Document Info:Performing Organization=%3F&literal.general_description=&literal.general_modifier=th23825&literal.general_creationdate=Wed+Nov+14+09%3A28%3A16+EST+2012&literal.Document Info:Document Date=%3F&literal.Document Info:Document Author(s)=%3F&literal.general_name=%23raodoc4.txt%3E&literal.ll_filename=%23raodoc4.txt%3E&literal.general_owner=th23825&literal.Document Info:Document Revision Notes=%3F&literal.Document Info:Data Classification=For+Laboratory+Use+Only+%28FLUO%29&literal.general_modifydate=Wed+Nov+14+09%3A28%3A16+EST+2012&literal.Document Info:Document Description=%3F&commitWithin=4000&wt=xml&version=2.2
    at java.net.URI.create(Unknown Source)
    at org.apache.http.client.methods.HttpPost.<init>(HttpPost.java:76)
    at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:286)
    at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
    at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
    at org.apache.manifoldcf.agents.output.solr.HttpPoster$IngestThread.run(HttpPoster.java:797)
Caused by: java.net.URISyntaxException: Illegal character in query at index 537: http://localhost:8080/solr/Lisa/update/extract?literal.id=https%3A%2F%2Fopentextdev2.llan.ll.mit.edu%2Fcs%2Fllisapi.dll%3Ffunc%3Dll%26objID%3D1016599%26objAction%3Ddownload&literal.allow_token_document=LISA-Authority-DEV%3A1005367&literal.allow_token_document=LISA-Authority-DEV%3A68276&literal.allow_token_document=LISA-Authority-DEV%3A796642&literal.allow_token_document=LISA-Authority-DEV%3AGUEST&literal.allow_token_document=LISA-Authority-DEV%3ASYSTEM&literal.deny_token_document=LISA-Authority-DEV%3ADEAD_AUTHORITY&literal.Document Info:Keyword / Phrase=%3F&literal.general_creator=th23825&literal.Document Info:Performing Organization=%3F&literal.general_description=&literal.general_modifier=th23825&literal.general_creationdate=Wed+Nov+14+09%3A28%3A16+EST+2012&literal.Document Info:Document Date=%3F&literal.Document Info:Document Author(s)=%3F&literal.general_name=%23raodoc4.txt%3E&literal.ll_filename=%23raodoc4.txt%3E&literal.general_owner=th23825&literal.Document Info:Document Revision Notes=%3F&literal.Document Info:Data Classification=For+Laboratory+Use+Only+%28FLUO%29&literal.general_modifydate=Wed+Nov+14+09%3A28%3A16+EST+2012&literal.Document Info:Document Description=%3F&commitWithin=4000&wt=xml&version=2.2
    at java.net.URI$Parser.fail(Unknown Source)
    at java.net.URI$Parser.checkChars(Unknown Source)
    at java.net.URI$Parser.parseHierarchical(Unknown Source)
    at java.net.URI$Parser.parse(Unknown Source)
    at java.net.URI.<init>(Unknown Source)
    ... 6 more
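The trace shows java.net.URI.create rejecting the request URL that SolrJ assembled. In the logged URL the parameter values are percent-encoded (the hash in the file name appears as %23), while field names such as "Document Info:Keyword / Phrase" are sent with raw spaces; by a character count, index 537 appears to land on the first of those spaces. The Java sketch below reproduces that kind of failure on a shortened URL and shows one possible way to avoid it, by form-encoding parameter names as well as values. The class name, helper methods, and shortened URL are illustrative assumptions, not ManifoldCF or SolrJ code, and this is not necessarily how the issue was actually fixed.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URLEncoder;

public class QueryEncodingSketch {
    // Hypothetical base URL, shortened from the one in the log.
    static final String BASE = "http://localhost:8080/solr/Lisa/update/extract?";

    /** Returns true if java.net.URI accepts the string, false if it rejects it. */
    static boolean parses(String url) {
        try {
            URI.create(url);
            return true;
        } catch (IllegalArgumentException e) {
            // URI.create wraps URISyntaxException in IllegalArgumentException,
            // which is the pair of exceptions seen in the stack trace.
            return false;
        }
    }

    /** Form-encodes both the parameter name and the value as UTF-8. */
    static String param(String name, String value) {
        try {
            return URLEncoder.encode(name, "UTF-8") + "=" + URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Field name left raw, value already encoded: same shape as the logged URL.
        String rawName = BASE + "literal.Document Info:Keyword / Phrase=%3F";

        // Name and value both encoded.
        String encoded = BASE
                + param("literal.Document Info:Keyword / Phrase", "?")
                + "&" + param("literal.general_name", "#raodoc4.txt>");

        System.out.println(parses(rawName)); // prints false
        System.out.println(parses(encoded)); // prints true
        System.out.println(encoded);
    }
}
```

With both halves of each pair encoded, the space becomes `+` and the hash becomes `%23`, so the resulting string passes java.net.URI parsing. Whether Solr then maps the encoded field names back to the intended ones is a separate question that this sketch does not address.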
Issue Links
- relates to
  - SOLR-4358 SolrJ, by preventing multi-part post, loses key information about file name that Tika needs (Closed)
  - CONNECTORS-674 Send metadata to Solr using multipart request (Reopened)
  - CONNECTORS-956 Field names are URL encoded (Resolved)