[SOLR-17394] IndexFetcher should inspect HTTP status codes on its requests - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: main (10.0), 9.6.1
Fix Version/s: 9.7
Component/s: replication (java)
Labels:
- pull-request-available

Description

Typically, SolrJ inspects the HTTP status code of a response and throws exceptions as appropriate. But it skips this logic if users have elected to parse the response themselves by using an "InputStreamResponseParser".

Solr's IndexFetcher uses this "InputStreamResponseParser" so that it can access the binary index data in the HTTP response. But it doesn't check the status code of responses as it should.

IndexFetcher will typically notice that the response is unexpected and can retry and ultimately succeed, but this happens relatively late in the process. And that delay can be very expensive in many cases. For instance:

When IndexFetcher gets a "filecontent" response, it expects the first few bytes to indicate the size of the binary response. So it reads these bytes and instantiates a byte-array of the indicated size. But if IndexFetcher happens to be reading a 404 response, the first few bytes of the response will be the '<', 'h', 'e', and 'a' characters from the "<head>" tag that Solr uses to begin all its HTML errors. This leads to IndexFetcher allocating a massive > 1GB byte-array! This can cause GC churn in production and (for me at least) was causing test runs to frequently OOM on certain machines.
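To make the arithmetic concrete, here is a minimal sketch of what happens when the start of an HTML error page is read as a length prefix. It assumes the prefix is a 4-byte big-endian integer; the class and method names are hypothetical and are not taken from IndexFetcher.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Solr.
class LengthPrefixDemo {

    // Interprets the first four bytes as a big-endian int.
    // Assumption: this mirrors how the size prefix is framed.
    static int readLengthPrefix(byte[] responseStart) {
        return ByteBuffer.wrap(responseStart, 0, 4).getInt();
    }

    public static void main(String[] args) {
        // An HTML error body beginning with "<head>", as described above.
        byte[] htmlError =
            "<head><title>Error 404</title></head>".getBytes(StandardCharsets.US_ASCII);

        int bogusLength = readLengthPrefix(htmlError);
        System.out.println(bogusLength); // prints 1013474657

        // A reader that trusted this value would attempt: new byte[bogusLength]
    }
}
```

The bytes '<', 'h', 'e', 'a' are 0x3C, 0x68, 0x65, 0x61, which combine to 0x3C686561, or 1,013,474,657 bytes, hence the allocation of just over 1GB.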

We should have IndexFetcher (and other places that use InputStreamResponseParser) check the response status code as soon as it's available and handle errors accordingly.
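The proposed ordering might look roughly like the sketch below: check the status code first, and only then interpret any bytes of the body. This is an illustration under stated assumptions, not the change made in the linked pull request. `StatusCheckedReader` and `readLengthPrefix` are hypothetical names, the status code is passed in as a plain `int` rather than obtained through any SolrJ API, and treating only 200 as success is a simplification.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper, not part of Solr or SolrJ.
class StatusCheckedReader {

    // Fails fast on a non-200 status instead of parsing an error page as index data.
    static int readLengthPrefix(int statusCode, InputStream body) throws IOException {
        if (statusCode != 200) {
            throw new IOException(
                "Unexpected HTTP status " + statusCode + "; not parsing body as index data");
        }
        // DataInputStream.readInt() reads four bytes as a big-endian int (assumed framing).
        return new DataInputStream(body).readInt();
    }

    public static void main(String[] args) throws IOException {
        // A well-formed response: a length prefix of 16.
        InputStream ok = new ByteArrayInputStream(new byte[] {0, 0, 0, 16});
        System.out.println(readLengthPrefix(200, ok)); // prints 16

        // A 404 whose body is an HTML error page: rejected before any bytes are read.
        InputStream notFound = new ByteArrayInputStream("<head>".getBytes());
        try {
            readLengthPrefix(404, notFound);
        } catch (IOException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

With the check in place, the error surfaces before any buffer is sized from the body, so the caller can retry or fail without the large allocation.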

Issue Links

links to

GitHub Pull Request #2621

People

Assignee: Jason Gerlowski

Reporter: Jason Gerlowski

Votes: 0

Watchers: 3

Dates

Created: 06/Aug/24 14:06

Updated: 25/Sep/24 19:28

Resolved: 08/Aug/24 12:55

Time Tracking

Estimated: Not Specified

Remaining:

Logged: 1h 20m