Details

- Type: Improvement
- Status: Closed
- Priority: Major
- Resolution: Fixed
- Fix Version/s: 5.5.5, 6.6.5, 7.6, 8.0
Description
Recently in production, we found that Solr uses a lot of memory (more than 10 GB) during recovery or commit for a small index (3.5 GB).
The stack trace is:
Thread 0x4d4b115c0
  at org.apache.lucene.store.DataInput.readVInt()I (DataInput.java:125)
  at org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.loadBlock()V (SegmentTermsEnumFrame.java:157)
  at org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.scanToTermNonLeaf(Lorg/apache/lucene/util/BytesRef;Z)Lorg/apache/lucene/index/TermsEnum$SeekStatus; (SegmentTermsEnumFrame.java:786)
  at org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.scanToTerm(Lorg/apache/lucene/util/BytesRef;Z)Lorg/apache/lucene/index/TermsEnum$SeekStatus; (SegmentTermsEnumFrame.java:538)
  at org.apache.lucene.codecs.blocktree.SegmentTermsEnum.seekCeil(Lorg/apache/lucene/util/BytesRef;)Lorg/apache/lucene/index/TermsEnum$SeekStatus; (SegmentTermsEnum.java:757)
  at org.apache.lucene.index.FilterLeafReader$FilterTermsEnum.seekCeil(Lorg/apache/lucene/util/BytesRef;)Lorg/apache/lucene/index/TermsEnum$SeekStatus; (FilterLeafReader.java:185)
  at org.apache.lucene.index.TermsEnum.seekExact(Lorg/apache/lucene/util/BytesRef;)Z (TermsEnum.java:74)
  at org.apache.solr.search.SolrIndexSearcher.lookupId(Lorg/apache/lucene/util/BytesRef;)J (SolrIndexSearcher.java:823)
  at org.apache.solr.update.VersionInfo.getVersionFromIndex(Lorg/apache/lucene/util/BytesRef;)Ljava/lang/Long; (VersionInfo.java:204)
  at org.apache.solr.update.UpdateLog.lookupVersion(Lorg/apache/lucene/util/BytesRef;)Ljava/lang/Long; (UpdateLog.java:786)
  at org.apache.solr.update.VersionInfo.lookupVersion(Lorg/apache/lucene/util/BytesRef;)Ljava/lang/Long; (VersionInfo.java:194)
  at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(Lorg/apache/solr/update/AddUpdateCommand;)Z (DistributedUpdateProcessor.java:1051)
We reproduced the problem locally with the following code, using the Lucene API directly:

public static void main(String[] args) throws IOException {
  FSDirectory index = FSDirectory.open(Paths.get("the-index"));
  try (IndexReader reader = new ExitableDirectoryReader(DirectoryReader.open(index),
      new QueryTimeoutImpl(1000 * 60 * 5))) {
    String id = "the-id";
    BytesRef text = new BytesRef(id);
    for (LeafReaderContext lf : reader.leaves()) {
      TermsEnum te = lf.reader().terms("id").iterator();
      System.out.println(te.seekExact(text));
    }
  }
}
To trace the behavior, I added System.out.println("ord: " + ord); in org.apache.lucene.codecs.blocktree.SegmentTermsEnum.getFrame(int).
Please check the attached output of test program.txt.
We found the root cause: seekExact(BytesRef) is not overridden in FilterLeafReader.FilterTermsEnum, so it falls back to the base-class TermsEnum.seekExact(BytesRef) implementation, which is very inefficient in this case:

public boolean seekExact(BytesRef text) throws IOException {
  return seekCeil(text) == SeekStatus.FOUND;
}
The fix is simple: override seekExact(BytesRef) in FilterLeafReader.FilterTermsEnum and delegate to the wrapped enum:

@Override
public boolean seekExact(BytesRef text) throws IOException {
  return in.seekExact(text);
}
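The delegation bug can be illustrated without Lucene. Below is a minimal self-contained sketch (toy classes, not the actual Lucene API) of the same pattern: an abstract base whose seekExact falls back to seekCeil, a concrete enum with a fast seekExact path, and a filter wrapper. Without the override, the wrapper inherits the seekCeil fallback and silently bypasses the wrapped enum's fast path; with the override, the fast path is reached.

```java
// Toy model of the TermsEnum / FilterTermsEnum delegation pattern.
// Class names and counters are illustrative only.
abstract class ToyTermsEnum {
    enum SeekStatus { FOUND, NOT_FOUND }

    abstract SeekStatus seekCeil(String text);

    // Default fallback, analogous to TermsEnum.seekExact(BytesRef):
    // correct, but ignores any cheaper exact-match path a subclass has.
    boolean seekExact(String text) {
        return seekCeil(text) == SeekStatus.FOUND;
    }
}

class FastTermsEnum extends ToyTermsEnum {
    static int ceilCalls = 0, exactCalls = 0;

    SeekStatus seekCeil(String text) { ceilCalls++; return SeekStatus.FOUND; }

    // Fast exact-match path that the filter should not bypass.
    @Override boolean seekExact(String text) { exactCalls++; return true; }
}

class ToyFilterTermsEnum extends ToyTermsEnum {
    final ToyTermsEnum in;
    ToyFilterTermsEnum(ToyTermsEnum in) { this.in = in; }

    SeekStatus seekCeil(String text) { return in.seekCeil(text); }

    // The fix: delegate seekExact instead of inheriting the seekCeil fallback.
    @Override boolean seekExact(String text) { return in.seekExact(text); }
}

class Demo {
    public static void main(String[] args) {
        ToyTermsEnum filtered = new ToyFilterTermsEnum(new FastTermsEnum());
        filtered.seekExact("the-id");
        // With the override in place, the wrapped fast path is used.
        System.out.println("exactCalls=" + FastTermsEnum.exactCalls
                + " ceilCalls=" + FastTermsEnum.ceilCalls);
    }
}
```

Deleting the seekExact override in ToyFilterTermsEnum makes the same call route through seekCeil instead, which is the shape of the regression seen in the stack trace above.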
Attachments
Issue Links
- is related to: LUCENE-8292 Fix FilterLeafReader.FilterTermsEnum to delegate all seekExact methods (Closed)