Details
Bug
Status: Resolved
Major
Resolution: Invalid
7.7.2
None
Description
We are writing from CEMEX Mexico. We use Solr through the HYBRIS E-Commerce platform from SAP; we have been running it for 3 years and never had performance problems with it.
However, at the end of March of this year we migrated from Hybris 6.3 to 1905, which also changed the Solr version from 6.1.0 to 7.7.2. Since then, whenever Hybris performs Solr tasks such as modifying an index or running a full index, CPU usage climbs until it saturates, causing the server to crash.
We reported this to SAP, who had us change the following configuration parameters, without any significant improvement:
(/etc/default/solr.in.sh)
SOLR_JAVA_MEM="-Xms8g -Xmx8g -XX:ConcGCThreads=2 -XX:ParallelGCThreads=2"
GC_TUNE="-XX:+UseG1GC -XX:+UnlockExperimentalVMOptions -XX:G1MaxNewSizePercent=70 -XX:+PerfDisableSharedMem -XX:+ParallelRefProcEnabled -XX:MaxGCPauseMillis=250 -XX:+UseLargePages -XX:+AlwaysPreTouch"
(solrconfig.xml)
<indexConfig>
  <lockType>${solr.lock.type:native}</lockType>
  <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
    <int name="maxMergeCount">2</int>
    <int name="maxThreadCount">1</int>
  </mergeScheduler>
  <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
    <int name="maxMergeAtOnce">10</int>
    <int name="segmentsPerTier">20</int>
  </mergePolicyFactory>
  <ramBufferSizeMB>600</ramBufferSizeMB>
</indexConfig>
These configuration changes made the server crash less often, but they also made indexing take much longer, with sustained high CPU usage. It is important to restate that we have made no changes to our code regarding how the indexing processes run, and this worked quite well with the older Solr version (6.1). (Tests and performance metrics can be found in the attached document named: SOLR TEST cliente pro SAP TUNNING - 12-05-2020.docx)
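To make the merge settings easier to compare between environments, here is a minimal Python sketch that extracts them from the indexConfig fragment shown above. The function name `merge_settings` and the idea of pasting the fragment into a string are our own assumptions for illustration; this is not a Solr or Hybris tool.

```python
import xml.etree.ElementTree as ET

# The <indexConfig> fragment from solrconfig.xml, pasted as-is.
INDEX_CONFIG = """
<indexConfig>
  <lockType>${solr.lock.type:native}</lockType>
  <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
    <int name="maxMergeCount">2</int>
    <int name="maxThreadCount">1</int>
  </mergeScheduler>
  <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
    <int name="maxMergeAtOnce">10</int>
    <int name="segmentsPerTier">20</int>
  </mergePolicyFactory>
  <ramBufferSizeMB>600</ramBufferSizeMB>
</indexConfig>
"""


def merge_settings(xml_text):
    """Return the integer merge/buffer settings found in an <indexConfig> fragment.

    Hypothetical helper: it only reads the XML text, it does not talk to Solr.
    """
    root = ET.fromstring(xml_text)
    settings = {}
    for section in ("mergeScheduler", "mergePolicyFactory"):
        node = root.find(section)
        if node is None:
            continue
        for child in node.findall("int"):
            settings[f"{section}.{child.get('name')}"] = int(child.text)
    ram = root.findtext("ramBufferSizeMB")
    if ram is not None:
        settings["ramBufferSizeMB"] = int(ram)
    return settings


if __name__ == "__main__":
    for key, value in sorted(merge_settings(INDEX_CONFIG).items()):
        print(f"{key} = {value}")
```

Running the same function on the fragment from another environment and comparing the two dictionaries shows which merge settings differ.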
SAP also told us that they see a significant change in one class. Quoting them:
"The methods that take most of the time are related to the Lucene70DocValuesConsumer class. You can find attached a PPT file with screenshots from Dynatrace and a stack trace from Solr.
I inspected the source code of the file (https://github.com/apache/lucene-solr/blob/branch_7_7/lucene/core/src/java/org/apache/lucene/codecs/lucene70/Lucene70DocValuesConsumer.java) to see if it used any flags or configuration parameters that could be configured / tuned but that is not the case.
This part of the Solr code is very different from the old one (Solr 6.1). I did not have enough time to trace all the method calls to reach a conclusion, but it is definitively doing things differently."
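If it helps the analysis, the observation above could be cross-checked against plain JVM thread dumps. The following is a minimal Python sketch, assuming the dumps are saved as text in the usual jstack layout (a quoted thread name on the header line, followed by frames starting with "at "). The function name `count_frames_by_thread` is hypothetical, and the script only counts matching frames; it does not measure CPU time.

```python
import re
import sys
from collections import Counter

# Thread headers in a jstack-style dump start with the quoted thread name.
THREAD_HEADER = re.compile(r'^"([^"]+)"')


def count_frames_by_thread(dump_text, needle="Lucene70DocValuesConsumer"):
    """Count, per thread, the stack frames whose text mentions `needle`."""
    counts = Counter()
    current = None
    for line in dump_text.splitlines():
        header = THREAD_HEADER.match(line)
        if header:
            current = header.group(1)
        elif current and line.strip().startswith("at ") and needle in line:
            counts[current] += 1
    return counts


if __name__ == "__main__":
    # Usage (assumed): python count_frames.py threaddump.txt
    with open(sys.argv[1], encoding="utf-8") as handle:
        result = count_frames_by_thread(handle.read())
    for thread, frames in result.most_common():
        print(f"{thread}: {frames} matching frame(s)")
```

Taking several dumps a few seconds apart during a full index and running the script on each one would show whether the same threads stay inside that class.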
SAP asked us to raise a ticket with you to see whether you can help us understand what changed so much between these versions that it causes the CPU consumption problems described above.
As this is the first time we are reporting a problem directly to you, we would appreciate your guidance on what information we should provide and how to bring this problem to a prompt solution.
We remain entirely at your disposal to provide, right away, whatever you need for your analysis.
Regards.