Details
- Type: Improvement
- Status: Closed
- Priority: Major
- Resolution: Done
- Version: Jena 4.4.0
Description
Using a GeoSPARQL query with a geospatial property function, e.g.
SELECT * { :x geo:hasGeometry ?geo1 . ?s2 geo:hasGeometry ?geo2 . ?geo1 geo:sfContains ?geo2 }
leads to heavy memory consumption for larger datasets - and we're not talking about big data at all. Imagine being given a polygon and checking millions of geometries for containment in that polygon.
In the QueryRewriteIndex class, a key is generated for caching, but this is horribly expensive: the string representation of the geometries is computed millions of times, creating millions of byte arrays and potentially leading to an OOM exception - we hit one with 8 GB assigned.
The key generation for reference:
String key = subjectGeometryLiteral.getLiteralLexicalForm() + KEY_SEPARATOR + predicate.getURI() + KEY_SEPARATOR + objectGeometryLiteral.getLiteralLexicalForm();
My suggestion is to use a separate Node -> Integer (or Long?) Guava cache and use those integer values to generate the cache key - or any other more efficient data structure; it is not even clear that a String key is necessary.
We tried a fix which works for us and keeps the memory consumption stable:

private LoadingCache<Node, Integer> nodeIDCache;
private AtomicInteger cacheCounter;
...
cacheCounter = new AtomicInteger(0);
CacheBuilder<Object, Object> builder = CacheBuilder.newBuilder();
if (maxSize > 0) {
    builder = builder.maximumSize(maxSize);
}
if (expiryInterval > 0) {
    builder = builder.expireAfterWrite(expiryInterval, TimeUnit.MILLISECONDS);
}
// Each Node is assigned a unique small integer ID on first lookup;
// cache keys are then built from these IDs instead of the geometries'
// full lexical forms.
nodeIDCache = builder.build(new CacheLoader<>() {
    @Override
    public Integer load(Node key) {
        return cacheCounter.incrementAndGet();
    }
});
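To illustrate the idea without the Guava and Jena dependencies, here is a minimal, self-contained sketch of the same technique using only the JDK: a ConcurrentHashMap plays the role of the node-ID cache (a String stands in for org.apache.jena.graph.Node, and the class and method names are hypothetical, not part of Jena). The point is that the cache key is built from three small integers rather than from the geometries' potentially huge WKT lexical forms:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

public class NodeIdKey {
    // Stand-in for the Guava LoadingCache: each distinct node string is
    // assigned exactly one small integer ID on first lookup.
    private static final ConcurrentMap<String, Integer> NODE_IDS = new ConcurrentHashMap<>();
    private static final AtomicInteger COUNTER = new AtomicInteger(0);
    private static final String KEY_SEPARATOR = "@";

    static int idOf(String node) {
        // computeIfAbsent is atomic, so concurrent lookups of the same
        // node always observe the same ID.
        return NODE_IDS.computeIfAbsent(node, n -> COUNTER.incrementAndGet());
    }

    static String key(String subjectGeomLexicalForm, String predicateUri, String objectGeomLexicalForm) {
        // The key length is now bounded by three small integers, not by
        // the size of the geometry literals being concatenated.
        return idOf(subjectGeomLexicalForm) + KEY_SEPARATOR
                + idOf(predicateUri) + KEY_SEPARATOR
                + idOf(objectGeomLexicalForm);
    }

    public static void main(String[] args) {
        String k1 = key("POLYGON((0 0,0 1,1 1,0 0))", "geo:sfContains", "POINT(0.5 0.5)");
        String k2 = key("POLYGON((0 0,0 1,1 1,0 0))", "geo:sfContains", "POINT(0.5 0.5)");
        // The same triple of nodes always yields the same compact key.
        System.out.println(k1.equals(k2));
        System.out.println(k1);
    }
}
```

Note that this trades the unbounded string keys for a growing Node-to-ID map; in the fix above that map is itself bounded by the cache's maximumSize and expiry settings.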