Apache Jena / JENA-2311

query rewrite index does too expensive caching on geo literals

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: Jena 4.4.0
    • Fix Version/s: Jena 4.6.0
    • Component/s: GeoSPARQL
    • Labels: None

    Description

      Using a GeoSPARQL query with a geospatial property function, e.g.

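      PREFIX geo: <http://www.opengis.net/ont/geosparql#>
      PREFIX :    <http://example.org/>   # placeholder namespace for :x

      SELECT * {
        :x  geo:hasGeometry ?geo1 .
        ?s2 geo:hasGeometry ?geo2 .
        ?geo1 geo:sfContains ?geo2
      }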
      

      leads to heavy memory consumption on larger datasets, and we're not talking about big data at all. Imagine taking a single polygon and checking millions of geometries for containment in it.

      To build its cache key, the QueryRewriteIndex class concatenates the lexical forms of both geometry literals. This is extremely expensive: the string representation of the geometries is requested millions of times, which creates millions of byte arrays and can end in an OutOfMemoryError. We hit one with 8 GB assigned.
      The key generation for reference:

      String key = subjectGeometryLiteral.getLiteralLexicalForm() + KEY_SEPARATOR + predicate.getURI() + KEY_SEPARATOR + objectGeometryLiteral.getLiteralLexicalForm();
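
To make the cost concrete, here is a rough sketch (not Jena code) that builds keys the same way from plain strings and sums their length. The class name, the method names, and the `"@"` separator value are assumptions for illustration; the real `KEY_SEPARATOR` value is not shown in this issue. The point is that every key carries a full copy of the subject geometry's text, so the characters allocated grow with (polygon size × number of candidate geometries).

```java
import java.util.List;

public class LexicalKeyCost {
    // Assumed separator; the actual KEY_SEPARATOR constant is not shown in the issue.
    static final String KEY_SEPARATOR = "@";

    // Builds a key the same way as the snippet above, but from plain strings.
    static String lexicalKey(String subjectWkt, String predicateUri, String objectWkt) {
        return subjectWkt + KEY_SEPARATOR + predicateUri + KEY_SEPARATOR + objectWkt;
    }

    // Total characters across all keys built for one subject and many objects.
    static long totalKeyChars(String subjectWkt, String predicateUri, List<String> objectWkts) {
        long total = 0;
        for (String objectWkt : objectWkts) {
            // Each iteration allocates a new String containing the whole subject text again.
            total += lexicalKey(subjectWkt, predicateUri, objectWkt).length();
        }
        return total;
    }
}
```

With a subject polygon whose WKT is thousands of characters long and millions of candidate geometries, the total runs into gigabytes of short-lived character data, even though only the last two parts of the key differ between lookups.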
      

      My suggestion is to use a separate Node -> Integer (or Long?) Guava cache and build the cache key from those numeric IDs instead of the lexical forms. Any other more efficient data structure would also do; I'm not even sure a String key is necessary.

      We tried a fix that works for us and keeps memory consumption stable:

       // Maps each geometry Node to a small integer ID that stands in for its lexical form.
       private LoadingCache<Node, Integer> nodeIDCache;
       private AtomicInteger cacheCounter;

      ...
              cacheCounter = new AtomicInteger(0);
              CacheBuilder<Object, Object> builder = CacheBuilder.newBuilder();
              // Apply the same size and expiry limits as the main index, when configured.
              if (maxSize > 0) {
                  builder = builder.maximumSize(maxSize);
              }
              if (expiryInterval > 0) {
                  builder = builder.expireAfterWrite(expiryInterval, TimeUnit.MILLISECONDS);
              }
              nodeIDCache = builder.build(
                              new CacheLoader<>() {
                                  @Override
                                  public Integer load(Node key) {
                                      // Runs only on a cache miss: hand out the next unused ID.
                                      return cacheCounter.incrementAndGet();
                                  }
                              });
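
For readers who want to see how such IDs could feed the key, here is a minimal JDK-only sketch. It is not the patch itself: `IdKeySketch`, `idFor`, and `keyFor` are hypothetical names, plain strings stand in for Jena `Node` objects, a `ConcurrentHashMap` stands in for the Guava `LoadingCache` (so there is no eviction here), and the `"@"` separator is an assumption.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class IdKeySketch {
    // Assumed separator; the real KEY_SEPARATOR value is not shown in the issue.
    private static final String KEY_SEPARATOR = "@";

    // Stand-in for the Node -> Integer loading cache (unbounded in this sketch).
    private final Map<String, Integer> nodeIds = new ConcurrentHashMap<>();
    private final AtomicInteger counter = new AtomicInteger(0);

    // Returns the existing ID for a node, or assigns the next unused one on first sight.
    int idFor(String node) {
        return nodeIds.computeIfAbsent(node, n -> counter.incrementAndGet());
    }

    // Builds a short key from the two IDs and the predicate URI instead of the geometry text.
    String keyFor(String subject, String predicateUri, String object) {
        return idFor(subject) + KEY_SEPARATOR + predicateUri + KEY_SEPARATOR + idFor(object);
    }
}
```

The key length now depends on the number of digits in the IDs and on the predicate URI, not on the size of the geometries. With a bounded cache like the one above, an evicted node would simply receive a fresh ID on its next lookup, which costs a cache miss in the index but never makes two different nodes share a key.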
      

            People

              Assignee: Andy Seaborne
              Reporter: Lorenz Bühmann