Uploaded image for project: 'Apache Jena'
  1. Apache Jena
  2. JENA-1771

Spilling combined with DISTINCT .. ORDER BY returns rows in the wrong order

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Fixed
    • Jena 3.13.1
    • Jena 3.14.0
    • ARQ
    • None

    Description

      It looks like Jena assumes that OpDistinct preserves order, but order is not preserved when spilling occurs. This is only a problem when the ARQ.spillToDiskThreshold setting has been configured.

      Consider the following query:

      PREFIX : <http://example.org/>
      SELECT DISTINCT  *
      WHERE
        { ?x  :p  ?v }
      ORDER BY ASC(?v)
      

      Here's the query plan for this query:

      (distinct
        (order ((asc ?v))
          (bgp (triple ?x <http://example.org/p> ?v))))
      

      Jena executes the ORDER BY ASC(?v) before the DISTINCT, relying on the SPARQL requirement:

      The order of Distinct(Ψ) must preserve any ordering given by OrderBy.

      But, when spilling, QueryIterDistinct (which executes OpDistinct) creates a DistinctDataBag with a BindingComparator without any sort conditions. As a result, the DISTINCT operation sorts using "compareBindingsSyntactic()" and doesn't preserve the ORDER BY ASC(?v) requirement.

      Note that some query plans will reorder the ORDER BY and DISTINCT, making things work correctly. For example, adding a LIMIT 5 clause to the query above results in a "(top (5 (asc ?v))" operation that doesn't suffer from the bug.

      You can reproduce this by injecting the following into QueryTest.java then running the ARQTestRefEngine tests:

      void runTestSelect(Query query, QueryExecution qe)
      {
          qe.getContext().set(ARQ.spillToDiskThreshold, 4);   // add this line
          ...
      

      For example, "ARQTestRefEngine -> Algebra optimizations -> QueryTest.opt-top-05" will fail with:

      Query: 
      PREFIX  :     <http://example/>
      
      SELECT DISTINCT  *
      WHERE
        { ?x  :p  ?v }
      ORDER BY ASC(?v)
      LIMIT   5
      
      Got: 5 --------------------------------
      -------------
      | x    | v  |
      =============
      | :x1  | 1  |
      | :x2  | 2  |
      | :x10 | 10 |
      | :x11 | 11 |
      | :x12 | 12 |
      -------------
      Expected: 5 -----------------------------
      ------------
      | x    | v |
      ============
      | :x1  | 1 |
      | :x2  | 2 |
      | :x3  | 3 |
      | :x3a | 3 |
      | :x4  | 4 |
      ------------
      
      junit.framework.AssertionFailedError: Results do not match
      	at junit.framework.Assert.fail(Assert.java:57)
      	at junit.framework.Assert.assertTrue(Assert.java:22)
      	at junit.framework.TestCase.assertTrue(TestCase.java:192)
      	at org.apache.jena.sparql.junit.QueryTest.runTestSelect(QueryTest.java:284)
      	at org.apache.jena.sparql.junit.QueryTest.runTestForReal(QueryTest.java:201)
      	at org.apache.jena.sparql.junit.EarlTestCase.runTest(EarlTestCase.java:88)
      	at junit.framework.TestCase.runBare(TestCase.java:141)
      	at junit.framework.TestResult$1.protect(TestResult.java:122)
      	at junit.framework.TestResult.runProtected(TestResult.java:142)
      	at junit.framework.TestResult.run(TestResult.java:125)
      	at junit.framework.TestCase.run(TestCase.java:129)
      

      Attachments

        Issue Links

          Activity

            People

              andy Andy Seaborne
              ssmith Shawn Smith
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated:
                  Original Estimate - Not Specified
                  Not Specified
                  Remaining:
                  Remaining Estimate - 0h
                  0h
                  Logged:
                  Time Spent - 20m
                  20m