Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Jena 3.13.1
Description
A query like the following, where some variables are optional, may produce wrong answers when spilling to disk occurs:
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name ?mbox
WHERE
{ ?x foaf:name ?name
OPTIONAL
{ ?x foaf:mbox ?mbox }
}
ORDER BY ASC(?mbox)
This is only a problem when the ARQ.spillToDiskThreshold setting has been configured.
The root cause is that BindingOutputStream emits a VARS row based on the first binding, but it doesn't emit a new VARS row when a subsequent binding contains additional variables.
The BindingOutputStream.needVars() method will cause a second VARS row to be emitted when a new binding is missing variables, but not when it has extras. This logic may be inverted from what was intended.
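To make the two directions of the check concrete, here is a toy model in plain Java with no Jena dependency. It is only a sketch of the behaviour described above, not the actual BindingOutputStream code: the class name SpillVarsSketch, the write method, the map-based bindings, and the "-" placeholder for an unbound variable are all assumptions made for illustration. The variable order in binding2 is chosen so that the VARS row lists ?2 before ?1.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SpillVarsSketch {

    /**
     * Serializes bindings as text rows, preceded by VARS rows.
     * inverted = true models the faulty check (new VARS row only when a
     * declared variable is missing from the binding); inverted = false
     * models the presumably intended check (new VARS row when the binding
     * has a variable that has not been declared yet).
     */
    static String write(List<Map<String, String>> bindings, boolean inverted) {
        StringBuilder out = new StringBuilder();
        List<String> declared = null;
        for (Map<String, String> b : bindings) {
            boolean needVars;
            if (declared == null) {
                needVars = true; // the first binding always gets a VARS row
            } else if (inverted) {
                needVars = !b.keySet().containsAll(declared);
            } else {
                needVars = !declared.containsAll(b.keySet());
            }
            if (needVars) {
                declared = new ArrayList<>(b.keySet());
                out.append("VARS");
                for (String v : declared) {
                    out.append(" ?").append(v);
                }
                out.append(" .\n");
            }
            // Only declared variables are written; extra ones are silently lost.
            for (String v : declared) {
                String value = b.get(v);
                // "-" for an unbound variable is an assumption of this sketch
                out.append(value == null ? "-" : "\"" + value + "\"").append(' ');
            }
            out.append(".\n");
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> binding1 = new LinkedHashMap<>();
        binding1.put("1", "A");
        Map<String, String> binding2 = new LinkedHashMap<>();
        binding2.put("2", "B");
        binding2.put("1", "A");

        System.out.print(write(List.of(binding1, binding2), true));
        System.out.println("---");
        System.out.print(write(List.of(binding1, binding2), false));
    }
}
```

With the inverted check, the value bound to ?2 never reaches the output, so reading the rows back yields two identical bindings. With the other direction, a second VARS row is written before the wider binding and both variables survive the round trip.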
There's a TestDistinctDataBag test case below that reproduces the problem. It generates a spill file like this:
VARS ?1 .
"A" .
"A" .
when a correct spill file would be:
VARS ?1 .
"A" .
VARS ?2 ?1 .
"B" "A" .
If you run it, you may notice that it fails with a spill threshold of 2 but passes with a higher threshold:
@Test
public void testOptionalVariables() {
    // Setup a situation where the second binding in a spill file binds more
    // variables than the first binding
    BindingMap binding1 = BindingFactory.create();
    binding1.add(Var.alloc("1"), NodeFactory.createLiteral("A"));

    BindingMap binding2 = BindingFactory.create();
    binding2.add(Var.alloc("1"), NodeFactory.createLiteral("A"));
    binding2.add(Var.alloc("2"), NodeFactory.createLiteral("B"));

    List<Binding> undistinct = Arrays.asList(binding1, binding2, binding1);
    List<Binding> control = Iter.toList(Iter.distinct(undistinct.iterator()));
    List<Binding> distinct = new ArrayList<>();

    DistinctDataBag<Binding> db = new DistinctDataBag<>(
            new ThresholdPolicyCount<Binding>(2),
            SerializationFactoryFinder.bindingSerializationFactory(),
            new BindingComparator(new ArrayList<SortCondition>()));
    try {
        db.addAll(undistinct);
        Iterator<Binding> iter = db.iterator();
        while (iter.hasNext()) {
            distinct.add(iter.next());
        }
        Iter.close(iter);
    } finally {
        db.close();
    }

    assertEquals(control.size(), distinct.size());
    assertTrue(ResultSetCompare.equalsByTest(control, distinct, NodeUtils.sameTerm));
}