Details
- Bug
- Status: Resolved
- Minor
- Resolution: Fixed
- 1.6, 1.9
- None
Description
In the discussion on TEXT-126, it was pointed out that, for two empty strings, the commons-text Jaccard similarity returns 0.0 and the Jaccard distance returns 1.0, while other libraries return the opposite value for each.
package br.eti.kinoshita.tests.text;

import java.util.Collections;

public class EditDistances {

    public static void main(String[] args) {
        System.out.println("Testing jaccard sim/dis with empty strings");
        System.out.println("---");

        org.simmetrics.metrics.Jaccard<String> j1 = new org.simmetrics.metrics.Jaccard<>();
        float s1 = j1.compare(Collections.emptySet(), Collections.emptySet());
        System.out.println("Simmetrics Jaccard similarity: " + s1);
        float d1 = j1.distance(Collections.emptySet(), Collections.emptySet());
        System.out.println("Simmetrics Jaccard distance: " + d1);
        System.out.println("---");

        info.debatty.java.stringsimilarity.Jaccard j2 = new info.debatty.java.stringsimilarity.Jaccard();
        double s2 = j2.similarity("", "");
        System.out.println("javastringsimilarity Jaccard similarity: " + s2);
        double d2 = j2.distance("", "");
        System.out.println("javastringsimilarity Jaccard distance: " + d2);
        System.out.println("---");

        org.apache.commons.text.similarity.JaccardSimilarity j3_1 = new org.apache.commons.text.similarity.JaccardSimilarity();
        double s3 = j3_1.apply("", "");
        System.out.println("commons-text Jaccard similarity: " + s3);
        org.apache.commons.text.similarity.JaccardDistance j3_2 = new org.apache.commons.text.similarity.JaccardDistance();
        double d3 = j3_2.apply("", "");
        System.out.println("commons-text Jaccard distance: " + d3);
    }
}
Produces:
Testing jaccard sim/dis with empty strings
---
Simmetrics Jaccard similarity: 1.0
Simmetrics Jaccard distance: 0.0
---
javastringsimilarity Jaccard similarity: 1.0
javastringsimilarity Jaccard distance: 0.0
---
commons-text Jaccard similarity: 0.0
commons-text Jaccard distance: 1.0
We need to confirm the correct output for similarity and distance when both inputs are empty strings, and then either document why we return what we return or fix it as a bug for the next release.
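For reference, the convention that the other two libraries' output above is consistent with can be sketched as follows. This is only an illustration, not the commons-text implementation and not necessarily the fix that was applied: the class name JaccardSketch, the helper charSet, and the choice to compare sets of characters are all assumptions made for the example. The Jaccard index |A ∩ B| / |A ∪ B| is undefined (0/0) when both sets are empty, so the sketch treats two empty inputs as identical and returns a similarity of 1.0.

```java
import java.util.HashSet;
import java.util.Set;

public class JaccardSketch {

    // Hypothetical helper: the set of distinct characters in s.
    static Set<Character> charSet(CharSequence s) {
        Set<Character> set = new HashSet<>();
        for (int i = 0; i < s.length(); i++) {
            set.add(s.charAt(i));
        }
        return set;
    }

    // Jaccard index |A ∩ B| / |A ∪ B| over character sets (an assumption).
    static double similarity(CharSequence left, CharSequence right) {
        Set<Character> a = charSet(left);
        Set<Character> b = charSet(right);
        // Assumed convention: two empty inputs are identical, so avoid 0/0
        // and report full similarity.
        if (a.isEmpty() && b.isEmpty()) {
            return 1.0;
        }
        Set<Character> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        Set<Character> union = new HashSet<>(a);
        union.addAll(b);
        return (double) intersection.size() / union.size();
    }

    // Distance defined as the complement of similarity.
    static double distance(CharSequence left, CharSequence right) {
        return 1.0 - similarity(left, right);
    }

    public static void main(String[] args) {
        System.out.println("similarity(\"\", \"\"): " + similarity("", ""));
        System.out.println("distance(\"\", \"\"): " + distance("", ""));
        System.out.println("similarity(\"\", \"a\"): " + similarity("", "a"));
    }
}
```

Under this convention only the both-empty case is special: when exactly one input is empty, the union is non-empty and the ordinary formula already gives a similarity of 0.0.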
Issue Links
- is related to TEXT-126 Dice's Coefficient Algorithm in String similarity (Open)
- links to