Lucene - Core / LUCENE-7439

Should FuzzyQuery match short terms too?

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 6.3, 7.0
    • Component/s: None
    • Labels: None
    • Lucene Fields: New

    Description

      Today, if you ask FuzzyQuery to match "abcd" with edit distance 2, it will fail to match the term "ab", even though "ab" is only 2 edits away.
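      To make the "2 edits away" claim concrete, here is a minimal sketch of the classic dynamic-programming Levenshtein distance (insertions, deletions and substitutions each cost 1). The class and method names are hypothetical, and this is not how Lucene computes matches: FuzzyQuery uses Levenshtein automata and can also count a transposition as one edit, which does not affect these examples.

```java
// Hypothetical illustration: plain Levenshtein distance via a two-row DP table.
// Not Lucene code; FuzzyQuery matches terms with Levenshtein automata instead.
public class EditDistanceDemo {

    static int distance(String a, String b) {
        // prev[j] = distance between the first i-1 chars of a and the first j chars of b
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // j insertions turn "" into b[0..j)
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // i deletions turn a[0..i) into ""
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; // the finished row becomes "prev" for the next i
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(distance("abcd", "ab")); // 2: delete 'c' and 'd'
        System.out.println(distance("a", "abc"));   // 2: insert 'b' and 'c'
    }
}
```

      Both pairs from the javadoc quoted below are exactly 2 edits apart, so both fall within maxEdits=2.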

      Its javadocs explain this:

       * <p>NOTE: terms of length 1 or 2 will sometimes not match because of how the scaled
       * distance between two terms is computed.  For a term to match, the edit distance between
       * the terms must be less than the minimum length term (either the input term, or
       * the candidate term).  For example, FuzzyQuery on term "abcd" with maxEdits=2 will
       * not match an indexed term "ab", and FuzzyQuery on term "a" with maxEdits=2 will not
       * match an indexed term "abc".
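      Written out, the javadoc's rule says a candidate t matches query term q only if both conditions below hold, where d is the edit distance and |·| is term length in characters. The similarity form s on the second line is an assumption about what the "scaled distance" looks like, chosen because it is equivalent to the stated rule for non-empty terms; it is not quoted from Lucene's source.

```latex
\begin{align*}
\mathrm{match}(q,t) &\iff d(q,t) \le \mathit{maxEdits} \ \wedge\ d(q,t) < \min(|q|,|t|) \\
s(q,t) = 1 - \frac{d(q,t)}{\min(|q|,|t|)} > 0 &\iff d(q,t) < \min(|q|,|t|) \\
q=\texttt{abcd},\ t=\texttt{ab}: &\quad d = 2,\ \min(4,2) = 2,\ 2 < 2 \text{ fails} \\
q=\texttt{a},\ t=\texttt{abc}: &\quad d = 2,\ \min(1,3) = 1,\ 2 < 1 \text{ fails}
\end{align*}
```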
      

      On the one hand, I can see that this behavior is somewhat justified: 50% of the characters differ, so this is a very "weak" match. On the other hand, it's quite unexpected: edit distance is such an exact measure that the terms should have matched.

      It seems the behavior is caused by an internal implementation detail: how the relative (floating point) score is computed. I think we should fix it so that edit distance 2 does in fact match all terms with edit distance <= 2.
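      A minimal sketch contrasting the two acceptance rules, assuming the edit distance has already been computed. The class and method names are hypothetical and this is not Lucene's FuzzyTermsEnum code; it only encodes the current rule as stated in the javadoc and the rule proposed in this issue.

```java
// Hypothetical acceptance predicates; "ed" is a precomputed edit distance.
public class FuzzyAcceptDemo {

    // Current behavior per the javadoc: within maxEdits AND strictly fewer
    // edits than the length of the shorter of the two terms.
    static boolean acceptsToday(int ed, int queryLen, int candidateLen, int maxEdits) {
        return ed <= maxEdits && ed < Math.min(queryLen, candidateLen);
    }

    // Behavior proposed in this issue: the edit distance alone decides.
    static boolean acceptsProposed(int ed, int maxEdits) {
        return ed <= maxEdits;
    }

    public static void main(String[] args) {
        // "abcd" vs "ab": 2 edits, shorter term has length 2
        System.out.println(acceptsToday(2, 4, 2, 2) + " " + acceptsProposed(2, 2)); // false true
        // "a" vs "abc": 2 edits, shorter term has length 1
        System.out.println(acceptsToday(2, 1, 3, 2) + " " + acceptsProposed(2, 2)); // false true
    }
}
```

      The sketch covers only the accept/reject decision, not how matched terms are scored relative to each other.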

      Attachments

        1. LUCENE-7439.patch (84 kB, Michael McCandless)
        2. LUCENE-7439.patch (91 kB, Michael McCandless)
        3. LUCENE-7439.patch (8 kB, Michael McCandless)


            People

              Assignee: Michael McCandless (mikemccand)
              Reporter: Michael McCandless (mikemccand)
              Votes: 0
              Watchers: 4
