Uploaded image for project: 'Commons Collections'
  1. Commons Collections
  2. COLLECTIONS-547

Performance issue in SetUtils::isEqualSet

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Closed
    • Minor
    • Resolution: Not A Problem
    • 4.0
    • None
    • Set
    • Ubuntu 14.04

    Description

      There seems to be a performance problem with the function isEqualSet of
      SetUtils when the first parameter is of a collection type that has a slow containsAll/contains method.

      The following is the code of the function:

          /**
           * Tests two sets for equality as per the <code>equals()</code> contract
           * in {@link java.util.Set#equals(java.lang.Object)}.
           * <p>
           * This method is useful for implementing <code>Set</code> when you cannot
           * extend AbstractSet. The method takes Collection instances to enable other
           * collection types to use the Set implementation algorithm.
           * <p>
           * The relevant text (slightly paraphrased as this is a static method) is:
           * <blockquote>
           * <p>Two sets are considered equal if they have
           * the same size, and every member of the first set is contained in
           * the second. This ensures that the <tt>equals</tt> method works
           * properly across different implementations of the <tt>Set</tt>
           * interface.</p>
           *
           * <p>
           * This implementation first checks if the two sets are the same object:
           * if so it returns <tt>true</tt>.  Then, it checks if the two sets are
           * identical in size; if not, it returns false. If so, it returns
           * <tt>a.containsAll((Collection) b)</tt>.</p>
           * </blockquote>
           *
           * @see java.util.Set
           * @param set1  the first set, may be null
           * @param set2  the second set, may be null
           * @return whether the sets are equal by value comparison
           */
          public static boolean isEqualSet(final Collection<?> set1, final Collection<?> set2) {
              if (set1 == set2) {
                  return true;
              }
              if (set1 == null || set2 == null || set1.size() != set2.size()) {
                  return false;
              }
      
              return set1.containsAll(set2);
          }
      
      

      The problem is that in the last return statement, the function relies on the
      containsAll method of the class of the set1, which can be any type of collection.

      The following test harness shows the performance degradation between
      using a list and using a set as a first parameter, across different collection sizes.

          public static void 	setUtilsisEqualSetTest(boolean original) {
      
      	int N=500000;
      	ArrayList<Integer> firstArrayList=new ArrayList<Integer>();
      	ArrayList<Integer> secondArrayList=new ArrayList<Integer>();
      	for(int i=0;i<N;++i) {
      
      	    firstArrayList.add(new Integer(i));
      	    secondArrayList.add(new Integer((N-1)-i));
      	    
      	}
      
      	SetUtils.isEqualSet(original?firstArrayList:(new HashSet<Integer>(firstArrayList)),secondArrayList);
      	
      

      In the following table "Original" corresponds to the time taken by
      the existing implementation of SetUtils::isEqualSet, "Repaired" to the time
      taken by the function invoked with the first parameter converted into a
      set, and "Speed-up" to the speed-up obtained after the repair.

      N Original(s) Repaired(s) Speed-up(X)

      10 1.01 1.02 0.99
      100 1.02 1.02 1
      1000 1.04 1.04 1
      10000 1.15 1.09 1.05
      100000 9.33 1.11 8.40
      500000 > 300 1.26 > 238.09

      One way to deal with this would be to call the CollectionUtils::containsAll method instead of Collection::containsAll, since it has a linear time implementation instead of quadratic, and can handle the types of isEqualSet.

      Another solution would be to include a warning to the user in the documentation that the first parameter should have a fast containment method when possible.

      Attachments

        Activity

          People

            Unassigned Unassigned
            oswaldo_o Oswaldo Olivo
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: