[MAHOUT-467] Change Iterable<Cooccurrence> in org.apache.mahout.math.hadoop.similarity.RowSimilarityJob.SimilarityReducer to list or array to improve the performance - ASF JIRA

Details

Type: Improvement
Status: Closed
Priority: Major
Resolution: Not A Problem
Affects Version/s: 0.4
Fix Version/s: None
Component/s: None
Labels: None

Description

In class AbstractDistributedVectorSimilarity:

protected int countElements(Iterator<?> iterator) {
  int count = 0;
  while (iterator.hasNext()) {
    count++;
    iterator.next();
  }
  return count;
}

The method countElements is called repeatedly, but it performs poorly.

If the iterator has a million elements, we have to iterate a million times just to get the count.
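To make the cost concrete, here is a minimal sketch that counts how many next() calls countElements makes. TrackingIterator and the class name CountElementsCost are hypothetical helpers written for this illustration; they are not part of Mahout. Only the countElements body mirrors the method quoted above.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

public class CountElementsCost {

    /** Hypothetical iterator over 0..size-1 that records how often next() is called. */
    static final class TrackingIterator implements Iterator<Integer> {
        private final int size;
        private int position = 0;
        int nextCalls = 0;

        TrackingIterator(int size) {
            this.size = size;
        }

        @Override
        public boolean hasNext() {
            return position < size;
        }

        @Override
        public Integer next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            nextCalls++;
            return position++;
        }
    }

    /** Same logic as the countElements method quoted in this issue. */
    static int countElements(Iterator<?> iterator) {
        int count = 0;
        while (iterator.hasNext()) {
            count++;
            iterator.next();
        }
        return count;
    }

    public static void main(String[] args) {
        TrackingIterator it = new TrackingIterator(1_000_000);
        int count = countElements(it);
        // One next() call per element: the work grows linearly with the input.
        System.out.println(count + " elements, " + it.nextCalls + " next() calls");
        // The iterator is consumed afterwards and cannot be reused.
        System.out.println("exhausted: " + !it.hasNext());
    }
}
```

Besides the linear cost, the sketch shows that counting consumes the iterator, so the elements cannot be read again from the same iterator afterwards.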

This method is used in many places:
1) DistributedCooccurrenceVectorSimilarity

public class DistributedCooccurrenceVectorSimilarity extends AbstractDistributedVectorSimilarity {

  @Override
  protected double doComputeResult(int rowA, int rowB, Iterable<Cooccurrence> cooccurrences,
      double weightOfVectorA, double weightOfVectorB, int numberOfColumns) {
    return countElements(cooccurrences);
  }

}

One item may be liked by many people. In our system, a single item may be liked by a hundred thousand users.
Here doComputeResult just returns the number of elements in cooccurrences, but it has to iterate a hundred thousand times to do so.

If we used a List or an array, we could get the result in a single call, because the size is already known once the List or array has been constructed.

2) DistributedLoglikelihoodVectorSimilarity
3) DistributedTanimotoCoefficientVectorSimilarity
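The List-or-array suggestion from item 1 could be sketched roughly as follows. This is only an illustration under assumptions: SizeAwareCount and its countElements(Iterable) overload are hypothetical and are not the Mahout API. The idea is to return size() when the caller already holds a Collection and to fall back to iteration otherwise.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Iterator;
import java.util.List;

public class SizeAwareCount {

    /** Hypothetical helper: constant-time for collections, linear fallback otherwise. */
    static int countElements(Iterable<?> elements) {
        if (elements instanceof Collection) {
            // A Collection already tracks its size, so no iteration is needed.
            return ((Collection<?>) elements).size();
        }
        int count = 0;
        for (Iterator<?> it = elements.iterator(); it.hasNext(); it.next()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        List<Integer> buffered = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) {
            buffered.add(i);
        }
        // Takes the size() shortcut.
        System.out.println(countElements(buffered));

        // A plain Iterable (not a Collection) takes the linear fallback.
        Iterable<Integer> streamed = () -> buffered.iterator();
        System.out.println(countElements(streamed));
    }
}
```

Note that filling a List from a stream of values still takes one pass over the data and keeps every element in memory, so the shortcut only pays off when the data is already buffered or the count is needed more than once.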

I ran a test using DistributedCooccurrenceVectorSimilarity:
it took 4.5 hours to run RowSimilarityJob-CooccurrencesMapper-SimilarityReducer.

Issue Links

duplicates: MAHOUT-460 Add "maxPreferencesPerItemConsidered" option to o.a.m.cf.taste.hadoop.similarity.item.ItemSimilarityJob (Closed)

People

Assignee: Sean R. Owen
Reporter: Han Hui Wen
Votes: 0
Watchers: 1

Dates

Created: 12/Aug/10 13:17
Updated: 31/Jan/24 22:11
Resolved: 22/Sep/10 07:12