Description
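The Java example at the bottom of the mllib-decision-tree web page shows how to compute the mean squared error (MSE) on the test data, but it contains a bug. The code sums the squared errors over the test predictions and then divides by data.count(), the size of the full dataset. It should instead divide by testData.count(), the size of the test set. Because testData is only a subset of data, the printed value understates the actual test MSE.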
http://spark.apache.org/docs/latest/mllib-decision-tree.html
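Double testMSE =
  predictionAndLabel.map(new Function<Tuple2<Double, Double>, Double>() {
    @Override
    public Double call(Tuple2<Double, Double> pl) {
      // Squared error for one (prediction, label) pair
      Double diff = pl._1() - pl._2();
      return diff * diff;
    }
  }).reduce(new Function2<Double, Double, Double>() {
    @Override
    public Double call(Double a, Double b) {
      // Sum the squared errors
      return a + b;
    }
  }) / data.count();  // Bug: should be testData.count()
System.out.println("Test Mean Squared Error: " + testMSE);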
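To make the effect of the wrong denominator concrete, here is a minimal plain-Java sketch that does not use Spark at all. The class name TestMseSketch, the helper meanSquaredError, the sample pairs, and the assumed full-dataset size of 10 are all hypothetical and chosen only for illustration; the list of pairs stands in for predictionAndLabel.

```java
import java.util.Arrays;
import java.util.List;

public class TestMseSketch {
    // Each element is {prediction, label}; a hypothetical stand-in for predictionAndLabel.
    // The denominator is passed in so the two choices can be compared side by side.
    static double meanSquaredError(List<double[]> predictionAndLabel, long count) {
        double sum = 0.0;
        for (double[] pl : predictionAndLabel) {
            double diff = pl[0] - pl[1];
            sum += diff * diff;  // accumulate the squared error
        }
        return sum / count;
    }

    public static void main(String[] args) {
        List<double[]> testPairs = Arrays.asList(
            new double[] {1.0, 0.0},
            new double[] {0.0, 0.0},
            new double[] {3.0, 1.0});
        long testCount = testPairs.size();  // plays the role of testData.count()
        long dataCount = 10;                // assumed size of the full dataset, data.count()

        System.out.println("Divided by test count: " + meanSquaredError(testPairs, testCount));
        System.out.println("Divided by data count: " + meanSquaredError(testPairs, dataCount));
    }
}
```

With the same sum of squared errors, dividing by the larger full-dataset count always yields a smaller number, which is why the documented example reports an MSE that is too low.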