Description
Due to a misplaced distinct call, FPGrowthModel.transform generates duplicated items in the "prediction" column:
scala> val data = spark.read.text("data/mllib/sample_fpgrowth.txt").select(split($"value", "\\s+").alias("features"))
data: org.apache.spark.sql.DataFrame = [features: array<string>]

scala> fpm.transform(Seq(Array("t", "s")).toDF("features")).show(1, false)
+--------+---------------------+
|features|prediction           |
+--------+---------------------+
|[t, s]  |[y, x, z, x, y, x, z]|
+--------+---------------------+
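One way a misplaced distinct can produce this output is operator precedence: in `if (c) a else b.distinct`, the call binds only to the else branch, so the non-empty branch is never deduplicated. The following is a minimal plain-Scala sketch of that effect, not the actual Spark source. The rule set, the object name `DistinctPlacement`, and both method names are hypothetical, chosen only so that the output mirrors the prediction shown above.

```scala
object DistinctPlacement {
  // Hypothetical association rules (antecedent -> consequent). They are NOT taken
  // from a fitted FPGrowthModel; they are picked to reproduce the reported output.
  val rules: Seq[(Seq[String], Seq[String])] = Seq(
    Seq("t") -> Seq("y"),
    Seq("t") -> Seq("x"),
    Seq("t") -> Seq("z"),
    Seq("s") -> Seq("x"),
    Seq("t", "s") -> Seq("y"),
    Seq("t", "s") -> Seq("x"),
    Seq("t", "s") -> Seq("z")
  )

  // Collect the consequents of every rule whose antecedent is contained in `items`,
  // skipping consequents that are already present in the input.
  private def consequents(items: Seq[String]): Seq[String] = {
    val itemset = items.toSet
    rules.flatMap { case (antecedent, consequent) =>
      if (antecedent.forall(itemset.contains)) consequent.filterNot(itemset.contains)
      else Seq.empty[String]
    }
  }

  // Misplaced: `.distinct` applies only to the empty else branch.
  def predictWithDuplicates(items: Seq[String]): Seq[String] =
    if (items != null) consequents(items) else Seq.empty[String].distinct

  // Intended: parenthesize the whole if/else so `.distinct` covers both branches.
  def predictDistinct(items: Seq[String]): Seq[String] =
    (if (items != null) consequents(items) else Seq.empty[String]).distinct
}
```

With these assumed rules, `DistinctPlacement.predictWithDuplicates(Seq("t", "s"))` yields the seven-element sequence y, x, z, x, y, x, z, while `DistinctPlacement.predictDistinct(Seq("t", "s"))` yields y, x, z.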
Issue Links
- is related to SPARK-14503 spark.ml Scala API for FPGrowth (Resolved)