Description
R formula drops the first category alphabetically when encoding a string/category feature. Spark RFormula uses OneHotEncoder to encode string/category features into vectors, but it only supports "dropLast", with categories ordered by string/category frequency. This causes SparkR to produce different models from native R.
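To make the mismatch concrete, here is a minimal sketch in plain Python. It does not use Spark or R. The helper names (`r_style_kept`, `frequency_drop_last_kept`, `encode`) are hypothetical, and breaking frequency ties alphabetically is an assumption made only for this illustration. It shows that the two conventions can choose different reference categories for the same column, which yields different design matrices and therefore different coefficients.

```python
from collections import Counter


def r_style_kept(values):
    """Sort categories alphabetically and drop the first (R-style reference level)."""
    levels = sorted(set(values))
    return levels[1:]


def frequency_drop_last_kept(values):
    """Order categories by descending frequency and drop the last one.

    Ties are broken alphabetically; that tie-break is an assumption of this sketch.
    """
    counts = Counter(values)
    levels = sorted(counts, key=lambda v: (-counts[v], v))
    return levels[:-1]


def encode(values, kept):
    """Dummy-encode values against the kept categories; the dropped one maps to all zeros."""
    return [[1.0 if v == k else 0.0 for k in kept] for v in values]


data = ["b", "a", "b", "c", "b", "a"]

r_kept = r_style_kept(data)                  # reference category is "a"
freq_kept = frequency_drop_last_kept(data)   # reference category is "c"

print("alphabetical, drop first:", r_kept, encode(data, r_kept))
print("by frequency, drop last: ", freq_kept, encode(data, freq_kept))
```

Both encodings span the same model space when an intercept is present, so predictions can still agree, but the per-category coefficients and the intercept are not directly comparable between the two.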
Issue Links
- Is contained by: SPARK-15540 RFormula and R feature processing improvement umbrella (Resolved)
- relates to: SPARK-14657 RFormula output wrong features when formula w/o intercept (Resolved)