Description
*NOTE: This is targeted at 1.5.0 because it has so many useful links for JIRAs targeted for 1.5.0. In the future, we should create a new JIRA for linking future items.*
For new public APIs added to MLlib, we need to check the generated HTML doc and compare the Scala & Python versions. We need to track:
- Inconsistency: Do class/method/parameter names match? (SPARK-7667)
- Docs: Is the Python doc missing or just a stub? We want the Python doc to be as complete as the Scala doc. (SPARK-7666, SPARK-6173)
- API breaking changes: These should be very rare but are occasionally either necessary (intentional) or accidental. These must be recorded and added in the Migration Guide for this release. (SPARK-7665)
  - Note: If the API change is for an Alpha/Experimental/DeveloperApi component, please note that as well.
- Missing classes/methods/parameters: We should create to-do JIRAs for functionality missing from Python.
  - classification
    - StreamingLogisticRegressionWithSGD (SPARK-7633)
  - clustering
    - GaussianMixture (SPARK-6258)
    - LDA (SPARK-6259)
    - Power Iteration Clustering (SPARK-5962)
    - StreamingKMeans (SPARK-4118)
  - evaluation
    - MultilabelMetrics (SPARK-6094)
  - feature
    - ElementwiseProduct (SPARK-7605)
    - PCA (SPARK-7604)
  - linalg
    - Distributed linear algebra (SPARK-6100)
  - pmml.export (SPARK-7638)
  - regression
    - StreamingLinearRegressionWithSGD (SPARK-4127)
  - stat
    - KernelDensity (SPARK-7639)
  - util
    - MLUtils (SPARK-6263)
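The name and parameter checks above could be partly mechanized. The following is only a minimal sketch: it assumes the Scala-side class and parameter names have already been collected by hand from the generated docs. The helper names (`public_names`, `missing_in_python`, `missing_params`) and the sample `train` function are hypothetical and not part of Spark.

```python
import inspect
from types import ModuleType


def public_names(module: ModuleType) -> set:
    """Return the public names a Python module exposes.

    Prefers the module's __all__ list when it is declared; otherwise falls
    back to every attribute whose name does not start with an underscore.
    """
    declared = getattr(module, "__all__", None)
    if declared is not None:
        return set(declared)
    return {name for name, _ in inspect.getmembers(module)
            if not name.startswith("_")}


def missing_in_python(scala_names, python_names):
    """Names present on the Scala side but absent from Python, sorted."""
    return sorted(set(scala_names) - set(python_names))


def missing_params(scala_params, python_func):
    """Scala parameter names that the Python callable does not accept."""
    python_params = set(inspect.signature(python_func).parameters)
    return sorted(set(scala_params) - python_params)


if __name__ == "__main__":
    # Hypothetical inputs: the Scala names are typed in by hand here.
    scala_classes = ["KMeans", "LDA", "GaussianMixture"]
    python_classes = ["KMeans"]
    print(missing_in_python(scala_classes, python_classes))

    def train(data, k, maxIterations=100):  # stand-in for a Python API method
        return None

    print(missing_params(["data", "k", "maxIterations", "seed"], train))
```

Each name this reports would still need a manual look before filing a to-do JIRA, since some differences between the Scala and Python APIs are intentional.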
Issue Links
- contains
  - SPARK-4118 Create python bindings for Streaming KMeans (Resolved)
  - SPARK-4127 Streaming Linear Regression- Python bindings (Resolved)
  - SPARK-6094 Add MultilabelMetrics in PySpark/MLlib (Resolved)
  - SPARK-6258 Python MLlib API missing items: Clustering (Resolved)
  - SPARK-6263 Python MLlib API missing items: Utils (Resolved)
  - SPARK-7633 Streaming Logistic Regression- Python bindings (Resolved)
  - SPARK-5962 [MLLIB] Python support for Power Iteration Clustering (Resolved)
  - SPARK-7604 Python API for PCA and PCAModel (Resolved)
  - SPARK-7605 Python API for ElementwiseProduct (Resolved)
  - SPARK-7639 Add Python API for Statistics.kernelDensity (Resolved)
  - SPARK-7638 Python API for pmml.export (Closed)
  - SPARK-6259 Python API for LDA (Resolved)
  - SPARK-9122 spark.mllib regression should support batch predict (Resolved)
  - SPARK-8068 Add confusionMatrix method at class MulticlassMetrics in pyspark/mllib (Resolved)
  - SPARK-7667 MLlib Python API consistency check (Resolved)
  - SPARK-7203 Python API for local linear algebra (Resolved)
  - SPARK-3258 Python API for streaming MLlib algorithms (Resolved)
  - SPARK-5694 Python API for evaluation metrics (Resolved)
  - SPARK-6100 Distributed linear algebra in PySpark/MLlib (Resolved)
  - SPARK-6173 Python doc parity with Scala/Java in MLlib (Resolved)
  - SPARK-7665 MLlib Python API breaking changes check between 1.3 & 1.4 (Resolved)
  - SPARK-7666 MLlib Python doc parity check (Resolved)
  - SPARK-8757 Check missing and add user guide for MLlib Python API (Closed)