Details
- Type: New Feature
- Status: Resolved
- Priority: Minor
- Resolution: Incomplete
Description
SGD methods can converge very slowly, or even diverge, if their learning rate alpha is set inappropriately. Many alternative methods have been proposed to achieve good convergence with less dependence on hyperparameter settings and to help escape poor local optima, e.g. Momentum, NAG (Nesterov's Accelerated Gradient), Adagrad, RMSProp, etc.
Among these, Adam is one of the most popular: a first-order gradient-based optimization algorithm for stochastic objective functions. It has been shown to be well suited for problems with large amounts of data and/or many parameters, as well as for problems with noisy and/or sparse gradients, and it is computationally efficient. Refer to this paper for details: https://arxiv.org/pdf/1412.6980v8.pdf
In fact, TensorFlow has implemented most of the adaptive optimization methods mentioned above, and we have seen Adam outperform most SGD variants in certain cases, such as training an FM model on a very sparse dataset.
It would be nice for Spark to have these adaptive optimization methods.
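For reference, below is a minimal sketch of the Adam update rule from the paper above (Kingma & Ba). The class name and API are hypothetical for illustration only, not an existing Spark interface:
{code:scala}
// Hypothetical updater implementing the Adam rule (Kingma & Ba, 2014).
// Not an existing Spark API; names are for illustration only.
class AdamUpdater(
    numFeatures: Int,
    alpha: Double = 0.001,   // step size
    beta1: Double = 0.9,     // decay rate for the first moment estimate
    beta2: Double = 0.999,   // decay rate for the second moment estimate
    eps: Double = 1e-8) {

  private val m = Array.fill(numFeatures)(0.0)  // first moment (mean of gradients)
  private val v = Array.fill(numFeatures)(0.0)  // second moment (uncentered variance)
  private var t = 0                             // time step

  /** Applies one Adam step in place and returns the updated weights. */
  def update(weights: Array[Double], gradient: Array[Double]): Array[Double] = {
    t += 1
    var i = 0
    while (i < numFeatures) {
      m(i) = beta1 * m(i) + (1 - beta1) * gradient(i)
      v(i) = beta2 * v(i) + (1 - beta2) * gradient(i) * gradient(i)
      // bias-corrected moment estimates
      val mHat = m(i) / (1 - math.pow(beta1, t))
      val vHat = v(i) / (1 - math.pow(beta2, t))
      weights(i) -= alpha * mHat / (math.sqrt(vHat) + eps)
      i += 1
    }
    weights
  }
}
{code}
Presumably such an updater would plug into the optimizer interface discussed in SPARK-17136 rather than being exposed on its own.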
Issue Links
- is blocked by: SPARK-17136 Design optimizer interface for ML algorithms (Resolved)
- links to