Step module
The step module computes a descent direction and stores it in the state dictionary under the key 'direction'.
Classical steps
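- GradientStep() returns the negative of the gradient (the steepest descent direction)
- NewtonStep() returns the classical Newton step, obtained by applying the inverse of the Hessian to the negative gradient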
The gradient step requires a line search; so does the Newton step when the parameters are not close enough to the optimum.
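The two classical steps can be sketched in a few lines of Python. This is not the library's code: it assumes a step is a callable taking (function, point, state) and that the function object exposes hypothetical gradient(x) and hessian(x) methods. Vectors are plain lists and a small Gaussian elimination helper (solve) replaces a linear algebra package, so the sketch runs without third-party dependencies. The Newton step solves H d = -g rather than forming the inverse of H explicitly.

```python
# Sketch only: assumes a step is a callable (function, point, state) and that
# `function` exposes hypothetical gradient(x) and hessian(x) methods.

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix (copy)
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


class GradientStep:
    """Steepest descent: the direction is minus the gradient."""
    def __call__(self, function, point, state):
        gradient = function.gradient(point)
        direction = [-g for g in gradient]
        state['gradient'], state['direction'] = gradient, direction
        return direction


class NewtonStep:
    """Newton step: solve H d = -g instead of inverting H."""
    def __call__(self, function, point, state):
        gradient = function.gradient(point)
        direction = solve(function.hessian(point), [-g for g in gradient])
        state['gradient'], state['direction'] = gradient, direction
        return direction


class Quadratic:
    """f(x) = 0.5 x^T A x - b^T x, A symmetric positive definite; minimum at (0.2, 0.4)."""
    A = [[3.0, 1.0], [1.0, 2.0]]
    b = [1.0, 1.0]
    def gradient(self, x):
        return [sum(a * xi for a, xi in zip(row, x)) - bi
                for row, bi in zip(self.A, self.b)]
    def hessian(self, x):
        return self.A


state = {}
print(GradientStep()(Quadratic(), [0.0, 0.0], state))  # [1.0, 1.0]
print(NewtonStep()(Quadratic(), [0.0, 0.0], state))    # approximately [0.2, 0.4]
```

On this quadratic, the Newton direction taken with unit length from the origin lands on the minimizer (0.2, 0.4), which is why the Newton step can skip the line search near the optimum; the gradient direction [1.0, 1.0] does not point at the minimizer and needs a line search to choose its length.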
Conjugate gradient steps
- CWConjugateGradientStep() computes the Crowder-Wolfe conjugate gradient step
- DYConjugateGradientStep() computes the Dai-Yuan conjugate gradient step
- DConjugateGradientStep() computes the Dixon conjugate gradient step
- FRConjugateGradientStep() computes the Fletcher-Reeves conjugate gradient step
- PRPConjugateGradientStep() computes the Polak-Ribière-Polyak conjugate gradient step
- FRPRPConjugateGradientStep() computes the Polak-Ribière-Polyak conjugate gradient step with the Fletcher-Reeves modification
These conjugate gradient steps are generally used with the strong Wolfe-Powell rules. The Polak-Ribière-Polyak step is more robust than the Fletcher-Reeves one, but it can produce a direction that is incompatible with the Wolfe-Powell rules (it is not guaranteed to be a descent direction). This led to FRPRPConjugateGradientStep.
The Crowder-Wolfe step suffers from the same limitation as the PRP step, but the Dai-Yuan step should not.
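For reference, the standard textbook coefficients behind several of these steps can be sketched as follows, with g_new and g_old the current and previous gradients, d_old the previous direction, y = g_new - g_old, and the new direction -g_new + beta * d_old. The helper names are hypothetical and this is not the library's code. The Crowder-Wolfe coefficient is assumed to take the Hestenes-Stiefel form, the hybrid rule (PRP clipped to [-beta_FR, beta_FR], after Gilbert and Nocedal) is an assumption about what FRPRPConjugateGradientStep does, and the Dixon coefficient is left out.

```python
# Textbook conjugate gradient coefficients (hypothetical helper names).
# All rules share the signature (g_new, g_old, d_old); some ignore d_old.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def diff(u, v):
    return [a - b for a, b in zip(u, v)]

def beta_fr(g_new, g_old, d_old):
    """Fletcher-Reeves."""
    return dot(g_new, g_new) / dot(g_old, g_old)

def beta_prp(g_new, g_old, d_old):
    """Polak-Ribiere-Polyak."""
    return dot(g_new, diff(g_new, g_old)) / dot(g_old, g_old)

def beta_cw(g_new, g_old, d_old):
    """Crowder-Wolfe, assumed here in its Hestenes-Stiefel form."""
    y = diff(g_new, g_old)
    return dot(g_new, y) / dot(d_old, y)

def beta_dy(g_new, g_old, d_old):
    """Dai-Yuan."""
    return dot(g_new, g_new) / dot(d_old, diff(g_new, g_old))

def beta_frprp(g_new, g_old, d_old):
    """Assumed hybrid: PRP clipped to the interval [-beta_FR, beta_FR]."""
    fr = beta_fr(g_new, g_old, d_old)
    return max(-fr, min(beta_prp(g_new, g_old, d_old), fr))

def conjugate_direction(beta_rule, g_new, g_old, d_old):
    """New direction: -g_new + beta * d_old."""
    beta = beta_rule(g_new, g_old, d_old)
    return [-g + beta * d for g, d in zip(g_new, d_old)]


g_old, g_new, d_old = [1.0, 0.0], [-1.0, 1.0], [-1.0, 0.0]
print(beta_prp(g_new, g_old, d_old), beta_fr(g_new, g_old, d_old))  # 3.0 2.0
print(beta_frprp(g_new, g_old, d_old))                              # 2.0
print(conjugate_direction(beta_dy, g_new, g_old, d_old))            # [0.0, -1.0]
```

With these vectors the PRP coefficient (3.0) exceeds the FR one (2.0), so the hybrid returns 2.0. The Dai-Yuan direction [0.0, -1.0] has a negative inner product with g_new, so it is a descent direction.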
Quasi-Newton steps
- MarquardtStep(gamma, c1, c2) defines the Marquardt step: a diagonal matrix whose entries all equal gamma is added to the Hessian before the inversion. If the function value decreases after an iteration, gamma is multiplied by c1; otherwise it is multiplied by c2 (the current gamma is stored in the state dictionary)
Here again, the strong Wolfe-Powell rules are needed, although the Goldstein rule can be used as well.
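A minimal sketch of the Marquardt step and its gamma update follows. It is not the library's code and assumes that a step is a callable taking (function, point, state), that the function object is callable and exposes hypothetical gradient(x) and hessian(x) methods, and that the previous function value is kept under a hypothetical 'old_value' key of the state dictionary. Choosing c1 < 1 and c2 > 1 is the usual pattern: after a success the step moves toward the Newton step, and after a failure it moves toward a short steepest descent step, since -(H + gamma I)^-1 g approaches -g / gamma as gamma grows.

```python
# Sketch only: function(x), function.gradient(x), function.hessian(x) and the
# state key 'old_value' are assumptions made for this illustration.

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix (copy)
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


class MarquardtStep:
    """Damped Newton step: d = -(H + gamma * I)^-1 g, gamma adapted each call."""

    def __init__(self, gamma, c1, c2):
        self.gamma, self.c1, self.c2 = gamma, c1, c2

    def __call__(self, function, point, state):
        gamma = state.get('gamma', self.gamma)
        value = function(point)
        # Scale gamma by c1 if the value decreased since the last call, else by c2.
        if 'old_value' in state:
            gamma *= self.c1 if value < state['old_value'] else self.c2
        state['gamma'], state['old_value'] = gamma, value
        gradient = function.gradient(point)
        hessian = function.hessian(point)
        n = len(point)
        damped = [[hessian[i][j] + (gamma if i == j else 0.0) for j in range(n)]
                  for i in range(n)]
        direction = solve(damped, [-g for g in gradient])
        state['gradient'], state['direction'] = gradient, direction
        return direction


class Bowl:
    """f(x) = (x0 - 1)^2 + (x1 - 1)^2, minimum at (1, 1)."""
    def __call__(self, x):
        return (x[0] - 1.0) ** 2 + (x[1] - 1.0) ** 2
    def gradient(self, x):
        return [2.0 * (x[0] - 1.0), 2.0 * (x[1] - 1.0)]
    def hessian(self, x):
        return [[2.0, 0.0], [0.0, 2.0]]


step, state = MarquardtStep(gamma=2.0, c1=0.5, c2=10.0), {}
print(step(Bowl(), [0.0, 0.0], state))  # [0.5, 0.5]; gamma is still 2.0
step(Bowl(), [0.5, 0.5], state)         # value decreased: gamma becomes 1.0
step(Bowl(), [3.0, 3.0], state)         # value increased: gamma becomes 10.0
print(state['gamma'])                   # 10.0
```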
Decorators
- PartialStep(step, nb_chunks, indice = None) decorates another step and splits the set of parameters into nb_chunks chunks. If indice is not provided, a chunk index is chosen at random
- RestartConjugateGradientStep(step, iteration_period = 10) restarts the conjugate gradient procedure every iteration_period iterations. The Polak-Ribière-Polyak step does not need this, as a restart is built into its formula; for the other conjugate gradient steps, periodic restarts can speed up convergence.
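The two decorators can be sketched as wrappers around any step callable with an assumed (function, point, state) signature. Both classes below are hypothetical re-implementations, not the library's code: PartialStep is assumed to zero the wrapped direction outside the selected chunk (drawing a new random chunk on every call when indice is None), and the restart is assumed to replace the wrapped step's output with the steepest descent direction every iteration_period calls. OnesStep and Linear are dummy objects for illustration only.

```python
import random


class PartialStep:
    """Restrict the wrapped step's direction to one chunk of the parameters."""

    def __init__(self, step, nb_chunks, indice=None):
        self.step, self.nb_chunks, self.indice = step, nb_chunks, indice

    def __call__(self, function, point, state):
        direction = self.step(function, point, state)
        # Assumption: a random chunk is drawn on every call when indice is None.
        indice = self.indice if self.indice is not None else random.randrange(self.nb_chunks)
        n = len(point)
        start, stop = indice * n // self.nb_chunks, (indice + 1) * n // self.nb_chunks
        # Zero every component outside the selected chunk.
        direction = [d if start <= i < stop else 0.0 for i, d in enumerate(direction)]
        state['direction'] = direction
        return direction


class RestartConjugateGradientStep:
    """Fall back to steepest descent every iteration_period calls."""

    def __init__(self, step, iteration_period=10):
        self.step, self.iteration_period = step, iteration_period
        self.iteration = 0

    def __call__(self, function, point, state):
        restart = self.iteration % self.iteration_period == 0
        self.iteration += 1
        if restart:
            gradient = function.gradient(point)
            direction = [-g for g in gradient]
            state['gradient'], state['direction'] = gradient, direction
            return direction
        return self.step(function, point, state)


class OnesStep:
    """Dummy wrapped step returning a direction of ones."""
    def __call__(self, function, point, state):
        return [1.0] * len(point)


class Linear:
    """Dummy function with a constant gradient."""
    def gradient(self, x):
        return [1.0, -2.0]


print(PartialStep(OnesStep(), 2, indice=1)(Linear(), [0.0] * 4, {}))  # [0.0, 0.0, 1.0, 1.0]
restarted = RestartConjugateGradientStep(OnesStep(), iteration_period=2)
print([restarted(Linear(), [0.0, 0.0], {}) for _ in range(3)])
# [[-1.0, 2.0], [1.0, 1.0], [-1.0, 2.0]]
```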
