Gradient Descent Optimization in Machine Learning

Nadeem
5 min read · Mar 29, 2022

Gradient descent is an optimization algorithm that follows the negative gradient of an objective function in order to locate the minimum of the function.

(Image source: Paperspace blog)

It is a simple and effective technique that can be implemented with just a few lines of code. It also provides the basis for many extensions and modifications that can result in better performance. The algorithm also provides the basis for the widely used extension called stochastic gradient descent, used to train deep learning neural networks.

In this blog, you will discover how to implement gradient descent optimization from scratch.

Overview

Gradient Descent Algorithm

  • Gradient Descent: Minimization optimization that follows the negative of the gradient to the minimum of the target function.
  • Gradient Ascent: Maximization optimization that follows the gradient to the maximum of the target function.

The gradient descent algorithm requires a target function that is being optimized and the derivative function for the target function.

(Animation source: Paperspace blog)

The target function f() returns a score for a given set of inputs, and the derivative function f’() gives the derivative of the target function for a given set of inputs.

  • Objective Function: Calculates a score for a given set of input parameters.
  • Derivative Function: Calculates the derivative (gradient) of the objective function for a given set of inputs.
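As a concrete illustration, consider the simple one-dimensional function f(x) = x², whose derivative is f'(x) = 2x. The choice of function here is purely for demonstration; a minimal sketch in Python:

# example objective: f(x) = x^2, a bowl-shaped function with its minimum at x = 0
def objective(x):
    return x ** 2.0

# derivative (gradient) of the objective: f'(x) = 2x
def derivative(x):
    return x * 2.0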

The gradient descent algorithm requires a starting point (x) in the problem, such as a randomly selected point in the input space.

The derivative is then calculated and a step is taken in the input space that is expected to result in a downhill movement in the target function, assuming we are minimizing the target function.

A downhill movement is made by first calculating how far to move in the input space, calculated as the step size (called alpha or the learning rate) multiplied by the gradient. This is then subtracted from the current point, ensuring we move against the gradient, or down the target function.

  • x_new = x - alpha * f'(x)
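For example, with f(x) = x² and f'(x) = 2x (an illustrative choice), starting from x = 3.0 with alpha = 0.1 gives x_new = 3.0 - 0.1 * 6.0 = 2.4, a step toward the minimum at x = 0.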

The steeper the objective function at a given point, the larger the magnitude of the gradient, and in turn, the larger the step taken in the search space.

The size of the step taken is scaled using a step size hyperparameter.

  • Step Size (alpha): Hyperparameter that controls how far to move in the search space against the gradient each iteration of the algorithm.

If the step size is too small, the movement in the search space will be small and the search will take a long time. If the step size is too large, the search may bounce around the search space and skip over the optima.

Finding a good step size may take some trial and error for the specific target function.

The difficulty of choosing the step size can make finding the exact optima of the target function hard. Many extensions involve adapting the learning rate over time, for example taking smaller steps as the search progresses or using different step sizes in different dimensions, to allow the algorithm to home in on the function optima.
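A rough sketch of this trade-off, using f(x) = x² with f'(x) = 2x and step sizes I have chosen arbitrarily for illustration:

# compare step sizes on f(x) = x^2, whose gradient is f'(x) = 2x
def run(step_size, n_iter=10, x=3.0):
    for _ in range(n_iter):
        x = x - step_size * (2.0 * x)
    return x

print(run(0.01))  # too small: x is still around 2.45, far from the minimum at 0
print(run(0.1))   # reasonable: x shrinks to roughly 0.32
print(run(1.1))   # too large: x overshoots and grows to roughly 18.6 (divergence)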

The process of calculating the derivative of a point and calculating a new point in the input space is repeated until some stop condition is met. This might be a fixed number of steps or target function evaluations, a lack of improvement in target function evaluation over some number of iterations, or the identification of a flat (stationary) area of the search space signified by a gradient of zero.

  • Stop Condition: Decision on when to end the search procedure.
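For instance, a gradient-based stop condition can end the search once the slope is effectively zero. A minimal sketch, with an illustrative tolerance of my own choosing:

# stop once the gradient magnitude falls below a small tolerance
def derivative(x):
    return 2.0 * x  # gradient of the example objective f(x) = x^2

x = 3.0            # starting point
step_size = 0.1
tolerance = 1e-6   # illustrative threshold for "flat enough"
while abs(derivative(x)) > tolerance:
    x = x - step_size * derivative(x)
print(x)           # very close to the minimum at 0.0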

Let’s look at how we might implement the gradient descent algorithm in Python.

First, we can define an initial point as a randomly selected point in the input space defined by the problem bounds.

# generate an initial point
solution = bounds[:, 0] + rand(len(bounds)) * (bounds[:, 1] - bounds[:, 0])
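This snippet, and those that follow, assume NumPy is used: rand() draws uniform random numbers and bounds is a 2-D array with one [min, max] row per input variable. A minimal setup might look like the following, where the specific range is an illustrative assumption:

from numpy import asarray
from numpy.random import rand

# bounds of a single input variable, e.g. x in [-1.0, 1.0]
bounds = asarray([[-1.0, 1.0]])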

We can then calculate the derivative of the point using a function named derivative().

# calculate gradient
gradient = derivative(solution)

And take a step in the search space to a new point down the hill from the current point.

The new position is calculated using the calculated gradient and the step_size hyperparameter.

# take a step
solution = solution - step_size * gradient

We can then evaluate this point and report the performance.

# evaluate candidate point
solution_eval = objective(solution)

This process can be repeated for a fixed number of iterations controlled via an n_iter hyperparameter.

# run the gradient descent
for i in range(n_iter):
    # calculate gradient
    gradient = derivative(solution)
    # take a step
    solution = solution - step_size * gradient
    # evaluate candidate point
    solution_eval = objective(solution)
    # report progress
    print('>%d f(%s) = %.5f' % (i, solution, solution_eval))

We can tie all of this together into a function named gradient_descent().

The function takes the name of the objective and gradient functions, as well as the bounds on the inputs to the objective function, the number of iterations, and step size, then returns the solution and its evaluation at the end of the search.

The complete gradient descent optimization algorithm implemented as a function is listed below.

# gradient descent algorithm
from numpy.random import rand

def gradient_descent(objective, derivative, bounds, n_iter, step_size):
    # generate an initial point
    solution = bounds[:, 0] + rand(len(bounds)) * (bounds[:, 1] - bounds[:, 0])
    # run the gradient descent
    for i in range(n_iter):
        # calculate gradient
        gradient = derivative(solution)
        # take a step
        solution = solution - step_size * gradient
        # evaluate candidate point
        solution_eval = objective(solution)
        # report progress
        print('>%d f(%s) = %.5f' % (i, solution, solution_eval))
    return [solution, solution_eval]
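To apply the function, we supply a concrete objective, its derivative, the input bounds, and values for n_iter and step_size. The objective, bounds, and hyperparameter values below are illustrative choices of mine, not values fixed by the algorithm:

from numpy import asarray
from numpy.random import rand, seed

# example objective f(x) = x^2 and its derivative f'(x) = 2x
def objective(x):
    return x[0] ** 2.0

def derivative(x):
    return asarray([x[0] * 2.0])

seed(1)                          # reproducible random starting point
bounds = asarray([[-1.0, 1.0]])  # search x in [-1.0, 1.0]
best, score = gradient_descent(objective, derivative, bounds, n_iter=30, step_size=0.1)
print('Done! f(%s) = %.5f' % (best, score))

With a step size of 0.1, the reported evaluations should shrink toward zero over the 30 iterations as the search approaches the minimum at x = 0.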


Summary

In this blog, you discovered how to implement gradient descent optimization from scratch.

Specifically, you learned:

  • Gradient descent is a general procedure for optimizing a differentiable objective function.
  • How to implement the gradient descent algorithm from scratch in Python.
  • How to apply the gradient descent algorithm to an objective function.

Got questions? Need help? Contact me!

Email: nadeemoffl@gmail.com

LinkedIn: https://www.linkedin.com/in/nadeemoffl/

Twitter: https://twitter.com/Nadeemoffll

Instagram: https://www.instagram.com/nadeemoffll/
