How to clip gradients in PyTorch

This recipe helps you clip gradients in PyTorch
Last Updated: 26 Dec 2022

Recipe Objective

How to clip gradients in PyTorch?

Gradient clipping is done with torch.nn.utils.clip_grad_norm_(parameters, max_norm, norm_type=2.0). This function clips the gradient norm of an iterable of parameters, where the norm is computed over all gradients together, as if they were concatenated into a single vector. The gradients are modified in place. Its arguments have the following meanings:
parameters - An iterable of tensors, or a single tensor, whose gradients will be normalized.
max_norm - The maximum allowed norm of the gradients.
norm_type - The type of p-norm to use. It can also be "inf" for the infinity norm.
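
The arithmetic behind norm clipping can be illustrated without PyTorch. The following is a minimal sketch in plain Python, not the library's implementation: the gradient values and the names grads, clip_coef, and clipped are made up for illustration, and the small epsilon that a real implementation would use to guard against division by zero is left out.

```python
import math

# Hypothetical gradients of two parameter tensors, flattened to lists.
grads = [[3.0, 4.0], [12.0]]
max_norm = 6.5

# Total 2-norm over all gradients, as if concatenated into one vector.
total_norm = math.sqrt(sum(g * g for grad in grads for g in grad))

# Scale factor: shrink only when the total norm exceeds max_norm.
clip_coef = min(1.0, max_norm / total_norm)

# Every gradient is multiplied by the same factor, so the direction is kept.
clipped = [[g * clip_coef for g in grad] for grad in grads]
clipped_norm = math.sqrt(sum(g * g for grad in clipped for g in grad))

print(total_norm)    # 13.0
print(clipped)       # [[1.5, 2.0], [6.0]]
print(clipped_norm)  # 6.5
```

Because all gradients share one scale factor, clipping by norm changes the length of the overall gradient vector but not its direction.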

Step 1 - Import library

import torch

Step 2 - Define parameters

batch, dim_in, dim_h, dim_out = 128, 2000, 200, 20

Here we are defining the following parameters:
batch - Batch size.
dim_in - Input dimension.
dim_h - Hidden dimension.
dim_out - Output dimension.

Step 3 - Create Random tensors

input_X = torch.randn(batch, dim_in)
output_Y = torch.randn(batch, dim_out)

Here we are creating random tensors for holding the input and output data.

Step 4 - Define model and loss function

Adam_model = torch.nn.Sequential(
    torch.nn.Linear(dim_in, dim_h),
    torch.nn.ReLU(),
    torch.nn.Linear(dim_h, dim_out),
)
loss_fn = torch.nn.MSELoss(reduction='sum')

Step 5 - Define learning rate

rate_learning = 1e-4

Step 6 - Initialize optimizer

optim = torch.optim.Adam(Adam_model.parameters(), lr=rate_learning)

Here we are initializing our optimizer using the "optim" package, which will update the weights of the model for us. We are using the Adam optimizer here; the "optim" package contains many other optimization algorithms.

Step 7 - Forward pass

for values in range(500):
    pred_y = Adam_model(input_X)
    loss = loss_fn(pred_y, output_Y)
    if values % 100 == 99:
        print(values, loss.item())

99 698.3545532226562
199 698.3545532226562
299 698.3545532226562
399 698.3545532226562
499 698.3545532226562

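Here we are computing the predicted y by passing input_X to the model, then computing the loss and printing it every 100 iterations. The printed loss does not change because this loop only runs the forward pass and never updates the weights.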

Step 8 - Zero all gradients

optim.zero_grad()

Here, before the backward pass, we must zero all the gradients of the variables the optimizer will update, which are the learnable weights of the model. Otherwise the new gradients would be added to those left over from the previous backward pass.

Step 9 - Backward pass

loss.backward()

Here we are computing the gradients of the loss w.r.t the model parameters.
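
Once backward() has run, each parameter holds its gradient in .grad, and the total gradient norm can be inspected before deciding on a clipping threshold. The sketch below is illustrative only: the small stand-in model, the tensor sizes, and the names per_param, total_norm, and flat_norm are assumptions, not part of the recipe.

```python
import torch

torch.manual_seed(0)  # fixed seed so the run is repeatable

# Small stand-in model and data (sizes chosen for illustration only).
model = torch.nn.Sequential(
    torch.nn.Linear(8, 4),
    torch.nn.ReLU(),
    torch.nn.Linear(4, 2),
)
x = torch.randn(16, 8)
y = torch.randn(16, 2)

loss = torch.nn.MSELoss(reduction='sum')(model(x), y)
loss.backward()

# 2-norm of each parameter's gradient.
per_param = [p.grad.norm(2) for p in model.parameters()]

# The 2-norm of the per-parameter norms equals the 2-norm of all
# gradients concatenated into a single vector.
total_norm = torch.norm(torch.stack(per_param), 2)
flat_norm = torch.cat([p.grad.flatten() for p in model.parameters()]).norm(2)

print(total_norm.item(), flat_norm.item())
```

The two printed values should agree up to floating-point error, which is what "computed over all gradients together" means in practice.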

Step 10 - Call step function

optim.step()

Here we are calling the step function on the optimizer, which updates the model's parameters using the computed gradients.

Step 11 - Clip gradients

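torch.nn.utils.clip_grad_norm_(parameters=Adam_model.parameters(), max_norm=10, norm_type=2.0)

tensor(1462.1097)

Here we are clipping the gradients so that their total 2-norm does not exceed 10. The function returns the total norm computed before clipping, which is the tensor printed above. Note that clipping only affects the weight update when it is called after loss.backward() and before optim.step(); in a training loop, place it between those two calls.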
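
The steps of this recipe can be combined into a single training loop. The following is a minimal sketch, assuming the clipping call sits between loss.backward() and optim.step() so that the optimizer sees the clipped gradients. The smaller tensor sizes, the fixed seed, and the names model, losses, norm_before, and norm_after are choices made for this illustration only.

```python
import torch

torch.manual_seed(0)  # fixed seed so the run is repeatable

# Smaller sizes than in the recipe, to keep the example quick.
batch, dim_in, dim_h, dim_out = 32, 100, 20, 5
input_X = torch.randn(batch, dim_in)
output_Y = torch.randn(batch, dim_out)

model = torch.nn.Sequential(
    torch.nn.Linear(dim_in, dim_h),
    torch.nn.ReLU(),
    torch.nn.Linear(dim_h, dim_out),
)
loss_fn = torch.nn.MSELoss(reduction='sum')
optim = torch.optim.Adam(model.parameters(), lr=1e-4)
max_norm = 10.0

losses = []
for step in range(200):
    # Forward pass and loss.
    pred_y = model(input_X)
    loss = loss_fn(pred_y, output_Y)
    losses.append(loss.item())

    # Reset old gradients, then compute new ones.
    optim.zero_grad()
    loss.backward()

    # Clip after backward() and before step(); returns the norm before clipping.
    norm_before = torch.nn.utils.clip_grad_norm_(
        model.parameters(), max_norm=max_norm, norm_type=2.0
    )

    # Update the weights using the clipped gradients.
    optim.step()

# Total gradient norm left in .grad after the last clipping call.
norm_after = torch.cat([p.grad.flatten() for p in model.parameters()]).norm(2)
print(losses[0], losses[-1], norm_before.item(), norm_after.item())
```

After the loop, norm_after should be no larger than max_norm, while norm_before reports how large the gradient was before it was scaled down.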
