How to create seq2seq models in PyTorch? Also explain the encoder and decoder.

This recipe helps you create seq2seq models in PyTorch. It also explains what the encoder and decoder are.

Recipe Objective

How to create seq2seq models in PyTorch?

Also, what are the encoder and decoder? In this code we are going to study the encoder, the decoder, and sequence to sequence modelling. Let's start with the encoder.

Encoder
The encoder in sequence to sequence modelling in PyTorch is simply an RNN, i.e. a recurrent neural network, which outputs some value for every word in the input sequence. For each input word the encoder outputs a vector and a hidden state, and it uses that hidden state when processing the next input word in the sequence.

Decoder
The decoder takes the encoder's output vector and produces a sequence of words to create the translation. In a simple sequence to sequence decoder, only the last output of the encoder is used; this last output is sometimes called the context vector. The context vector is used as the initial hidden state of the decoder. At every step of decoding, the decoder is given an input token and a hidden state.

Sequence to Sequence
The sequence to sequence model is a recurrent neural network which operates on a sequence and uses its own output as the input for subsequent steps. The sequence to sequence network is also called an encoder-decoder network: a model of two recurrent neural networks, the encoder and the decoder. The encoder reads the input sequence and outputs a single vector, and the decoder reads that vector to produce the output sequence.
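Before the full recipe, the encoder → context vector → decoder flow can be sketched with a pair of bare GRUs. All sizes below are toy values chosen for illustration, not the hyperparameters used later in this recipe:

```python
import torch
import torch.nn as nn

## toy sizes for illustration only
vocab_size, emb_dim, hid_dim, seq_len, batch = 10, 8, 16, 5, 3

embedding = nn.Embedding(vocab_size, emb_dim)
encoder_rnn = nn.GRU(emb_dim, hid_dim)
decoder_rnn = nn.GRU(emb_dim, hid_dim)

## encoder: read the whole source sequence, keep only the final hidden state
src = torch.randint(0, vocab_size, (seq_len, batch))   ## (seq_length, N)
_, context = encoder_rnn(embedding(src))               ## context: (1, N, hid_dim)

## decoder: start from the context vector and a first input token (e.g. <sos>)
first_token = torch.zeros(1, batch, dtype=torch.long)
out, hidden = decoder_rnn(embedding(first_token), context)
print(out.shape)  ## torch.Size([1, 3, 16])
```

The key point is that the only thing the decoder receives from the encoder is `context`, the encoder's final hidden state.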


Installing the required package torchtext

!pip install torchtext

Step 1 - Import libraries

import torch
import torch.nn as nn
import torch.optim as optim
from torchtext.datasets import Multi30k
from torchtext.data import Field, BucketIterator
import numpy as np
import spacy
import random
from torch.utils.tensorboard import SummaryWriter
from torchtext.data.metrics import bleu_score
import sys
import warnings
warnings.filterwarnings("ignore")

Step 2 - Install and load tokenizers

!python -m spacy download en
!python -m spacy download de
spacy_german = spacy.load('de') ## load german tokenizer
spacy_english = spacy.load('en') ## load english tokenizer

Step 3 - Define German tokenizer

def german_token(text):
   return [token.text for token in spacy_german.tokenizer(text)]

Step 4 - Define English tokenizer

def english_token(text):
   return [token.text for token in spacy_english.tokenizer(text)]
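If the spaCy models are unavailable, a rough stand-in (plain regex splitting on words and punctuation; not the spaCy tokenizer used above) shows the kind of token list these functions return:

```python
import re

## stand-in tokenizer: split into word runs and single punctuation marks
def simple_token(text):
    return re.findall(r"\w+|[^\w\s]", text)

print(simple_token("Zwei Männer, ein Hund."))
## ['Zwei', 'Männer', ',', 'ein', 'Hund', '.']
```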

Step 5 - Fields for the German and English tokenizers

german_sentence = Field(tokenize=german_token, lower=True, init_token="<sos>", eos_token="<eos>")
english_sentence = Field(tokenize=english_token, lower=True, init_token="<sos>", eos_token="<eos>")

Step 6 - Train and test set

train_set, validation_set, test_set = Multi30k.splits(exts=('.de','.en'), fields=(german_sentence, english_sentence))

Step 7 - Build vocabulary

german_sentence.build_vocab(train_set, max_size= 10000, min_freq = 2)
english_sentence.build_vocab(train_set, max_size= 10000, min_freq = 2)
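What `build_vocab` does with `max_size` and `min_freq` can be illustrated with a small stand-in (a sketch with an assumed specials list, not torchtext's implementation): words occurring fewer than `min_freq` times are dropped, and at most `max_size` of the most frequent survivors are kept.

```python
from collections import Counter

def build_vocab(tokens, max_size, min_freq):
    ## special tokens come first, then the most frequent words
    specials = ["<unk>", "<pad>", "<sos>", "<eos>"]
    counts = Counter(tokens)
    kept = [w for w, c in counts.most_common(max_size) if c >= min_freq]
    return specials + kept

vocab = build_vocab("a a a b b c".split(), max_size=10000, min_freq=2)
print(vocab)  ## ['<unk>', '<pad>', '<sos>', '<eos>', 'a', 'b'] -- 'c' is too rare
```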

Step 8 - Encoder class

class Encoder(nn.Module):
   def __init__(self, size_input, size_embedding, size_hidden, number_layers, dop):
      super(Encoder, self).__init__()
      self.hidden_size = size_hidden
      self.num_layers = number_layers
      self.dropout = nn.Dropout(dop)
      self.embedding = nn.Embedding(size_input, size_embedding)
      self.rnn = nn.LSTM(size_embedding, size_hidden, number_layers, dropout=dop)
   def forward(self, x):
      ## shape of x is (seq_length, N) where N is the batch size
      embedding = self.dropout(self.embedding(x))
      ## shape of embedding is (seq_length, N, size_embedding)
      outputs, (hidden, cell) = self.rnn(embedding)
      return hidden, cell
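To see why `forward` returns `hidden` and `cell`, here is a quick shape check on a bare `nn.LSTM` with the same sizes used later in this recipe (the sequence length 7 and batch size 64 are arbitrary):

```python
import torch
import torch.nn as nn

rnn = nn.LSTM(input_size=300, hidden_size=1024, num_layers=2, dropout=0.5)
x = torch.randn(7, 64, 300)            ## (seq_length, N, size_embedding)
outputs, (hidden, cell) = rnn(x)
print(outputs.shape)  ## torch.Size([7, 64, 1024]) -- one output per time step
print(hidden.shape)   ## torch.Size([2, 64, 1024]) -- final hidden state per layer
```

The encoder discards `outputs` and keeps only the final `(hidden, cell)` pair, which is exactly the context the decoder is seeded with.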

Step 9 - Decoder class

class Decoder(nn.Module):
   def __init__(self, size_input, size_embedding, size_hidden, size_output, number_layers, dop):
      super(Decoder, self).__init__()
      self.hidden_size = size_hidden
      self.num_layers = number_layers
      self.dropout = nn.Dropout(dop)
      self.embedding = nn.Embedding(size_input, size_embedding)
      self.rnn = nn.LSTM(size_embedding, size_hidden, number_layers, dropout=dop)
      self.fc = nn.Linear(size_hidden, size_output)
   def forward(self, x, hidden, cell):
      ## shape of x is (N), but we need (1, N) where 1 represents a single word and N is the batch size
      x = x.unsqueeze(0)
      embedding = self.dropout(self.embedding(x))
      ## embedding shape is (1, N, size_embedding)
      outputs, (hidden, cell) = self.rnn(embedding, (hidden, cell))
      ## shape of outputs is (1, N, size_hidden)
      our_predictions = self.fc(outputs)
      ## shape of predictions is (1, N, length of vocab)
      our_predictions = our_predictions.squeeze(0)
      return our_predictions, hidden, cell

Step 10 - Sequence 2 Sequence Model class

class Seq2Seq(nn.Module):
   def __init__(self, encoder, decoder):
      super(Seq2Seq, self).__init__()
      self.encoder = encoder
      self.decoder = decoder
   def forward(self, source, target, teacher_force_ratio = 0.5):
      batch_size = source.shape[1]
      target_length = target.shape[0]
      target_vocab_size = len(english_sentence.vocab)
      outputs = torch.zeros(target_length, batch_size, target_vocab_size).to(device)
      hidden, cell = self.encoder(source)
      x = target[0]
      for value in range(1, target_length):
         output, hidden, cell = self.decoder(x, hidden, cell)
         outputs[value] = output
         guess_best = output.argmax(1)
         x = target[value] if random.random() < teacher_force_ratio else guess_best
      return outputs
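The teacher-forcing branch in `forward` can be sketched in plain Python. The token strings and the `"???"` placeholder standing in for `output.argmax(1)` are made up for illustration:

```python
import random

random.seed(0)  ## fixed seed so the sketch is reproducible
teacher_force_ratio = 0.5
target = ["<sos>", "ein", "mann", "geht"]  ## hypothetical target tokens
model_guess = "???"                        ## stand-in for the model's own best guess

## at each step, feed the ground-truth token with probability
## teacher_force_ratio, otherwise feed the model's previous prediction
inputs = []
for step in range(1, len(target)):
    use_truth = random.random() < teacher_force_ratio
    inputs.append(target[step] if use_truth else model_guess)
print(inputs)
```

Teacher forcing stabilises early training (the decoder sees correct history about half the time) while still forcing it to recover from its own mistakes the rest of the time.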

Step 11 - Training hyperparameters

epochs = 20 ##number of epochs
rate_learning = 0.001 ## learning rate
batch_size = 64

Step 12 - Model hyperparameters

load_model = False
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
input_size_encoder = len(german_sentence.vocab)
input_size_decoder = len(english_sentence.vocab)
output_size = len(english_sentence.vocab)
encoder_embedding_size = 300
decoder_embedding_size = 300
hidden_size = 1024
num_layers = 2
encoder_dropout = 0.5

Step 13 - Tensorboard - SummaryWriter

summary_writer = SummaryWriter('runs/loss_plot')
step = 0

Step 14 - Iterators

iterator_train, iterator_validation, iterator_test = BucketIterator.splits(
                                                       (train_set, validation_set, test_set),
                                                       batch_size=batch_size,
                                                       sort_within_batch=True,
                                                       sort_key=lambda x: len(x.src),
                                                       device=device)

Step 15 - Define translate sentence

def translate_sentence(model, sentence, german, english, device, max_length=50):
   spacy_german = spacy.load("de")
   # Create tokens using spacy and everything in lower case (which is what our vocab is)
   if type(sentence) == str:
       tokens = [token.text.lower() for token in spacy_german(sentence)]
   else:
       tokens = [token.lower() for token in sentence]
   tokens.insert(0, german.init_token)
   tokens.append(german.eos_token)
   text_to_indices = [german.vocab.stoi[token] for token in tokens]
   sentence_tensor = torch.LongTensor(text_to_indices).unsqueeze(1).to(device)
   with torch.no_grad():
       hidden, cell = model.encoder(sentence_tensor)
   outputs = [english.vocab.stoi["<sos>"]]
   for _ in range(max_length):
      previous_word = torch.LongTensor([outputs[-1]]).to(device)
      with torch.no_grad():
          output, hidden, cell = model.decoder(previous_word, hidden, cell)
          best_guess = output.argmax(1).item()
          outputs.append(best_guess)
           if best_guess == english.vocab.stoi["<eos>"]:
              break
   translated_sentence = [english.vocab.itos[idx] for idx in outputs]
   return translated_sentence[1:]
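The decoding loop above is greedy search: feed back the argmax token until the end token appears or `max_length` is reached. The skeleton of that loop, with a toy successor table standing in for `model.decoder` (the table and indices are invented for illustration), looks like this:

```python
## toy next-step table standing in for model.decoder: each token index
## maps to a fixed successor, ending at the <eos> index (here 3)
successor = {0: 5, 5: 6, 6: 3}

def greedy_decode(sos_idx, eos_idx, max_length=50):
    outputs = [sos_idx]
    for _ in range(max_length):
        best_guess = successor[outputs[-1]]  ## stand-in for output.argmax(1)
        outputs.append(best_guess)
        if best_guess == eos_idx:
            break
    return outputs

print(greedy_decode(0, 3))  ## [0, 5, 6, 3]
```

Greedy decoding commits to the single best token at each step; beam search would keep several candidate prefixes instead, but this recipe uses the greedy variant.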

Step 16 - Define Bleu

def bleu(data, model, german, english, device):
    targets = []
    outputs = []
    for example in data:
      src = vars(example)["src"]
      trg = vars(example)["trg"]
      prediction = translate_sentence(model, src, german, english, device)
      prediction = prediction[:-1] # remove <eos> token
      targets.append([trg])
      outputs.append(prediction)
    return bleu_score(outputs, targets)
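`bleu_score` combines clipped n-gram precisions for n = 1..4 with a brevity penalty. Its simplest ingredient, clipped unigram precision, can be computed by hand (a sketch of the idea, not torchtext's implementation):

```python
from collections import Counter

def unigram_precision(candidate, reference):
    ## each candidate word counts only as often as it appears in the reference
    cand, ref = Counter(candidate), Counter(reference)
    overlap = sum(min(c, ref[w]) for w, c in cand.items())
    return overlap / max(len(candidate), 1)

print(unigram_precision("the cat sat".split(), "the cat sat down".split()))  ## 1.0
print(unigram_precision("the the the".split(), "the cat".split()))           ## clipped to 1/3
```

The clipping is what stops a degenerate output like "the the the" from scoring well just by repeating a common reference word.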

Step 17 - Define Checkpoint

def save_checkpoint(state, filename="my_checkpoint.pth.tar"):
    print("=> Saving checkpoint")
    torch.save(state, filename)
def load_checkpoint(checkpoint, model, optimizer):
    print("=> Loading checkpoint")
    model.load_state_dict(checkpoint["state_dict"])
    optimizer.load_state_dict(checkpoint["optimizer"])
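The round trip through these two helpers can be checked on a tiny throwaway model (the `nn.Linear` below is just a stand-in, and the file path is an assumption for the demo):

```python
import os
import tempfile
import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(4, 2)                      ## throwaway stand-in model
optimizer = optim.Adam(model.parameters())
path = os.path.join(tempfile.gettempdir(), "demo_checkpoint.pth.tar")

## save, then restore, the same {'state_dict', 'optimizer'} layout used above
torch.save({"state_dict": model.state_dict(), "optimizer": optimizer.state_dict()}, path)
checkpoint = torch.load(path)
model.load_state_dict(checkpoint["state_dict"])
optimizer.load_state_dict(checkpoint["optimizer"])
print(sorted(checkpoint.keys()))             ## ['optimizer', 'state_dict']
```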

Step 18 - Model

net_encoder = Encoder(input_size_encoder, encoder_embedding_size, hidden_size, num_layers, encoder_dropout).to(device)
net_decoder = Decoder(input_size_decoder, decoder_embedding_size, hidden_size, output_size, num_layers, encoder_dropout).to(device) ## reusing encoder_dropout for the decoder
model = Seq2Seq(net_encoder, net_decoder).to(device)
optimizer = optim.Adam(model.parameters(), lr = rate_learning)
padding_index = english_sentence.vocab.stoi['<pad>']
criterion = nn.CrossEntropyLoss(ignore_index=padding_index)
if load_model:
  load_checkpoint(torch.load('my_checkpoint.pth.tar'), model, optimizer)
sentence = "Hallo, du lernst Pytorch Sequenz zu Sequenz Modellierung"
for epoch in range(epochs):
   print(f'Epoch [{epoch} / {epochs}]')
   checkpoint = {'state_dict': model.state_dict(), 'optimizer': optimizer.state_dict()}
   save_checkpoint(checkpoint)
   model.eval()
   translated = translate_sentence(model, sentence, german_sentence, english_sentence, device, max_length=50)
   print(f'Translated example sentence: {translated}')
   model.train()
   for batch_index, batch in enumerate(iterator_train):
      input_data = batch.src.to(device)
      target = batch.trg.to(device)
      output = model(input_data, target)
      ## the output shape is going to be the (target_length, batch_size, output_dimension)
      output = output[1:].reshape(-1, output.shape[2])
      target = target[1:].reshape(-1)
      optimizer.zero_grad()
      loss = criterion(output, target)
      loss.backward()
      torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1)
      optimizer.step()
      summary_writer.add_scalar('training loss', loss, global_step=step)
      step += 1
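The reshape before the loss in the loop above matters because `nn.CrossEntropyLoss` expects `(N, C)` logits against `(N,)` class targets. A standalone shape check with toy sizes:

```python
import torch
import torch.nn as nn

target_len, batch, vocab = 6, 4, 10
output = torch.randn(target_len, batch, vocab)    ## (target_length, N, vocab)
target = torch.randint(0, vocab, (target_len, batch))

## drop the <sos> position, then flatten time and batch together
flat_out = output[1:].reshape(-1, vocab)          ## (5 * 4, 10)
flat_tgt = target[1:].reshape(-1)                 ## (5 * 4,)
loss = nn.CrossEntropyLoss()(flat_out, flat_tgt)
print(flat_out.shape, flat_tgt.shape)             ## torch.Size([20, 10]) torch.Size([20])
```

Slicing with `[1:]` skips position 0, since the model is never asked to predict the start token.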

