How to create seq2seq models in PyTorch? Also explain the encoder and decoder.

This recipe helps you create seq2seq models in PyTorch. It also explains what the encoder and decoder are.

Recipe Objective

How to create seq2seq models in PyTorch?

Also, what are the encoder and decoder? In this code we are going to study the encoder, the decoder, and sequence to sequence modelling. Let's start with the encoder.

Encoder
The encoder in sequence to sequence modelling in PyTorch is simply an RNN, i.e. a recurrent neural network, which outputs some value for every word in the input sequence. For each input word the encoder outputs a vector and a hidden state, and it uses that hidden state when processing the next input word in the sequence.

Decoder
The decoder takes the encoder's output vector and produces a sequence of words to create the translation. In a simple sequence to sequence decoder, only the last output of the encoder is used; this last output is sometimes called the context vector. The context vector is used as the initial hidden state of the decoder. At every step of decoding, the decoder is given an input token and a hidden state.

Sequence to Sequence
The sequence to sequence model is a recurrent neural network which operates on a sequence and uses its own output as the input for subsequent steps. The sequence to sequence network is also called an encoder-decoder network: a model of two recurrent neural networks, the encoder and the decoder. The encoder reads the input sequence and outputs a single vector, and the decoder reads that vector to produce the output sequence.
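Before the full recipe, the encoder → context vector → decoder flow can be sketched with a pair of bare GRUs. All sizes below are toy values chosen for illustration, not the hyperparameters used later in this recipe:

```python
import torch
import torch.nn as nn

## toy sizes for illustration only
vocab_size, emb_dim, hid_dim, seq_len, batch = 10, 8, 16, 5, 3

embedding = nn.Embedding(vocab_size, emb_dim)
encoder_rnn = nn.GRU(emb_dim, hid_dim)
decoder_rnn = nn.GRU(emb_dim, hid_dim)

## encoder: read the whole source sequence, keep only the final hidden state
src = torch.randint(0, vocab_size, (seq_len, batch))   ## (seq_length, N)
_, context = encoder_rnn(embedding(src))               ## context: (1, N, hid_dim)

## decoder: start from the context vector and a first input token (e.g. <sos>)
first_token = torch.zeros(1, batch, dtype=torch.long)
out, hidden = decoder_rnn(embedding(first_token), context)
print(out.shape)  ## torch.Size([1, 3, 16])
```

The key point is that the only thing the decoder receives from the encoder is `context`, the encoder's final hidden state.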


Installing the required package torchtext

!pip install torchtext

Step 1 - Import libraries

import torch
import torch.nn as nn
import torch.optim as optim
from torchtext.datasets import Multi30k
from torchtext.data import Field, BucketIterator
import numpy as np
import spacy
import random
from torch.utils.tensorboard import SummaryWriter
from torchtext.data.metrics import bleu_score
import sys
import warnings
warnings.filterwarnings("ignore")

Step 2 - Install and load tokenizers

!python -m spacy download en
!python -m spacy download de
spacy_german = spacy.load('de') ## load german tokenizer
spacy_english = spacy.load('en') ## load english tokenizer

Step 3 - Define German tokenizer

def german_token(text):
   return [token.text for token in spacy_german.tokenizer(text)]

Step 4 - Define English tokenizer

def english_token(text):
   return [token.text for token in spacy_english.tokenizer(text)]
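If the spaCy models are unavailable, a rough stand-in (plain regex splitting on words and punctuation; not the spaCy tokenizer used above) shows the kind of token list these functions return:

```python
import re

## stand-in tokenizer: split into word runs and single punctuation marks
def simple_token(text):
    return re.findall(r"\w+|[^\w\s]", text)

print(simple_token("Zwei Männer, ein Hund."))
## ['Zwei', 'Männer', ',', 'ein', 'Hund', '.']
```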

Step 5 - Fields for the German and English tokenizers

german_sentence = Field(tokenize=german_token, lower=True, init_token="<sos>", eos_token="<eos>")
english_sentence = Field(tokenize=english_token, lower=True, init_token="<sos>", eos_token="<eos>")

Step 6 - Train and test set

train_set, validation_set, test_set = Multi30k.splits(exts=('.de','.en'), fields=(german_sentence, english_sentence))

Step 7 - Build vocabulary

german_sentence.build_vocab(train_set, max_size= 10000, min_freq = 2)
english_sentence.build_vocab(train_set, max_size= 10000, min_freq = 2)
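What `build_vocab` does with `max_size` and `min_freq` can be illustrated with a small stand-in (a sketch with an assumed specials list, not torchtext's implementation): words occurring fewer than `min_freq` times are dropped, and at most `max_size` of the most frequent survivors are kept.

```python
from collections import Counter

def build_vocab(tokens, max_size, min_freq):
    ## special tokens come first, then the most frequent words
    specials = ["<unk>", "<pad>", "<sos>", "<eos>"]
    counts = Counter(tokens)
    kept = [w for w, c in counts.most_common(max_size) if c >= min_freq]
    return specials + kept

vocab = build_vocab("a a a b b c".split(), max_size=10000, min_freq=2)
print(vocab)  ## ['<unk>', '<pad>', '<sos>', '<eos>', 'a', 'b'] -- 'c' is too rare
```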

Step 8 - Encoder class

class Encoder(nn.Module):
   def __init__(self, size_input, size_embedding, size_hidden, number_layers, dop):
      super(Encoder, self).__init__()
      self.hidden_size = size_hidden
      self.num_layers = number_layers
      self.dropout = nn.Dropout(dop)
      self.embedding = nn.Embedding(size_input, size_embedding)
      self.rnn = nn.LSTM(size_embedding, size_hidden, number_layers, dropout=dop)
   def forward(self, x):
      ## shape of x is (seq_length, N) where N is the batch size
      embedding = self.dropout(self.embedding(x))
      ## shape of embedding is (seq_length, N, size_embedding)
      outputs, (hidden, cell) = self.rnn(embedding)
      return hidden, cell
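To see why `forward` returns `hidden` and `cell`, here is a quick shape check on a bare `nn.LSTM` with the same sizes used later in this recipe (the sequence length 7 and batch size 64 are arbitrary):

```python
import torch
import torch.nn as nn

rnn = nn.LSTM(input_size=300, hidden_size=1024, num_layers=2, dropout=0.5)
x = torch.randn(7, 64, 300)            ## (seq_length, N, size_embedding)
outputs, (hidden, cell) = rnn(x)
print(outputs.shape)  ## torch.Size([7, 64, 1024]) -- one output per time step
print(hidden.shape)   ## torch.Size([2, 64, 1024]) -- final hidden state per layer
```

The encoder discards `outputs` and keeps only the final `(hidden, cell)` pair, which is exactly the context the decoder is seeded with.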

Step 9 - Decoder class

class Decoder(nn.Module):
   def __init__(self, size_input, size_embedding, size_hidden, size_output, number_layers, dop):
      super(Decoder, self).__init__()
      self.hidden_size = size_hidden
      self.num_layers = number_layers
      self.dropout = nn.Dropout(dop)
      self.embedding = nn.Embedding(size_input, size_embedding)
      self.rnn = nn.LSTM(size_embedding, size_hidden, number_layers, dropout=dop)
      self.fc = nn.Linear(size_hidden, size_output)
   def forward(self, x, hidden, cell):
      ## shape of x is (N), but we need (1, N) where 1 represents a single word and N is the batch size
      x = x.unsqueeze(0)
      embedding = self.dropout(self.embedding(x))
      ## embedding shape is (1, N, size_embedding)
      outputs, (hidden, cell) = self.rnn(embedding, (hidden, cell))
      ## shape of outputs is (1, N, size_hidden)
      our_predictions = self.fc(outputs)
      ## shape of predictions is (1, N, length of vocab)
      our_predictions = our_predictions.squeeze(0)
      return our_predictions, hidden, cell

Step 10 - Sequence 2 Sequence Model class

class Seq2Seq(nn.Module):
   def __init__(self, encoder, decoder):
      super(Seq2Seq, self).__init__()
      self.encoder = encoder
      self.decoder = decoder
   def forward(self, source, target, teacher_force_ratio = 0.5):
      batch_size = source.shape[1]
      target_length = target.shape[0]
      target_vocab_size = len(english_sentence.vocab)
      outputs = torch.zeros(target_length, batch_size, target_vocab_size).to(device)
      hidden, cell = self.encoder(source)
      x = target[0]
      for value in range(1, target_length):
         output, hidden, cell = self.decoder(x, hidden, cell)
         outputs[value] = output
         guess_best = output.argmax(1)
         x = target[value] if random.random() < teacher_force_ratio else guess_best
      return outputs
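The teacher-forcing branch in `forward` can be sketched in plain Python. The token strings and the `"???"` placeholder standing in for `output.argmax(1)` are made up for illustration:

```python
import random

random.seed(0)  ## fixed seed so the sketch is reproducible
teacher_force_ratio = 0.5
target = ["<sos>", "ein", "mann", "geht"]  ## hypothetical target tokens
model_guess = "???"                        ## stand-in for the model's own best guess

## at each step, feed the ground-truth token with probability
## teacher_force_ratio, otherwise feed the model's previous prediction
inputs = []
for step in range(1, len(target)):
    use_truth = random.random() < teacher_force_ratio
    inputs.append(target[step] if use_truth else model_guess)
print(inputs)
```

Teacher forcing stabilises early training (the decoder sees correct history about half the time) while still forcing it to recover from its own mistakes the rest of the time.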

Step 11 - Training hyperparameters

epochs = 20 ##number of epochs
rate_learning = 0.001 ## learning rate
batch_size = 64

Step 12 - Model hyperparameters

load_model = False
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
input_size_encoder = len(german_sentence.vocab)
input_size_decoder = len(english_sentence.vocab)
output_size = len(english_sentence.vocab)
encoder_embedding_size = 300
decoder_embedding_size = 300
hidden_size = 1024
num_layers = 2
encoder_dropout = 0.5

Step 13 - Tensorboard - SummaryWriter

summary_writer = SummaryWriter('runs/loss_plot')
step = 0

Step 14 - Iterators

iterator_train, iterator_validation, iterator_test = BucketIterator.splits(
                                                       (train_set, validation_set, test_set),
                                                       batch_size=batch_size,
                                                       sort_within_batch=True,
                                                       sort_key=lambda x: len(x.src),
                                                       device=device)

Step 15 - Define translate sentence

def translate_sentence(model, sentence, german, english, device, max_length=50):
   spacy_german = spacy.load("de")
   # Create tokens using spacy and everything in lower case (which is what our vocab is)
   if type(sentence) == str:
       tokens = [token.text.lower() for token in spacy_german(sentence)]
   else:
       tokens = [token.lower() for token in sentence]
   tokens.insert(0, german.init_token)
   tokens.append(german.eos_token)
   text_to_indices = [german.vocab.stoi[token] for token in tokens]
   sentence_tensor = torch.LongTensor(text_to_indices).unsqueeze(1).to(device)
   with torch.no_grad():
       hidden, cell = model.encoder(sentence_tensor)
   outputs = [english.vocab.stoi["<sos>"]]
   for _ in range(max_length):
      previous_word = torch.LongTensor([outputs[-1]]).to(device)
      with torch.no_grad():
          output, hidden, cell = model.decoder(previous_word, hidden, cell)
          best_guess = output.argmax(1).item()
          outputs.append(best_guess)
           if best_guess == english.vocab.stoi["<eos>"]:
              break
   translated_sentence = [english.vocab.itos[idx] for idx in outputs]
   return translated_sentence[1:]
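The decoding loop above is greedy search: feed back the argmax token until the end token appears or `max_length` is reached. The skeleton of that loop, with a toy successor table standing in for `model.decoder` (the table and indices are invented for illustration), looks like this:

```python
## toy next-step table standing in for model.decoder: each token index
## maps to a fixed successor, ending at the <eos> index (here 3)
successor = {0: 5, 5: 6, 6: 3}

def greedy_decode(sos_idx, eos_idx, max_length=50):
    outputs = [sos_idx]
    for _ in range(max_length):
        best_guess = successor[outputs[-1]]  ## stand-in for output.argmax(1)
        outputs.append(best_guess)
        if best_guess == eos_idx:
            break
    return outputs

print(greedy_decode(0, 3))  ## [0, 5, 6, 3]
```

Greedy decoding commits to the single best token at each step; beam search would keep several candidate prefixes instead, but this recipe uses the greedy variant.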

Step 16 - Define Bleu

def bleu(data, model, german, english, device):
    targets = []
    outputs = []
    for example in data:
      src = vars(example)["src"]
      trg = vars(example)["trg"]
      prediction = translate_sentence(model, src, german, english, device)
      prediction = prediction[:-1] # remove <eos> token
      targets.append([trg])
      outputs.append(prediction)
    return bleu_score(outputs, targets)
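`bleu_score` combines clipped n-gram precisions for n = 1..4 with a brevity penalty. Its simplest ingredient, clipped unigram precision, can be computed by hand (a sketch of the idea, not torchtext's implementation):

```python
from collections import Counter

def unigram_precision(candidate, reference):
    ## each candidate word counts only as often as it appears in the reference
    cand, ref = Counter(candidate), Counter(reference)
    overlap = sum(min(c, ref[w]) for w, c in cand.items())
    return overlap / max(len(candidate), 1)

print(unigram_precision("the cat sat".split(), "the cat sat down".split()))  ## 1.0
print(unigram_precision("the the the".split(), "the cat".split()))           ## clipped to 1/3
```

The clipping is what stops a degenerate output like "the the the" from scoring well just by repeating a common reference word.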

Step 17 - Define Checkpoint

def save_checkpoint(state, filename="my_checkpoint.pth.tar"):
    print("=> Saving checkpoint")
    torch.save(state, filename)
def load_checkpoint(checkpoint, model, optimizer):
    print("=> Loading checkpoint")
    model.load_state_dict(checkpoint["state_dict"])
    optimizer.load_state_dict(checkpoint["optimizer"])
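The round trip through these two helpers can be checked on a tiny throwaway model (the `nn.Linear` below is just a stand-in, and the file path is an assumption for the demo):

```python
import os
import tempfile
import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(4, 2)                      ## throwaway stand-in model
optimizer = optim.Adam(model.parameters())
path = os.path.join(tempfile.gettempdir(), "demo_checkpoint.pth.tar")

## save, then restore, the same {'state_dict', 'optimizer'} layout used above
torch.save({"state_dict": model.state_dict(), "optimizer": optimizer.state_dict()}, path)
checkpoint = torch.load(path)
model.load_state_dict(checkpoint["state_dict"])
optimizer.load_state_dict(checkpoint["optimizer"])
print(sorted(checkpoint.keys()))             ## ['optimizer', 'state_dict']
```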

Step 18 - Model

net_encoder = Encoder(input_size_encoder, encoder_embedding_size, hidden_size, num_layers, encoder_dropout).to(device)
net_decoder = Decoder(input_size_decoder, decoder_embedding_size, hidden_size, output_size, num_layers, encoder_dropout).to(device) ## reusing encoder_dropout for the decoder
model = Seq2Seq(net_encoder, net_decoder).to(device)
optimizer = optim.Adam(model.parameters(), lr = rate_learning)
padding_index = english_sentence.vocab.stoi['<pad>']
criterion = nn.CrossEntropyLoss(ignore_index=padding_index)
if load_model:
  load_checkpoint(torch.load('my_checkpoint.pth.tar'), model, optimizer)
sentence = "Hallo, du lernst Pytorch Sequenz zu Sequenz Modellierung"
for epoch in range(epochs):
   print(f'Epoch [{epoch} / {epochs}]')
   checkpoint = {'state_dict': model.state_dict(), 'optimizer': optimizer.state_dict()}
   save_checkpoint(checkpoint)
   model.eval()
   translated = translate_sentence(model, sentence, german_sentence, english_sentence, device, max_length=50)
   print(f'Translated example sentence: {translated}')
   model.train()
   for batch_index, batch in enumerate(iterator_train):
      input_data = batch.src.to(device)
      target = batch.trg.to(device)
      output = model(input_data, target)
      ## the output shape is going to be the (target_length, batch_size, output_dimension)
      output = output[1:].reshape(-1, output.shape[2])
      target = target[1:].reshape(-1)
      optimizer.zero_grad()
      loss = criterion(output, target)
      loss.backward()
      torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1)
      optimizer.step()
      summary_writer.add_scalar('training loss', loss, global_step=step)
      step += 1
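The reshape before the loss in the loop above matters because `nn.CrossEntropyLoss` expects `(N, C)` logits against `(N,)` class targets. A standalone shape check with toy sizes:

```python
import torch
import torch.nn as nn

target_len, batch, vocab = 6, 4, 10
output = torch.randn(target_len, batch, vocab)    ## (target_length, N, vocab)
target = torch.randint(0, vocab, (target_len, batch))

## drop the <sos> position, then flatten time and batch together
flat_out = output[1:].reshape(-1, vocab)          ## (5 * 4, 10)
flat_tgt = target[1:].reshape(-1)                 ## (5 * 4,)
loss = nn.CrossEntropyLoss()(flat_out, flat_tgt)
print(flat_out.shape, flat_tgt.shape)             ## torch.Size([20, 10]) torch.Size([20])
```

Slicing with `[1:]` skips position 0, since the model is never asked to predict the start token.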

