How to make an RNN model using Chainer

This recipe helps you make an RNN model using Chainer.

Recipe Objective - How to make an RNN model using Chainer?

[Note] Save this code as a ".py" file and run it from the command line, because the "parse_args" function does not work inside a Jupyter notebook.

Step-by-step Implementation

1. Importing Necessary Packages:

from __future__ import division
import argparse
import sys

import numpy as np
import chainer
from chainer import backend
from chainer import backends
from chainer.backends import cuda
from chainer import Function, FunctionNode, gradient_check, report, training, utils, Variable
from chainer import datasets, initializers, iterators, optimizers, serializers
from chainer import Link, Chain, ChainList
import chainer.functions as F
import chainer.links as L
from chainer.training import extensions

2. Defining Training Settings:

parser = argparse.ArgumentParser()

parser.add_argument('--batchsize', '-b', type=int, default=20, help='Number of examples in each mini-batch')
parser.add_argument('--bproplen', '-l', type=int, default=35, help='Number of words in each mini-batch (= length of truncated BPTT)')
parser.add_argument('--epoch', '-e', type=int, default=39, help='Number of sweeps over the dataset to train')
parser.add_argument('--device', '-d', type=str, default='-1', help='Device specifier. Either ChainerX device specifier or an integer. If non-negative integer, CuPy arrays with the specified device id are used. If negative integer, NumPy arrays are used')
parser.add_argument('--gradclip', '-c', type=float, default=5, help='Gradient norm threshold to clip')
parser.add_argument('--out', '-o', default='result', help='Directory to output the result')
parser.add_argument('--resume', '-r', type=str, help='Resume the training from snapshot')
parser.add_argument('--test', action='store_true', help='Use tiny datasets for quick tests')
parser.set_defaults(test=False)
parser.add_argument('--unit', '-u', type=int, default=650, help='Number of LSTM units in each layer')
parser.add_argument('--model', '-m', default='model.npz', help='Model file name to serialize')
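
Once the flags are defined, the assembled script can be run from the command line, for example (the file name "train_ptb_rnn.py" is only a placeholder for whatever you saved the recipe as):

python train_ptb_rnn.py --device -1 --epoch 5 --test

The "--test" flag switches to tiny datasets so the whole pipeline can be checked quickly before launching a full training run.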

3. Defining Network Structure:

class RNNForLM(chainer.Chain):

    def __init__(self, n_vocab, n_units):
        super(RNNForLM, self).__init__()
        with self.init_scope():
            self.embed = L.EmbedID(n_vocab, n_units)
            self.l1 = L.LSTM(n_units, n_units)
            self.l2 = L.LSTM(n_units, n_units)
            self.l3 = L.Linear(n_units, n_vocab)

        # Initialize all the parameters uniformly in the range (-0.1, 0.1)
        for param in self.params():
            param.array[...] = np.random.uniform(-0.1, 0.1, param.shape)

    def reset_state(self):
        self.l1.reset_state()
        self.l2.reset_state()

    def forward(self, x):
        h0 = self.embed(x)
        h1 = self.l1(F.dropout(h0))
        h2 = self.l2(F.dropout(h1))
        y = self.l3(F.dropout(h2))
        return y
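
As a quick sanity check (not part of the training script), the network can be instantiated with a tiny vocabulary and fed a dummy mini-batch of word IDs; the sizes below are arbitrary:

# Minimal sketch: run 3 dummy word IDs through a tiny RNNForLM
toy_rnn = RNNForLM(n_vocab=10, n_units=16)
toy_rnn.reset_state()
x = np.array([1, 4, 7], dtype=np.int32)    # a mini-batch of 3 "current" words
with chainer.using_config('train', False): # disable dropout for the check
    y = toy_rnn(x)
print(y.shape)  # (3, 10): one score per vocabulary word for each input word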

4. Loading the Penn Tree Bank Long Word Sequence Dataset:

# Load the Penn Tree Bank long word sequence dataset
train, val, test = chainer.datasets.get_ptb_words()
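
get_ptb_words() returns three NumPy arrays of int32 word IDs (train, validation and test splits). To see what was downloaded, you can also fetch the word-to-ID mapping with get_ptb_words_vocabulary (a small inspection sketch, not needed for training):

# Sketch: inspect the loaded PTB arrays and the vocabulary
print(train.shape, val.shape, test.shape)            # 1-D arrays of word IDs
vocab = chainer.datasets.get_ptb_words_vocabulary()  # dict mapping word -> ID
print(len(vocab))                                    # vocabulary size (10000 for PTB)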

5. Defining Iterator for Making a Mini-batch from the Dataset:

class ParallelSequentialIterator(chainer.dataset.Iterator):

    def __init__(self, dataset, batch_size, repeat=True):
        super(ParallelSequentialIterator, self).__init__()
        self.dataset = dataset
        self.batch_size = batch_size # batch size
        self.repeat = repeat
        length = len(dataset)
        # Offsets maintain the position of each sequence in the mini-batch.
        self.offsets = [i * length // batch_size for i in range(batch_size)]
        self.reset()

    def reset(self):
        # Number of completed sweeps over the dataset. In this case, it is
        # incremented if every word is visited at least once after the last
        # increment.
        self.epoch = 0
        # True if the epoch is incremented at the last iteration.
        self.is_new_epoch = False
        # NOTE: this is not a count of parameter updates. It is just a count of
        # calls of ``__next__``.
        self.iteration = 0
        # use -1 instead of None internally
        self._previous_epoch_detail = -1.

    def __next__(self):
        # This iterator returns a list representing a mini-batch. Each item
        # indicates a different position in the original sequence. Each item is
        # represented by a pair of two word IDs. The first word is at the
        # "current" position, while the second word is at the next position.
        # At each iteration, the iteration count is incremented, which pushes
        # forward the "current" position.
        length = len(self.dataset)
        if not self.repeat and self.iteration * self.batch_size >= length:
            # If not self.repeat, this iterator stops at the end of the first
            # epoch (i.e., when all words are visited once).
            raise StopIteration
        cur_words = self.get_words()
        self._previous_epoch_detail = self.epoch_detail
        self.iteration += 1
        next_words = self.get_words()

        epoch = self.iteration * self.batch_size // length
        self.is_new_epoch = self.epoch < epoch
        if self.is_new_epoch:
            self.epoch = epoch

        return list(zip(cur_words, next_words))

    @property
    def epoch_detail(self):
        # Floating point version of epoch.
        return self.iteration * self.batch_size / len(self.dataset)

    @property
    def previous_epoch_detail(self):
        if self._previous_epoch_detail < 0:
            return None
        return self._previous_epoch_detail

    def get_words(self):
        # It returns a list of current words.
        return [self.dataset[(offset + self.iteration) % len(self.dataset)] for offset in self.offsets]

    def serialize(self, serializer):
        # It is important to serialize the state to be recovered on resume.
        self.iteration = serializer('iteration', self.iteration)
        self.epoch = serializer('epoch', self.epoch)
        try:
            self._previous_epoch_detail = serializer('previous_epoch_detail', self._previous_epoch_detail)
        except KeyError:
            # guess previous_epoch_detail for older version
            self._previous_epoch_detail = self.epoch + \
                (self.current_position - self.batch_size) / len(self.dataset)
            if self.epoch_detail > 0:
                self._previous_epoch_detail = max(self._previous_epoch_detail, 0.)
            else:
                self._previous_epoch_detail = -1.
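
To see what the iterator produces, you can run it on a toy sequence (the numbers are arbitrary and only for illustration):

# Sketch: iterate over a toy "corpus" of 12 word IDs with batch size 3
toy_data = np.arange(12, dtype=np.int32)
toy_iter = ParallelSequentialIterator(toy_data, batch_size=3)
print(next(toy_iter))  # e.g. [(0, 1), (4, 5), (8, 9)] -> (current word, next word) pairs
print(next(toy_iter))  # e.g. [(1, 2), (5, 6), (9, 10)] -> each sequence advances by one word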

6. Defining Updater:

class BPTTUpdater(training.updaters.StandardUpdater):

    def __init__(self, train_iter, optimizer, bprop_len, device):
        super(BPTTUpdater, self).__init__(train_iter, optimizer, device=device)
        self.bprop_len = bprop_len

    # The core part of the update routine can be customized by overriding.
    def update_core(self):
        loss = 0
        # When we pass one iterator and optimizer to StandardUpdater.__init__,
        # they are automatically named 'main'.
        train_iter = self.get_iterator('main')
        optimizer = self.get_optimizer('main')

        # Progress the dataset iterator for bprop_len words at each iteration.
        for i in range(self.bprop_len):
            # Get the next batch (a list of tuples of two word IDs)
            batch = train_iter.__next__()

            # Concatenate the word IDs to matrices and send them to the device.
            # self.converter does this job
            # (it is chainer.dataset.concat_examples by default).
            x, t = self.converter(batch, self.device)

            # Compute the loss at this time step and accumulate it
            loss += optimizer.target(x, t)

        optimizer.target.cleargrads() # Clear the parameter gradients
        loss.backward() # Backprop
        loss.unchain_backward() # Truncate the graph
        optimizer.update() # Update the parameters
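
The converter step is worth seeing in isolation: chainer.dataset.concat_examples (the default self.converter) stacks a list of (current, next) pairs into one input array and one target array, which is what the classifier receives (a small illustration, not part of the script):

# Sketch: what self.converter does with a mini-batch of word-ID pairs
batch = [(0, 1), (4, 5), (8, 9)]
x, t = chainer.dataset.concat_examples(batch)
print(x)  # [0 4 8] -> input word IDs
print(t)  # [1 5 9] -> target (next) word IDs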

7. Defining Evaluation Function (Perplexity):

def compute_perplexity(result):
    result['perplexity'] = np.exp(result['main/loss'])
    if 'validation/main/loss' in result:
        result['val_perplexity'] = np.exp(result['validation/main/loss'])
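
Perplexity is just the exponential of the average cross-entropy loss, so an average loss of about 4.6 nats per word corresponds to a perplexity of roughly 100 (a minimal check of the helper above):

# Sketch: perplexity = exp(average cross-entropy loss)
result = {'main/loss': 4.6}
compute_perplexity(result)
print(result['perplexity'])  # about 99.5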

8. Parsing the Arguments and Creating the Iterators:

args = parser.parse_args()
device = chainer.get_device(args.device)

train_iter = ParallelSequentialIterator(train, args.batchsize)
val_iter = ParallelSequentialIterator(val, 1, repeat=False)
test_iter = ParallelSequentialIterator(test, 1, repeat=False)

9. Create RNN and Classification Model:

n_vocab = max(train) + 1 # PTB word IDs are contiguous integers, so this gives the vocabulary size
rnn = RNNForLM(n_vocab, args.unit)
model = L.Classifier(rnn)
model.compute_accuracy = False # we only want the perplexity
model.to_device(device) # send the model to the selected device (no-op for CPU)

10. Setting up Optimizer:

optimizer = chainer.optimizers.SGD(lr=1.0)
optimizer.setup(model)
optimizer.add_hook(chainer.optimizer_hooks.GradientClipping(args.gradclip))

11. Setup and Run Trainer:

updater = BPTTUpdater(train_iter, optimizer, args.bproplen, device)
trainer = training.Trainer(updater, (args.epoch, 'epoch'), out=args.out)

eval_model = model.copy() # Model with shared params and distinct states
eval_rnn = eval_model.predictor
trainer.extend(extensions.Evaluator(
    val_iter, eval_model, device=device,
    # Reset the RNN state at the beginning of each evaluation
    eval_hook=lambda _: eval_rnn.reset_state()))

interval = 10 if args.test else 500
trainer.extend(extensions.LogReport(postprocess=compute_perplexity, trigger=(interval, 'iteration')))
trainer.extend(extensions.PrintReport(['epoch', 'iteration', 'perplexity', 'val_perplexity']), trigger=(interval, 'iteration'))
trainer.extend(extensions.ProgressBar(update_interval=1 if args.test else 10))
trainer.extend(extensions.snapshot())
trainer.extend(extensions.snapshot_object(model, 'model_iter_{.updater.iteration}'))
if args.resume is not None:
    chainer.serializers.load_npz(args.resume, trainer)
trainer.run()

12. Evaluating the Trained Model on the Test Dataset:

print('test')
eval_rnn.reset_state()
evaluator = extensions.Evaluator(test_iter, eval_model, device=device)
result = evaluator()
print('test perplexity: {}'.format(np.exp(float(result['main/loss']))))
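
The "--model" flag defined in step 2 is meant for saving the trained network. A final call mirroring the standard Chainer PTB example serializes the weights so they can be reloaded later with chainer.serializers.load_npz:

# Serialize the final model to the file given by --model (default: model.npz)
chainer.serializers.save_npz(args.model, model)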
