Explain how LSTMs work and why they are preferred in NLP analysis

This recipe explains how LSTMs work and why they are preferred in NLP analysis

Recipe Objective

Explain how LSTMs work and why they are preferred in NLP analysis.

LSTM stands for Long Short-Term Memory, an artificial recurrent neural network architecture used in deep learning. LSTMs are a special kind of RNN, capable of learning long-term dependencies. These networks work on time-series data and are well suited to classifying, processing, and making predictions on sequences. They were developed to deal with the vanishing gradient problem that can be encountered when training traditional RNNs.


An LSTM processes data sequentially, passing information forward as it propagates through the sequence. What distinguishes it from a plain RNN are the operations inside the LSTM's cells: gate operations that decide which information the LSTM keeps and which it forgets.
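A single LSTM step can be sketched in plain NumPy to make the gates concrete. This is an illustration only: the weight names, the stacked gate layout, and the toy dimensions are assumptions for the sketch, not Keras internals.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step. W, U, b hold the stacked weights for the
    forget (f), input (i), candidate (g), and output (o) gates."""
    z = W @ x_t + U @ h_prev + b        # pre-activations, shape (4*hidden,)
    n = h_prev.shape[0]
    f = sigmoid(z[0*n:1*n])             # forget gate: what to drop from c_prev
    i = sigmoid(z[1*n:2*n])             # input gate: how much new info to store
    g = np.tanh(z[2*n:3*n])             # candidate cell values
    o = sigmoid(z[3*n:4*n])             # output gate: what to expose as h_t
    c_t = f * c_prev + i * g            # updated cell state (the "memory")
    h_t = o * np.tanh(c_t)              # updated hidden state
    return h_t, c_t

rng = np.random.default_rng(0)
n_in, n_hidden = 3, 5
W = rng.normal(size=(4 * n_hidden, n_in))
U = rng.normal(size=(4 * n_hidden, n_hidden))
b = np.zeros(4 * n_hidden)
h, c = np.zeros(n_hidden), np.zeros(n_hidden)
for t in range(4):                      # run a toy 4-step sequence
    h, c = lstm_step(rng.normal(size=n_in), h, c, W, U, b)
print(h.shape)  # (5,)
```

Because the forget gate multiplies the previous cell state by a value between 0 and 1, the cell can carry information across many time steps instead of overwriting it at every step, which is what lets LSTMs learn long-term dependencies.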

Step 1 - Import the necessary libraries

import numpy
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
from keras.layers import LSTM
from keras.callbacks import ModelCheckpoint
from keras.utils import np_utils

Step 2 - Load the sample data

Sample_data = "/content/alice_in_wonderland.txt"
wonderland_text = open(Sample_data, 'r', encoding='utf-8').read()
wonderland_text = wonderland_text.lower()
print(wonderland_text)

Step 3 - Create mapping of unique characters and integers

My_characters = sorted(list(set(wonderland_text)))
character_to_integer = dict((c, i) for i, c in enumerate(My_characters))
character_to_integer

{'\n': 0,
 ' ': 1,
 '!': 2,
 '"': 3,
 "'": 4,
 '(': 5,
 ')': 6,
 '*': 7,
 ',': 8,
 '-': 9,
 '.': 10,
 '0': 11,
 '3': 12,
 ':': 13,
 ';': 14,
 '?': 15,
 '[': 16,
 ']': 17,
 '_': 18,
 '`': 19,
 'a': 20,
 'b': 21,
 'c': 22,
 'd': 23,
 'e': 24,
 'f': 25,
 'g': 26,
 'h': 27,
 'i': 28,
 'j': 29,
 'k': 30,
 'l': 31,
 'm': 32,
 'n': 33,
 'o': 34,
 'p': 35,
 'q': 36,
 'r': 37,
 's': 38,
 't': 39,
 'u': 40,
 'v': 41,
 'w': 42,
 'x': 43,
 'y': 44,
 'z': 45}

Since we cannot model character data directly, we need to convert the characters into integers, which is what the step above does. First we take the set of all unique characters present in the data, then we create a map from each character to a unique integer.

Step 4 - Summarize the data

wonder_chars = len(wonderland_text)
wonder_vocab = len(My_characters)
print("Total Characters Present in the Sample data: ", wonder_chars)
print("Total Vocab in the data: ", wonder_vocab)

Total Characters Present in the Sample data:  148574
Total Vocab in the data:  46

Step 5 - Prepare the dataset

sequence_length = 100
x_data = []
y_data = []
for i in range(0, wonder_chars - sequence_length, 1):
    sequence_in = wonderland_text[i:i + sequence_length]
    sequence_out = wonderland_text[i + sequence_length]
    x_data.append([character_to_integer[char] for char in sequence_in])
    y_data.append(character_to_integer[sequence_out])
pattern_nn = len(x_data)
print("Result of total patterns:", pattern_nn)

Result of total patterns: 148474

Here we have prepared the dataset as input/output pairs encoded as integers: each input is a 100-character window of the text, and the output is the single character that follows it.

Step 6 - Reshaping the data

X = numpy.reshape(x_data, (pattern_nn, sequence_length, 1))
X = X / float(wonder_vocab)
y = np_utils.to_categorical(y_data)
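The reshape puts the inputs into the [samples, time steps, features] layout that Keras LSTM layers expect, and dividing by the vocabulary size scales the integer codes into the 0-1 range. A toy check of that transformation (the data here is made up; only the vocabulary size of 46 comes from Step 4):

```python
import numpy

toy_x = [[20, 1, 45], [1, 45, 0]]    # 2 patterns, sequence length 3
toy_vocab = 46                       # vocabulary size from Step 4

X = numpy.reshape(toy_x, (2, 3, 1))  # [samples, time steps, features]
X = X / float(toy_vocab)             # scale integer codes to the 0-1 range

print(X.shape)         # (2, 3, 1)
print(float(X.max()))  # largest value, now below 1.0
```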

Step 7 - Define the LSTM model

model = Sequential()
model.add(LSTM(256, input_shape=(X.shape[1], X.shape[2])))
model.add(Dropout(0.2))
model.add(Dense(y.shape[1], activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')

Step 8 - Define the checkpoint

filepath = "weights-improvement-{epoch:02d}-{loss:.4f}.hdf5"
checkpoint = ModelCheckpoint(filepath, monitor='loss', verbose=1, save_best_only=True, mode='min')
callbacks_list = [checkpoint]

Step 9 - Fit the model

model.fit(X, y, epochs=1, batch_size=128, callbacks=callbacks_list)

1160/1160 [==============================] - ETA: 0s - loss: 2.7172
Epoch 00001: loss improved from 2.95768 to 2.71722, saving model to weights-improvement-01-2.7172.hdf5
1160/1160 [==============================] - 735s 634ms/step - loss: 2.7172
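Once trained, a character model like this is typically used to generate text by repeatedly predicting the next character and sliding the input window forward. The sampling loop can be sketched as follows; note that `predict_next` is a hypothetical stand-in (random probability scores) for calling `model.predict` on the real network, since the actual model needs the saved hdf5 weights, and the three-character mapping is a toy version of the 46-entry one built in Step 3.

```python
import numpy

integer_to_character = {0: 'a', 1: 'b', 2: 'c'}  # toy mapping; the recipe's has 46 entries
vocab = len(integer_to_character)
rng = numpy.random.default_rng(0)

def predict_next(pattern):
    """Hypothetical stand-in for model.predict: a random probability vector."""
    scores = rng.random(vocab)
    return scores / scores.sum()

pattern = [0, 1, 2, 0]                     # seed sequence of character indices
generated = []
for _ in range(10):
    probabilities = predict_next(pattern)
    index = int(numpy.argmax(probabilities))  # greedy pick of the next character
    generated.append(integer_to_character[index])
    pattern = pattern[1:] + [index]           # slide the window forward one step
print(''.join(generated))
```

With the real model, each `predict_next` call would normalize the pattern the same way as in Step 6 (reshape and divide by the vocabulary size) before passing it to `model.predict`.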

