Explain how LSTMs work and why they are preferred in NLP analysis?

This recipe explains how LSTMs work and why they are preferred in NLP analysis.


Recipe Objective

Explain how LSTMs work and why they are preferred in NLP analysis.

LSTM stands for Long Short-Term Memory. It is an artificial recurrent neural network (RNN) architecture used in deep learning. LSTMs are a special kind of RNN capable of learning long-term dependencies, which makes them well suited for classifying, processing, and making predictions on sequence and time-series data. They were developed to deal with the vanishing gradient problem that can be encountered when training traditional RNNs.

An LSTM processes data sequentially, passing information forward as it propagates through time. What distinguishes it from a plain RNN are the operations inside its cells: gate operations that let the network decide which information to keep and which to forget.
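The gate operations described above can be sketched in plain NumPy. The sizes, weight matrices, and function names below are illustrative only; they are not the parameters of the Keras model built later in this recipe.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step. W, U, b hold the stacked parameters
    for the forget, input, cell-candidate, and output gates."""
    n = h_prev.shape[0]
    z = W @ x + U @ h_prev + b      # pre-activations for all four gates, shape (4n,)
    f = sigmoid(z[0:n])             # forget gate: how much of c_prev to keep
    i = sigmoid(z[n:2*n])           # input gate: how much new information to write
    g = np.tanh(z[2*n:3*n])         # candidate cell values
    o = sigmoid(z[3*n:4*n])         # output gate: how much of the cell to expose
    c = f * c_prev + i * g          # updated cell state
    h = o * np.tanh(c)              # new hidden state
    return h, c

# Toy sizes: input dimension 3, hidden dimension 2
rng = np.random.default_rng(0)
x = rng.normal(size=3)
h, c = np.zeros(2), np.zeros(2)
W = rng.normal(size=(8, 3))
U = rng.normal(size=(8, 2))
b = np.zeros(8)
h, c = lstm_step(x, h, c, W, U, b)
print(h.shape, c.shape)
```

The key point is the cell state c: the forget gate multiplies it elementwise, so gradients can flow through many time steps without vanishing the way they do in a plain RNN.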

Step 1 - Import the necessary libraries

import numpy
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
from keras.layers import LSTM
from keras.callbacks import ModelCheckpoint
from keras.utils import np_utils

Step 2 - Load the sample data

Sample_data = "/content/alice_in_wonderland.txt"
wonderland_text = open(Sample_data, 'r', encoding='utf-8').read()
wonderland_text = wonderland_text.lower()
print(wonderland_text)

Step 3 - Create mapping of unique characters and integers

My_characters = sorted(list(set(wonderland_text)))
character_to_integer = dict((c, i) for i, c in enumerate(My_characters))
character_to_integer
{'\n': 0,
 ' ': 1,
 '!': 2,
 '"': 3,
 "'": 4,
 '(': 5,
 ')': 6,
 '*': 7,
 ',': 8,
 '-': 9,
 '.': 10,
 '0': 11,
 '3': 12,
 ':': 13,
 ';': 14,
 '?': 15,
 '[': 16,
 ']': 17,
 '_': 18,
 '`': 19,
 'a': 20,
 'b': 21,
 'c': 22,
 'd': 23,
 'e': 24,
 'f': 25,
 'g': 26,
 'h': 27,
 'i': 28,
 'j': 29,
 'k': 30,
 'l': 31,
 'm': 32,
 'n': 33,
 'o': 34,
 'p': 35,
 'q': 36,
 'r': 37,
 's': 38,
 't': 39,
 'u': 40,
 'v': 41,
 'w': 42,
 'x': 43,
 'y': 44,
 'z': 45}

Since we cannot model character data directly, we need to convert the characters into integers; that is what this step does. First we take the set of all unique characters present in the data, then we create a map from each character to a unique integer.
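The same construction also gives an inverse map from integers back to characters, which is needed if you later want to turn model predictions back into text. A minimal sketch on a short sample string (the string is illustrative, not the recipe's data):

```python
sample = "hello world"

# Forward and inverse maps, built exactly as in the recipe
chars = sorted(list(set(sample)))
char_to_int = dict((c, i) for i, c in enumerate(chars))
int_to_char = dict((i, c) for i, c in enumerate(chars))

# Round-trip: encode to integers, then decode back to text
encoded = [char_to_int[c] for c in sample]
decoded = "".join(int_to_char[i] for i in encoded)
print(encoded)   # [3, 2, 4, 4, 5, 0, 7, 5, 6, 4, 1]
print(decoded)   # hello world
```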

Step 4 - Summarize the data

wonder_chars = len(wonderland_text)
wonder_vocab = len(My_characters)
print("Total Characters Present in the Sample data: ", wonder_chars)
print("Total Vocab in the data: ", wonder_vocab)
Total Characters Present in the Sample data:  148574
Total Vocab in the data:  46

Step 5 - Prepare the dataset

sequence_length = 100
x_data = []
y_data = []
for i in range(0, wonder_chars - sequence_length, 1):
    sequence_in = wonderland_text[i:i + sequence_length]
    sequence_out = wonderland_text[i + sequence_length]
    x_data.append([character_to_integer[char] for char in sequence_in])
    y_data.append(character_to_integer[sequence_out])
pattern_nn = len(x_data)
print("Result of total patterns:", pattern_nn)
Result of total patterns: 148474

Here we have prepared the input and output pairs, encoded as integers: each input is a sliding window of 100 characters, and the corresponding output is the single character that follows it.
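On a tiny string, the sliding-window step above looks like this (a short sequence length and sample text are used here only so the pairs are easy to read):

```python
text = "alice was beginning"
chars = sorted(set(text))
char_to_int = {c: i for i, c in enumerate(chars)}

sequence_length = 5
x_data, y_data = [], []
for i in range(0, len(text) - sequence_length):
    seq_in = text[i:i + sequence_length]         # window of 5 characters
    seq_out = text[i + sequence_length]          # the character that follows
    x_data.append([char_to_int[c] for c in seq_in])
    y_data.append(char_to_int[seq_out])

print(len(x_data))                # 14 patterns = len(text) - sequence_length
print(text[0:5], "->", text[5])   # first pair: "alice" -> " "
```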

Step 6 - Reshaping the data

# Reshape X to [samples, time steps, features] as expected by the LSTM layer
X = numpy.reshape(x_data, (pattern_nn, sequence_length, 1))
# Normalize the integer values to the range 0-1
X = X / float(wonder_vocab)
# One-hot encode the output variable
y = np_utils.to_categorical(y_data)

Step 7 - Define the LSTM model

model = Sequential()
model.add(LSTM(256, input_shape=(X.shape[1], X.shape[2])))
model.add(Dropout(0.2))
model.add(Dense(y.shape[1], activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')

Step 8 - Define the checkpoint

filepath = "weights-improvement-{epoch:02d}-{loss:.4f}.hdf5"
checkpoint = ModelCheckpoint(filepath, monitor='loss', verbose=1, save_best_only=True, mode='min')
callbacks_list = [checkpoint]

Step 9 - Fit the model

model.fit(X, y, epochs=1, batch_size=128, callbacks=callbacks_list)
1160/1160 [==============================] - ETA: 0s - loss: 2.7172
Epoch 00001: loss improved from 2.95768 to 2.71722, saving model to weights-improvement-01-2.7172.hdf5
1160/1160 [==============================] - 735s 634ms/step - loss: 2.7172
