Explain how LSTMs work and why they are preferred in NLP analysis?

This recipe explains how LSTMs work and why they are preferred in NLP analysis
Last Updated: 18 Aug 2022


Recipe Objective

Explain how LSTMs work and why they are preferred in NLP analysis.

LSTM stands for long short-term memory. It is a recurrent neural network (RNN) architecture used in deep learning, and a special kind of RNN capable of learning long-term dependencies. LSTMs are well suited to classifying, processing, and making predictions on sequential data such as time series and text. They were developed to deal with the vanishing gradient problem that can be encountered when training traditional RNNs. This ability to retain context across many time steps is why they are preferred in NLP, where the meaning of a word often depends on words that appeared much earlier.
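The vanishing gradient problem mentioned above can be illustrated with a toy example. The sketch below assumes a one-unit plain RNN of the form h_t = tanh(w * h_(t-1)); the function name `gradient_through_time`, the weight 0.9, and the starting state 0.5 are all hypothetical choices made for illustration, not part of the recipe.

```python
import math

def gradient_through_time(recurrent_weight, steps, h0=0.5):
    """Return |d h_T / d h_0| for the scalar RNN h_t = tanh(w * h_{t-1})."""
    h = h0
    grad = 1.0
    for _ in range(steps):
        h = math.tanh(recurrent_weight * h)
        # Chain rule: d tanh(w * h_prev) / d h_prev = w * (1 - tanh(w * h_prev)^2)
        grad *= recurrent_weight * (1.0 - h ** 2)
    return abs(grad)

short_range = gradient_through_time(0.9, steps=5)
long_range = gradient_through_time(0.9, steps=100)
print("gradient after 5 steps:  ", short_range)
print("gradient after 100 steps:", long_range)
```

Because every factor in the product is smaller than 1, the gradient reaching the early time steps shrinks roughly exponentially with sequence length, which is the effect the LSTM cell state is designed to counteract.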

An LSTM passes information forward as it processes a sequence step by step. What sets it apart from a plain RNN are the operations inside the LSTM cell: these operations (the gates) let the LSTM decide which information to keep and which to forget.
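The keep-or-forget operations inside a cell can be sketched in NumPy. This is a minimal illustration of the standard LSTM equations (forget, input, and output gates plus a candidate state), not the Keras implementation; the names `lstm_step`, `W`, `U`, `b` and the gate ordering are assumptions made for this example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step.

    W: (4 * units, input_dim), U: (4 * units, units), b: (4 * units,)
    Stacked gate order (an arbitrary choice here): forget, input, candidate, output.
    """
    units = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b
    f = sigmoid(z[0 * units:1 * units])   # forget gate: how much of the old cell state to keep
    i = sigmoid(z[1 * units:2 * units])   # input gate: how much new information to write
    g = np.tanh(z[2 * units:3 * units])   # candidate values for the cell state
    o = sigmoid(z[3 * units:4 * units])   # output gate: how much of the cell state to expose
    c_t = f * c_prev + i * g              # updated cell state (long-term memory)
    h_t = o * np.tanh(c_t)                # updated hidden state (output of this step)
    return h_t, c_t

# Run three time steps on a one-feature input with random weights.
rng = np.random.default_rng(0)
input_dim, units = 1, 4
W = rng.normal(size=(4 * units, input_dim))
U = rng.normal(size=(4 * units, units))
b = np.zeros(4 * units)
h = np.zeros(units)
c = np.zeros(units)
for value in [0.1, 0.5, 0.9]:
    h, c = lstm_step(np.array([value]), h, c, W, U, b)
print(h)
```

When the forget gate is close to 1 and the input gate is close to 0, the cell state is carried forward almost unchanged, which is how information can survive across many time steps.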

Step 1 - Import the necessary libraries

import numpy
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
from keras.layers import LSTM
from keras.callbacks import ModelCheckpoint
from keras.utils import np_utils

Step 2 - Load the sample data

Sample_data = "/content/alice_in_wonderland.txt"
wonderland_text = open(Sample_data, 'r', encoding='utf-8').read()
wonderland_text = wonderland_text.lower()
print(wonderland_text)

Step 3 - Create mapping of unique characters and integers

My_characters = sorted(list(set(wonderland_text)))
character_to_integer = dict((c, i) for i, c in enumerate(My_characters))
character_to_integer

{'\n': 0,
 ' ': 1,
 '!': 2,
 '"': 3,
 "'": 4,
 '(': 5,
 ')': 6,
 '*': 7,
 ',': 8,
 '-': 9,
 '.': 10,
 '0': 11,
 '3': 12,
 ':': 13,
 ';': 14,
 '?': 15,
 '[': 16,
 ']': 17,
 '_': 18,
 '`': 19,
 'a': 20,
 'b': 21,
 'c': 22,
 'd': 23,
 'e': 24,
 'f': 25,
 'g': 26,
 'h': 27,
 'i': 28,
 'j': 29,
 'k': 30,
 'l': 31,
 'm': 32,
 'n': 33,
 'o': 34,
 'p': 35,
 'q': 36,
 'r': 37,
 's': 38,
 't': 39,
 'u': 40,
 'v': 41,
 'w': 42,
 'x': 43,
 'y': 44,
 'z': 45}

We cannot model character data directly, so we first convert the characters to integers, which is what the step above does. We take the set of all unique characters present in the data and then map each character to a unique integer.
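The same mapping can be tried on a short string to see exactly what it produces. The sketch below uses a made-up sample text and also builds a reverse lookup (`int_to_char`, a hypothetical name not used in the recipe) to show that the encoding is lossless.

```python
text = "hello world"

# Unique characters, sorted so that the mapping is reproducible.
characters = sorted(set(text))
char_to_int = {c: i for i, c in enumerate(characters)}
int_to_char = {i: c for c, i in char_to_int.items()}

encoded = [char_to_int[c] for c in text]
decoded = "".join(int_to_char[i] for i in encoded)

print(char_to_int)
print(encoded)   # [3, 2, 4, 4, 5, 0, 7, 5, 6, 4, 1]
print(decoded)   # hello world
```

A reverse lookup like this is what would be needed later to turn predicted integers back into readable characters.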

Step 4 - Summarize the data

wonder_chars = len(wonderland_text)
wonder_vocab = len(My_characters)
print("Total Characters Present in the Sample data: ", wonder_chars)
print("Total Vocab in the data: ", wonder_vocab)

Total Characters Present in the Sample data:  148574
Total Vocab in the data:  46

Step 5 - Prepare the dataset

sequence_length = 100
x_data = []
y_data = []
for i in range(0, wonder_chars - sequence_length, 1):
    sequence_in = wonderland_text[i:i + sequence_length]
    sequence_out = wonderland_text[i + sequence_length]
    x_data.append([character_to_integer[char] for char in sequence_in])
    y_data.append(character_to_integer[sequence_out])
pattern_nn = len(x_data)
print("Result of total patterns:", pattern_nn)

Result of total patterns: 148474

Here we have prepared the input and output pairs, encoded as integers. Each input is a sequence of 100 characters, and the output is the single character that follows it.
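The sliding-window logic is easier to see on a tiny input. The sketch below uses a hypothetical helper, `make_windows`, with a window of 3 characters instead of 100, and keeps the characters unencoded for readability.

```python
def make_windows(text, sequence_length):
    """Slide a fixed-size window over text; the target is the next character."""
    inputs, targets = [], []
    for i in range(len(text) - sequence_length):
        inputs.append(text[i:i + sequence_length])
        targets.append(text[i + sequence_length])
    return inputs, targets

inputs, targets = make_windows("alice", 3)
for window, next_char in zip(inputs, targets):
    print(repr(window), "->", repr(next_char))
# 'ali' -> 'c'
# 'lic' -> 'e'
```

The number of windows is always the text length minus the window length, which matches the recipe's result: 148574 - 100 = 148474 patterns.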

Step 6 - Reshaping the data

X = numpy.reshape(x_data, (pattern_nn, sequence_length, 1))
X = X / float(wonder_vocab)
y = np_utils.to_categorical(y_data)
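What this step does can be shown on toy data: the inputs are reshaped to [samples, time steps, features], scaled into the range 0 to 1 by dividing by the vocabulary size, and the targets are one-hot encoded. The sketch below uses made-up values and `numpy.eye` as a stand-in for `to_categorical`; note that this is an assumption for illustration, and that `to_categorical` infers the number of classes from the largest label when it is not given.

```python
import numpy as np

# Two toy input sequences of length 3 and their target characters,
# already encoded as integers from a 4-character vocabulary.
x_data = [[0, 1, 2], [1, 2, 3]]
y_data = [3, 0]
vocab_size = 4
sequence_length = 3

# LSTM layers expect input of shape (samples, time steps, features).
X = np.reshape(x_data, (len(x_data), sequence_length, 1))
X = X / float(vocab_size)          # scale the integer codes into [0, 1)

# One-hot encode the targets: row k of the identity matrix encodes class k.
y = np.eye(vocab_size)[y_data]

print(X.shape)   # (2, 3, 1)
print(y)
```

One-hot targets pair with the softmax output layer, so the network predicts a probability for each possible next character.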

Step 7 - Define the LSTM model

model = Sequential()
model.add(LSTM(256, input_shape=(X.shape[1], X.shape[2])))
model.add(Dropout(0.2))
model.add(Dense(y.shape[1], activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')
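To get a feel for the size of this model, the number of trainable parameters can be worked out by hand. The sketch below assumes the standard LSTM layout (four weight sets, each with input weights, recurrent weights, and a bias per unit) and a 46-character vocabulary as reported in Step 4; the helper names are hypothetical.

```python
def lstm_param_count(units, input_dim):
    # Four weight sets (forget, input, candidate, output), each with
    # input weights, recurrent weights, and one bias per unit.
    return 4 * (units * (input_dim + units) + units)

def dense_param_count(inputs, outputs):
    # One weight per input-output pair plus one bias per output.
    return inputs * outputs + outputs

lstm_params = lstm_param_count(units=256, input_dim=1)
dense_params = dense_param_count(inputs=256, outputs=46)
total_params = lstm_params + dense_params   # Dropout adds no parameters

print(lstm_params, dense_params, total_params)   # 264192 11822 276014
```

If the model is built in Keras, `model.summary()` should report comparable figures and is a quick way to check the arithmetic.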

Step 8 - Define the checkpoint

filepath = "weights-improvement-{epoch:02d}-{loss:.4f}.hdf5"
checkpoint = ModelCheckpoint(filepath, monitor='loss', verbose=1, save_best_only=True, mode='min')
callbacks_list = [checkpoint]
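The placeholders in the file path are ordinary Python format fields that the checkpoint callback fills in with the epoch number and the monitored loss. The sketch below reproduces that formatting with plain `str.format`, using the epoch and loss values that appear in the training log of this recipe.

```python
filepath = "weights-improvement-{epoch:02d}-{loss:.4f}.hdf5"

# {epoch:02d} pads the epoch number to two digits,
# {loss:.4f} rounds the loss to four decimal places.
saved_name = filepath.format(epoch=1, loss=2.71722)
print(saved_name)   # weights-improvement-01-2.7172.hdf5
```

Because `save_best_only=True` and `mode='min'` are set, a new file is written only when the monitored loss is lower than the best value seen so far.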

Step 9 - Fit the model

model.fit(X, y, epochs=1, batch_size=128, callbacks=callbacks_list)

1160/1160 [==============================] - ETA: 0s - loss: 2.7172
Epoch 00001: loss improved from 2.95768 to 2.71722, saving model to weights-improvement-01-2.7172.hdf5
1160/1160 [==============================] - 735s 634ms/step - loss: 2.7172
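The numbers in the training log can be sanity-checked with a little arithmetic. The sketch below uses the pattern count, batch size, and vocabulary size from the steps above; the comparison against a uniform guess is an added reference point, not something the recipe computes.

```python
import math

total_patterns = 148474
batch_size = 128
vocab_size = 46

# The last batch is smaller than 128, so the step count is rounded up.
steps_per_epoch = math.ceil(total_patterns / batch_size)

# Cross-entropy of a model that assigns equal probability to every character.
uniform_guess_loss = math.log(vocab_size)

print(steps_per_epoch)                 # 1160
print(round(uniform_guess_loss, 4))    # 3.8286
```

The 1160 steps match the progress bar in the log, and the reported loss of 2.7172 after one epoch is already below the uniform-guess value, so the model has started to pick up character statistics.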
