How to use GloVe embeddings in NLP

This recipe helps you use GloVe embeddings in NLP
Last Updated: 23 Jun 2022

Recipe Objective

How do we use GloVe embeddings? We have already discussed what word embeddings are; GloVe is another method of creating them. Let's understand it in more detail.

GloVe (Global Vectors for Word Representation) learns word embeddings from global co-occurrence statistics. We iterate through the corpus and record how often each word co-occurs with every other word inside a context window, which gives a co-occurrence matrix. Each co-occurrence is weighted by distance: words that are next to each other add 1, words with one word between them add 1/2, words with two words between them add 1/3, and so on. GloVe then learns word vectors whose dot products approximate the logarithm of these co-occurrence values.
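The weighting described above, and the objective GloVe then minimises, can be written as follows, using the notation of the original GloVe paper (Pennington, Socher and Manning, 2014). X_ij is the weighted co-occurrence value of words i and j, d is their distance in tokens, w_i and w̃_j are the word and context vectors, b_i and b̃_j are biases, and f down-weights rare pairs while capping frequent ones. The paper uses x_max = 100 and α = 3/4; treat these as the paper's choices rather than values confirmed for the library used in this recipe.

```latex
\begin{aligned}
X_{ij} &= \sum_{\text{each co-occurrence of } i \text{ and } j} \frac{1}{d},
  \qquad 1 \le d \le \text{window} \\
J &= \sum_{i,j \,:\, X_{ij} > 0} f(X_{ij})
  \left( w_i^{\top} \tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij} \right)^2 \\
f(x) &= \begin{cases}
  (x / x_{\max})^{\alpha} & \text{if } x < x_{\max} \\
  1 & \text{otherwise}
\end{cases}
\end{aligned}
```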

Let's make this clearer with an example.

Example:

It is a lovely morning !

Good Morning!!

Is it a lovely morning ?
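The example sentences above can be turned into distance-weighted co-occurrence values with a few lines of plain Python. This is a minimal sketch under stated assumptions: the helpers tokenize and cooccurrence are hypothetical names, text is lowercased with punctuation dropped, each pair is stored once under an alphabetically sorted key, and a window of 2 is used for this tiny example (the recipe itself uses 10). It illustrates the 1, 1/2, 1/3 weighting, not glove-python's internal implementation.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase and extract words (keeping internal apostrophes), dropping '!' and '?'."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def cooccurrence(sentences, window=2):
    """Distance-weighted co-occurrence: a pair d tokens apart adds 1/d."""
    counts = defaultdict(float)
    for sentence in sentences:
        tokens = tokenize(sentence)
        for i, word in enumerate(tokens):
            # look only to the right so each pair occurrence is counted once
            for j in range(i + 1, min(i + window + 1, len(tokens))):
                pair = tuple(sorted((word, tokens[j])))  # order-independent key
                counts[pair] += 1.0 / (j - i)
    return dict(counts)


example = ["It is a lovely morning !", "Good Morning!!", "Is it a lovely morning ?"]
X = cooccurrence(example, window=2)
for pair, value in sorted(X.items()):
    print(pair, value)
```

Both "It is" and "Is it" add 1 to the (is, it) entry, giving 2.0, and (a, lovely) also reaches 2.0 because the two words are adjacent in both sentences. "is" and "lovely" are two positions apart only in the first sentence (in the third they are three apart, outside the window), so that entry is 0.5.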

Step 1 - Import the necessary libraries

import itertools
from gensim.models.word2vec import Text8Corpus
from glove import Corpus, Glove  # Corpus and Glove come from the glove-python package

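Step 2 - Read the sample text file into a variable called sentences

sentences = list(itertools.islice(Text8Corpus('/content/alice_in_wonderland.txt'),None))

Text8Corpus splits the file on whitespace and yields chunks of up to 10,000 tokens, so each item in sentences is a list of tokens from one chunk rather than a grammatical sentence. Tokens keep their original capitalization and punctuation.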
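Because Text8Corpus keeps case and punctuation, the vocabulary ends up with separate tokens such as 'She' and 'she', or 'sobs.' with its full stop, as the results in Step 8 show. One possible alternative is sketched below. Assumptions: load_sentences is a hypothetical helper, it produces one lowercase token list per non-empty line, and it uses a simple regular expression that keeps internal apostrophes (i'm, that's) and drops all other punctuation.

```python
import re


def load_sentences(path):
    """Hypothetical alternative to Text8Corpus: one lowercase token list per
    non-empty line, with punctuation stripped so 'Alice,' and 'alice' match."""
    sentences = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            # words, optionally with one internal apostrophe part (i'm, that's)
            tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", line.lower())
            if tokens:  # skip blank lines
                sentences.append(tokens)
    return sentences
```

It could replace Step 2 as sentences = load_sentences('/content/alice_in_wonderland.txt'). Note the design trade-off: co-occurrence windows then stop at line breaks, whereas Text8Corpus chunks run across them.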

Step 3 - Create an empty Corpus object

corpus = Corpus()

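Step 4 - Fit the corpus on the sentences with a window size of 10

corpus.fit(sentences, window=10)

This counts co-occurrences between words up to 10 positions apart and stores the result in corpus.dictionary (a mapping from each word to an integer id) and corpus.matrix (the co-occurrence matrix).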

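Step 5 - Store the GloVe model in a variable

glove = Glove(no_components=100, learning_rate=0.05)

no_components=100 sets the dimension of each word vector, and learning_rate=0.05 sets the step size used during training.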

Step 6 - Perform the training

glove.fit(corpus.matrix, epochs=30, no_threads=4, verbose=True)

Epoch 0
Epoch 1
...
Epoch 29

Here we fit the GloVe model to the co-occurrence matrix, running 30 training epochs on 4 threads; verbose=True prints one line per epoch.
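To make the training step less of a black box, here is a minimal NumPy sketch of the weighted least-squares objective that GloVe minimises. Assumptions, stated plainly: weight and train_glove are hypothetical names; the input is a dict of non-zero co-occurrence values keyed by word-id pairs; it uses plain SGD with a fixed learning rate, whereas the GloVe paper trains with AdaGrad; x_max = 100 and alpha = 0.75 follow the paper; and it returns the sum of main and context vectors, as the paper suggests. It is single-threaded and not how glove-python implements fit.

```python
import random

import numpy as np


def weight(x, x_max=100.0, alpha=0.75):
    """GloVe weighting f(x): (x / x_max) ** alpha, capped at 1 for x >= x_max."""
    return min((x / x_max) ** alpha, 1.0)


def train_glove(cooc, vocab_size, dim=10, epochs=50, lr=0.05, seed=0):
    """Plain-SGD sketch of the GloVe objective (not glove-python's implementation).

    cooc maps (i, j) word-id pairs to co-occurrence values X_ij > 0; include both
    (i, j) and (j, i) for a symmetric matrix. Returns (word vectors, loss per epoch).
    """
    rng = np.random.default_rng(seed)
    W = (rng.random((vocab_size, dim)) - 0.5) / dim  # main word vectors w_i
    C = (rng.random((vocab_size, dim)) - 0.5) / dim  # context vectors w~_j
    b = np.zeros(vocab_size)  # main biases b_i
    c = np.zeros(vocab_size)  # context biases b~_j
    pairs = list(cooc.items())
    shuffler = random.Random(seed)
    losses = []
    for _ in range(epochs):
        shuffler.shuffle(pairs)  # visit the non-zero entries in a new order each epoch
        total = 0.0
        for (i, j), x in pairs:
            fx = weight(x)
            diff = W[i] @ C[j] + b[i] + c[j] - np.log(x)  # prediction minus log X_ij
            total += fx * diff ** 2
            g = fx * diff  # gradient of 0.5 * f(x) * diff**2 with respect to diff
            grad_w, grad_c = g * C[j], g * W[i]  # compute both before updating either
            W[i] -= lr * grad_w
            C[j] -= lr * grad_c
            b[i] -= lr * g
            c[j] -= lr * g
        losses.append(total)
    return W + C, losses
```

The per-epoch loss should fall as training proceeds; a library implementation does the same work in compiled code across several threads, which is why Step 6 exposes no_threads.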

Step 7 - Attach the corpus dictionary to the GloVe model so that words can be looked up by name

glove.add_dictionary(corpus.dictionary)

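Step 8 - Test with some words

most_similar ranks words by the cosine similarity of their vectors to the query word's vector. The query word itself is dropped from the results, so the default call returns four neighbours and number=10 returns nine.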

glove.most_similar('man')

[('sobs.', 0.9555372382921672),
 ('reasons.', 0.9555298747918248),
 ('signify:', 0.9551492193112306),
 ('`chop', 0.954856860860499)]

glove.most_similar('this', number=10)

[('time', 0.9964498350533215),
 ('once', 0.9964002559452605),
 ('more', 0.9963721925296446),
 ('any', 0.9955253094062864),
 ('about', 0.9950879007354146),
 ('which', 0.9948399539941413),
 ('turned', 0.9942261952259767),
 ('is', 0.9941542169966086),
 ('them', 0.994141679802586)]

glove.most_similar('Adventures', number=10)

[('hedgehogs,', 0.9398138500036824),
 ("THAT'S", 0.93867888598354),
 ('soup,', 0.9355306192717532),
 ('familiarly', 0.9338930212646674),
 ('showing', 0.9334707250469283),
 ("Turtle's", 0.9328493584474263),
 ('blades', 0.9318802670676556),
 ('heads.', 0.9318625356540701),
 ("refreshments!'", 0.9315206030115342)]

glove.most_similar('girl', number=10)

[('dispute', 0.8922924095994201),
 ('proper', 0.889211005639865),
 ('hurry.', 0.8875119118249284),
 ('remark,', 0.8874202609802221),
 ('bringing', 0.88048150664503),
 ('dog', 0.8769020310475344),
 ('tree.', 0.8758689282289073),
 ('fast', 0.8754031020409732),
 ('rules', 0.8743036670054224)]

glove.most_similar('Alice', number=10)

[('thought', 0.99722981845681),
 ('he', 0.985967266394433),
 ('her,', 0.9848540529024399),
 ('She', 0.984218370767349),
 ('not,', 0.9834714497587523),
 ('much', 0.9827468801839833),
 ("I'm", 0.9826786300945485),
 ('got', 0.9825505635825527),
 ("I've", 0.982494375644852)]
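The ranking behind these queries can be sketched in a few lines of NumPy. This is an illustrative reimplementation under assumptions, not glove-python's code: most_similar here is a hypothetical standalone function, vectors are stored as rows of a 2-D array, and dictionary maps each word to its row index (like corpus.dictionary). It reproduces the behaviour seen above, where the query word is removed so number=10 yields nine neighbours.

```python
import numpy as np


def most_similar(word, vectors, dictionary, number=5):
    """Return up to number - 1 (word, cosine similarity) pairs, excluding the query word.

    vectors: 2-D array with one row per word; dictionary: word -> row index.
    """
    inverse = {idx: w for w, idx in dictionary.items()}
    query_id = dictionary[word]
    query = vectors[query_id]
    # cosine similarity of every row with the query vector
    sims = vectors @ query / (np.linalg.norm(vectors, axis=1) * np.linalg.norm(query))
    ranked = np.argsort(-sims)[:number]  # indices of the `number` highest similarities
    return [(inverse[i], float(sims[i])) for i in ranked if i != query_id]
```

Normalising by both vector lengths means the score depends only on direction, so frequent words with long vectors do not automatically dominate the ranking.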

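The examples above show how to train GloVe embeddings and query them for similar words. The neighbours are noisy: the corpus is a single short book, and because Text8Corpus splits only on whitespace, tokens keep their punctuation and capitalization (for example 'sobs.' and 'She'), which fragments the vocabulary. A larger corpus and cleaner tokenization typically give more meaningful neighbours.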
