How to use GloVe embeddings in NLP

This recipe helps you use GloVe embeddings in NLP.

Recipe Objective

How do we use GloVe embeddings? We have already discussed word embeddings and what they are. GloVe is another method of creating word embeddings; let's understand more about it.

GloVe (Global Vectors for word representation) is a method in which we take a corpus, iterate through it, and record the co-occurrence of every pair of words. This gives us a co-occurrence matrix in which nearby pairs are weighted by distance: words that occur right next to each other contribute a value of 1, words that are one word apart contribute 1/2, two words apart 1/3, and so on.

Let's make this clearer with an example.

Example :

It is a lovely morning !

Good Morning!!

Is it a lovely morning ?
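The distance-weighted counting described above can be sketched in plain Python. This is a minimal illustration of the idea, not the glove library's actual implementation; the tokenized sentences are the example above, lowercased:

```python
from collections import defaultdict

def cooccurrence(sentences, window=10):
    """Count co-occurrences within a window; a pair of words d
    positions apart contributes 1/d to its count. Only forward
    pairs (left word, right word) are recorded here."""
    counts = defaultdict(float)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            for j in range(i + 1, min(i + 1 + window, len(tokens))):
                distance = j - i
                counts[(word, tokens[j])] += 1.0 / distance
    return counts

sentences = [
    "it is a lovely morning".split(),
    "good morning".split(),
    "is it a lovely morning".split(),
]
counts = cooccurrence(sentences)
# "lovely" and "morning" are adjacent in two sentences: 1 + 1 = 2.0
print(counts[("lovely", "morning")])
```

In the first sentence, "it" and "a" are two positions apart, so that occurrence adds only 1/2 to their count.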

Step 1 - Import the necessary libraries

import itertools
from gensim.models.word2vec import Text8Corpus
from glove import Corpus, Glove

Step 2 - Store the sample text file in a variable called sentences

sentences = list(itertools.islice(Text8Corpus('/content/alice_in_wonderland.txt'),None))

Step 3 - Create a Corpus object and store it in a variable

corpus = Corpus()

Step 4 - Fit the sentences into the corpus with a window size of 10

corpus.fit(sentences, window=10)

Step 5 - Instantiate the Glove model and store it in a variable

glove = Glove(no_components=100, learning_rate=0.05)

Step 6 - Perform the training

glove.fit(corpus.matrix, epochs=30, no_threads=4, verbose=True)

Epoch 0
Epoch 1
Epoch 2
Epoch 3
Epoch 4
Epoch 5
Epoch 6
Epoch 7
Epoch 8
Epoch 9
Epoch 10
Epoch 11
Epoch 12
Epoch 13
Epoch 14
Epoch 15
Epoch 16
Epoch 17
Epoch 18
Epoch 19
Epoch 20
Epoch 21
Epoch 22
Epoch 23
Epoch 24
Epoch 25
Epoch 26
Epoch 27
Epoch 28
Epoch 29

Here we fit the GloVe model on the co-occurrence matrix, performing 30 training epochs with 4 threads.

Step 7 - Add the corpus dictionary to the GloVe model

glove.add_dictionary(corpus.dictionary)
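Adding the dictionary lets the model translate words into row indices of the trained embedding matrix, which is what similarity queries need. A toy sketch of that mapping, with made-up words and vector values (not actual trained output):

```python
import numpy as np

# The dictionary maps each word to a row index in the embedding matrix.
dictionary = {"alice": 0, "queen": 1, "rabbit": 2}

# One row per word; here 2-dimensional toy embeddings.
word_vectors = np.array([
    [0.1, 0.3],
    [0.2, 0.1],
    [0.4, 0.4],
])

# Looking up a word's embedding is dictionary lookup + row indexing.
vec = word_vectors[dictionary["queen"]]
print(vec)
```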

Step 8 - Test with some words

glove.most_similar('man')

[('sobs.', 0.9555372382921672),
 ('reasons.', 0.9555298747918248),
 ('signify:', 0.9551492193112306),
 ('`chop', 0.954856860860499)]

glove.most_similar('this', number=10)

[('time', 0.9964498350533215),
 ('once', 0.9964002559452605),
 ('more', 0.9963721925296446),
 ('any', 0.9955253094062864),
 ('about', 0.9950879007354146),
 ('which', 0.9948399539941413),
 ('turned', 0.9942261952259767),
 ('is', 0.9941542169966086),
 ('them', 0.994141679802586)]

glove.most_similar('Adventures', number=10)

[('hedgehogs,', 0.9398138500036824),
 ("THAT'S", 0.93867888598354),
 ('soup,', 0.9355306192717532),
 ('familiarly', 0.9338930212646674),
 ('showing', 0.9334707250469283),
 ("Turtle's", 0.9328493584474263),
 ('blades', 0.9318802670676556),
 ('heads.', 0.9318625356540701),
 ("refreshments!'", 0.9315206030115342)]

glove.most_similar('girl', number=10)

[('dispute', 0.8922924095994201),
 ('proper', 0.889211005639865),
 ('hurry.', 0.8875119118249284),
 ('remark,', 0.8874202609802221),
 ('bringing', 0.88048150664503),
 ('dog', 0.8769020310475344),
 ('tree.', 0.8758689282289073),
 ('fast', 0.8754031020409732),
 ('rules', 0.8743036670054224)]

glove.most_similar('Alice', number=10)

[('thought', 0.99722981845681),
 ('he', 0.985967266394433),
 ('her,', 0.9848540529024399),
 ('She', 0.984218370767349),
 ('not,', 0.9834714497587523),
 ('much', 0.9827468801839833),
 ("I'm", 0.9826786300945485),
 ('got', 0.9825505635825527),
 ("I've", 0.982494375644852)]

From the above we have seen how to use GloVe embeddings for word representation; the similarity queries show how the trained model performs on this small corpus.

