How to use GloVe embeddings?

This recipe helps you use GloVe embeddings.


Recipe Objective

How do we use GloVe embeddings? We have already discussed what word embeddings are. GloVe is another method of creating word embeddings. Let's understand more about it.

GloVe, short for Global Vectors for Word Representation, is a method in which we iterate over a corpus and record how often each word co-occurs with every other word in the corpus. The result is a co-occurrence matrix: words that occur next to each other contribute a value of 1, words that are one word apart contribute 1/2, words that are two words apart contribute 1/3, and so on.

Let's make this clearer with an example.

Example:

It is a lovely morning !

Good Morning!!

Is it a lovely morning ?
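The 1/distance weighting described above can be sketched in plain Python on these three sentences. This is only an illustration of the idea, not the code the glove library runs: the function name `cooccurrence`, the lowercasing, the dropped punctuation and the symmetric storage are all assumptions made for this example.

```python
from collections import defaultdict

def cooccurrence(sentences, window=10):
    """Return {(word_a, word_b): weight} using 1/distance weighting (illustrative only)."""
    counts = defaultdict(float)
    for tokens in sentences:
        for i, left in enumerate(tokens):
            # Pair each word with the words that follow it inside the window.
            for j in range(i + 1, min(i + 1 + window, len(tokens))):
                right = tokens[j]
                weight = 1.0 / (j - i)  # adjacent -> 1, one word apart -> 1/2, ...
                counts[(left, right)] += weight
                counts[(right, left)] += weight  # store both directions
    return counts

# The three example sentences, lowercased and without punctuation (an assumption).
sentences = [
    "it is a lovely morning".split(),
    "good morning".split(),
    "is it a lovely morning".split(),
]

counts = cooccurrence(sentences)
print(counts[("lovely", "morning")])  # adjacent in two sentences
print(counts[("it", "a")])            # one word apart once, adjacent once
```

Here "lovely" and "morning" sit next to each other in two sentences, so their entry adds up to 2. "it" and "a" are one word apart in the first sentence (1/2) and adjacent in the third (1), giving 1.5.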

Step 1 - Import the necessary libraries

import itertools
from gensim.models.word2vec import Text8Corpus
from glove import Corpus, Glove

Step 2 - Store the sample text file in a variable called sentences

sentences = list(itertools.islice(Text8Corpus('/content/alice_in_wonderland.txt'),None))

Step 3 - Store the Corpus into a variable

corpus = Corpus()

Step 4 - Fit the sentences into the corpus with a window size of 10

corpus.fit(sentences, window=10)

Step 5 - Store the GloVe model in a variable

glove = Glove(no_components=100, learning_rate=0.05)

Step 6 - Perform the training

glove.fit(corpus.matrix, epochs=30, no_threads=4, verbose=True)
Epoch 0
Epoch 1
Epoch 2
Epoch 3
Epoch 4
Epoch 5
Epoch 6
Epoch 7
Epoch 8
Epoch 9
Epoch 10
Epoch 11
Epoch 12
Epoch 13
Epoch 14
Epoch 15
Epoch 16
Epoch 17
Epoch 18
Epoch 19
Epoch 20
Epoch 21
Epoch 22
Epoch 23
Epoch 24
Epoch 25
Epoch 26
Epoch 27
Epoch 28
Epoch 29

Here we fit the GloVe model, i.e. we run 30 training epochs with 4 threads.
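To give a rough idea of what each training epoch is minimising, here is a minimal sketch of the per-pair loss from the GloVe paper: a weighted squared difference between the dot product of two word vectors (plus biases) and the log of their co-occurrence value. The function names are hypothetical, and the constants `max_count=100` and `alpha=0.75` are the paper's defaults, assumed here for illustration. The library's actual training loop is more involved.

```python
import math

def weight(count, max_count=100.0, alpha=0.75):
    """Down-weight rare pairs and cap the weight of very frequent pairs at 1."""
    return min(1.0, (count / max_count) ** alpha)

def pair_loss(w_i, w_j, b_i, b_j, count):
    """Weighted squared error for one (word, context word) pair."""
    dot = sum(a * b for a, b in zip(w_i, w_j))
    error = dot + b_i + b_j - math.log(count)
    return weight(count) * error ** 2

# Toy vectors (made up): the dot product is 1.0, log(e) is 1.0, so the error is zero.
print(pair_loss([1.0, 0.0], [1.0, 0.0], 0.0, 0.0, math.e))
# A frequent pair gets full weight; a rare pair gets a much smaller one.
print(weight(200), weight(1))
```

Training adjusts the vectors and biases so that the sum of this loss over all non-zero entries of the co-occurrence matrix goes down.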

Step 7 - Add our corpus dictionary to the GloVe dictionary

glove.add_dictionary(corpus.dictionary)

Step 8 - Test with some words

[('sobs.', 0.9555372382921672),
 ('reasons.', 0.9555298747918248),
 ('signify:', 0.9551492193112306),
 ('`chop', 0.954856860860499)]
glove.most_similar('this', number=10)
[('time', 0.9964498350533215),
 ('once', 0.9964002559452605),
 ('more', 0.9963721925296446),
 ('any', 0.9955253094062864),
 ('about', 0.9950879007354146),
 ('which', 0.9948399539941413),
 ('turned', 0.9942261952259767),
 ('is', 0.9941542169966086),
 ('them', 0.994141679802586)]
glove.most_similar('Adventures', number=10)
[('hedgehogs,', 0.9398138500036824),
 ("THAT'S", 0.93867888598354),
 ('soup,', 0.9355306192717532),
 ('familiarly', 0.9338930212646674),
 ('showing', 0.9334707250469283),
 ("Turtle's", 0.9328493584474263),
 ('blades', 0.9318802670676556),
 ('heads.', 0.9318625356540701),
 ("refreshments!'", 0.9315206030115342)]
glove.most_similar('girl', number=10)
[('dispute', 0.8922924095994201),
 ('proper', 0.889211005639865),
 ('hurry.', 0.8875119118249284),
 ('remark,', 0.8874202609802221),
 ('bringing', 0.88048150664503),
 ('dog', 0.8769020310475344),
 ('tree.', 0.8758689282289073),
 ('fast', 0.8754031020409732),
 ('rules', 0.8743036670054224)]
glove.most_similar('Alice', number=10)
[('thought', 0.99722981845681),
 ('he', 0.985967266394433),
 ('her,', 0.9848540529024399),
 ('She', 0.984218370767349),
 ('not,', 0.9834714497587523),
 ('much', 0.9827468801839833),
 ("I'm", 0.9826786300945485),
 ('got', 0.9825505635825527),
 ("I've", 0.982494375644852)]
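The scores in these outputs are similarities between word vectors. As a minimal sketch of how such a ranking can be computed, assuming cosine similarity and using made-up three-dimensional toy vectors, the lookup could look like the code below. The `most_similar` function here is a stand-in written for illustration, not the library's implementation.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def most_similar(word, vectors, number=3):
    """Rank every other word by cosine similarity to `word` (illustrative only)."""
    query = vectors[word]
    scored = [(other, cosine(query, vec))
              for other, vec in vectors.items() if other != word]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:number]

# Hypothetical toy vectors; real GloVe vectors here have 100 components.
toy_vectors = {
    "morning": [0.9, 0.1, 0.0],
    "evening": [0.8, 0.2, 0.1],
    "lovely": [0.1, 0.9, 0.2],
    "good": [0.2, 0.8, 0.3],
}

result = most_similar("morning", toy_vectors, number=2)
print(result)
```

With these toy vectors, "evening" ranks first for "morning" and "good" second. Note that most of the outputs above list nine neighbours for number=10, which is consistent with the query word itself being left out of the result.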

This shows how to use GloVe embeddings for word representation; the examples above illustrate how the trained model performs.
