How to use GloVe embeddings?

This recipe helps you use GloVe embeddings.

Recipe Objective

How do you use GloVe embeddings? We have already discussed what word embeddings are. GloVe is another method of creating word embeddings. Let's understand more about it.

GloVe, short for Global Vectors for Word Representation, is a method in which we iterate through the corpus and record how often each pair of words co-occurs. The result is a co-occurrence matrix in which each co-occurrence is weighted by distance: words that occur next to each other contribute a value of 1, words that are one word apart contribute 1/2, words that are two words apart contribute 1/3, and so on.

Let's make this clearer with an example.

Example :

It is a lovely morning !

Good Morning!!

Is it a lovely morning ?
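
The weighting described above can be sketched in plain Python on these three sentences. This is only an illustration of the 1/distance rule, not the glove package's own implementation; the function name cooccurrence, the lower-casing and the removal of punctuation are assumptions made for the example.

```python
from collections import defaultdict


def cooccurrence(sentences, window=10):
    """Accumulate 1/distance weights for every word pair within `window` tokens."""
    counts = defaultdict(float)
    for tokens in sentences:
        for i, left in enumerate(tokens):
            # Look only to the right; each pair is stored once under a sorted key.
            for j in range(i + 1, min(i + window + 1, len(tokens))):
                pair = tuple(sorted((left, tokens[j])))
                counts[pair] += 1.0 / (j - i)
    return counts


# The three example sentences, lower-cased with punctuation removed (an assumption).
sentences = [
    "it is a lovely morning".split(),
    "good morning".split(),
    "is it a lovely morning".split(),
]

counts = cooccurrence(sentences, window=10)
print(counts[("lovely", "morning")])  # adjacent in two sentences: 1 + 1 = 2.0
print(counts[("it", "morning")])      # three words apart once, two words apart once: 1/4 + 1/3
```

"lovely" and "morning" are neighbours in two sentences, so their entry is 2. "it" and "morning" are further apart, so each occurrence contributes less.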

Step 1 - Import the necessary libraries

import itertools
from gensim.models.word2vec import Text8Corpus
from glove import Corpus, Glove

Step 2 - Store the sample text file in a variable called sentences

sentences = list(itertools.islice(Text8Corpus('/content/alice_in_wonderland.txt'),None))
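
Text8Corpus appears to split the text on whitespace only, which would explain tokens such as 'sobs.' and 'hedgehogs,' in the results of Step 8. If cleaner tokens are wanted, one option is to build the list of token lists yourself. The sketch below is a minimal example under that assumption; the helper names tokenize and to_sentences are hypothetical and not part of gensim or glove.

```python
import re


def tokenize(text):
    """Lower-case the text and keep only runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def to_sentences(text):
    """Split on ., ! or ? and tokenize each non-empty piece."""
    pieces = re.split(r"[.!?]+", text)
    return [tokens for tokens in (tokenize(p) for p in pieces) if tokens]


sample = "It is a lovely morning ! Good Morning!! Is it a lovely morning ?"
clean_sentences = to_sentences(sample)
print(clean_sentences)
```

A list of token lists like clean_sentences has the same shape as the sentences variable above, so it could be used in its place in the following steps.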

Step 3 - Store the Corpus into a variable

corpus = Corpus()

Step 4 - Fit the sentences into the corpus with a window size of 10

corpus.fit(sentences, window=10)

Step 5 - Store the Glove model in a variable

glove = Glove(no_components=100, learning_rate=0.05)

Step 6 - Perform the training

glove.fit(corpus.matrix, epochs=30, no_threads=4, verbose=True)
Epoch 0
Epoch 1
Epoch 2
Epoch 3
Epoch 4
Epoch 5
Epoch 6
Epoch 7
Epoch 8
Epoch 9
Epoch 10
Epoch 11
Epoch 12
Epoch 13
Epoch 14
Epoch 15
Epoch 16
Epoch 17
Epoch 18
Epoch 19
Epoch 20
Epoch 21
Epoch 22
Epoch 23
Epoch 24
Epoch 25
Epoch 26
Epoch 27
Epoch 28
Epoch 29

Here we fit the GloVe model, i.e. we run 30 training epochs using 4 threads.

Step 7 - Add our corpus dictionary to the glove dictionary

glove.add_dictionary(corpus.dictionary)

Step 8 - Test with some words

[('sobs.', 0.9555372382921672),
 ('reasons.', 0.9555298747918248),
 ('signify:', 0.9551492193112306),
 ('`chop', 0.954856860860499)]
glove.most_similar('this', number=10)
[('time', 0.9964498350533215),
 ('once', 0.9964002559452605),
 ('more', 0.9963721925296446),
 ('any', 0.9955253094062864),
 ('about', 0.9950879007354146),
 ('which', 0.9948399539941413),
 ('turned', 0.9942261952259767),
 ('is', 0.9941542169966086),
 ('them', 0.994141679802586)]
glove.most_similar('Adventures', number=10)
[('hedgehogs,', 0.9398138500036824),
 ("THAT'S", 0.93867888598354),
 ('soup,', 0.9355306192717532),
 ('familiarly', 0.9338930212646674),
 ('showing', 0.9334707250469283),
 ("Turtle's", 0.9328493584474263),
 ('blades', 0.9318802670676556),
 ('heads.', 0.9318625356540701),
 ("refreshments!'", 0.9315206030115342)]
glove.most_similar('girl', number=10)
[('dispute', 0.8922924095994201),
 ('proper', 0.889211005639865),
 ('hurry.', 0.8875119118249284),
 ('remark,', 0.8874202609802221),
 ('bringing', 0.88048150664503),
 ('dog', 0.8769020310475344),
 ('tree.', 0.8758689282289073),
 ('fast', 0.8754031020409732),
 ('rules', 0.8743036670054224)]
glove.most_similar('Alice', number=10)
[('thought', 0.99722981845681),
 ('he', 0.985967266394433),
 ('her,', 0.9848540529024399),
 ('She', 0.984218370767349),
 ('not,', 0.9834714497587523),
 ('much', 0.9827468801839833),
 ("I'm", 0.9826786300945485),
 ('got', 0.9825505635825527),
 ("I've", 0.982494375644852)]

The examples above show how to use GloVe embeddings for word representation and give a sense of how the trained model performs.
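
The scores returned in Step 8 can be read as cosine similarities between word vectors. The sketch below shows that kind of ranking on a few made-up two-dimensional vectors. It is an assumption about how such a query works in general, not the glove package's code, and the function and variable names are hypothetical. The outputs above list nine neighbours for number=10, which is consistent with the query word itself being counted and then dropped, so the sketch mimics that behaviour.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(word, vectors, number=5):
    """Rank the other words by cosine similarity to `word`.

    Returns number - 1 neighbours, mirroring the nine results
    seen for number=10 in the outputs above (an assumption).
    """
    query = vectors[word]
    scored = [
        (other, cosine(query, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[: number - 1]


# Toy vectors invented for illustration; real GloVe vectors here have 100 components.
toy_vectors = {
    "alice": [1.0, 0.0],
    "she": [0.9, 0.1],
    "queen": [0.5, 0.5],
    "rabbit": [0.0, 1.0],
}

neighbours = most_similar("alice", toy_vectors, number=3)
print(neighbours)
```

With these toy vectors, "she" ranks above "queen" because its vector points in almost the same direction as the one for "alice".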
