What are embeddings and how to use them?

This recipe explains what embeddings are and how to use them.

Recipe Objective

Embeddings translate large, sparse vectors into a lower-dimensional space that preserves semantic relationships. Word embedding is a technique in which each word of a language is represented as a real-valued vector in that lower-dimensional space; these vectors are also called distributed representations of text in an n-dimensional space. Technically, a word embedding is a mapping from words to vectors of real numbers, learned with a neural network, a probabilistic model, or dimensionality reduction applied to a word co-occurrence matrix. It is both a language modeling and a feature learning technique. This recipe takes the neural-network route and trains a Word2Vec model with gensim.
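To make the contrast between sparse and dense representations concrete, here is a minimal sketch in plain Python. The vocabulary is the one used later in this recipe, but the three-dimensional vectors in `dense` are invented for illustration and were not learned by any model. The point is only that one-hot vectors grow with the vocabulary and treat every pair of distinct words as unrelated, while dense vectors have a fixed, small size and can place related words close together.

```python
vocab = ["jack", "wants", "to", "play", "football", "Heena", "also", "loves"]


def one_hot(word):
    """Sparse vector: one dimension per vocabulary word, with a single 1."""
    return [1 if w == word else 0 for w in vocab]


# Hypothetical dense vectors (hand-picked values, NOT trained) in 3 dimensions.
dense = {
    "play": [0.8, 0.1, 0.1],
    "football": [0.9, 0.2, 0.0],
    "also": [0.0, 0.1, 0.9],
}


def dot(u, v):
    """Dot product, used here as a simple similarity score."""
    return sum(a * b for a, b in zip(u, v))


# One-hot length equals the vocabulary size; the dense length is fixed.
print(len(one_hot("football")), len(dense["football"]))

# Distinct one-hot vectors never overlap, so their similarity is always 0.
print(dot(one_hot("play"), one_hot("football")))

# With these made-up dense vectors, "play" scores higher with "football" than with "also".
print(dot(dense["play"], dense["football"]) > dot(dense["play"], dense["also"]))
```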

Step 1 - Import the necessary libraries

import pandas as pd
from gensim.models import word2vec

Step 2 - Take a Sample Text

text1 = ["jack wants to play football","Heena also loves to play football"]

Step 3 - Split the text and create a model for it

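# Split each sentence into a list of word tokens
tokenized_sentences = [sentence.split() for sentence in text1]
# min_count=1 keeps every word, including words that occur only once
model1 = word2vec.Word2Vec(tokenized_sentences, min_count=1)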
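Word2Vec learns each word's vector from the words that appear around it. The sketch below is not gensim's implementation; it only illustrates which (center word, context words) groupings a window-based model would see for the tokenized sentences. The helper name `context_windows` and the window size of 2 are assumptions made for this example.

```python
text1 = ["jack wants to play football", "Heena also loves to play football"]
tokenized_sentences = [sentence.split() for sentence in text1]


def context_windows(tokens, window=2):
    """Yield (center, context) for every position (hypothetical helper)."""
    for i, center in enumerate(tokens):
        left = tokens[max(0, i - window):i]      # up to `window` words before
        right = tokens[i + 1:i + 1 + window]     # up to `window` words after
        yield center, left + right


pairs = [
    pair
    for sentence in tokenized_sentences
    for pair in context_windows(sentence)
]

# Show the first few groupings from the first sentence.
for center, context in pairs[:3]:
    print(center, context)
```

Words that share many context words, such as "play" and "football" here, are the ones a trained model tends to place near each other.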

Step 4 - Summarize vocabulary

# gensim 3.x API; in gensim 4.0 and later use list(model1.wv.key_to_index)
words = list(model1.wv.vocab)
print(words)
['jack', 'wants', 'to', 'play', 'football', 'Heena', 'also', 'loves']

The vocabulary contains only the unique words of the sample text: repeated words such as "to", "play", and "football" appear once.
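The vocabulary printed above, and the effect of the `min_count` argument, can be reproduced without gensim. The following is a rough sketch using `collections.Counter`, not gensim's internal code; the helper `build_vocab` is hypothetical, and `min_count=2` is chosen only to show the filtering.

```python
from collections import Counter

text1 = ["jack wants to play football", "Heena also loves to play football"]
tokenized_sentences = [sentence.split() for sentence in text1]

# Count how often each word occurs across all sentences.
counts = Counter(word for sentence in tokenized_sentences for word in sentence)


def build_vocab(counts, min_count):
    """Keep words seen at least `min_count` times (hypothetical helper)."""
    return [word for word, n in counts.items() if n >= min_count]


vocab_all = build_vocab(counts, min_count=1)       # every unique word
vocab_frequent = build_vocab(counts, min_count=2)  # only repeated words

print(vocab_all)
print(vocab_frequent)
```

With `min_count=1`, as in this recipe, all eight unique words are kept. A higher threshold would drop words that occur only once, which is usually what you want on a large corpus.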

Step 5 - Access vector for one word

# Look the vector up through model1.wv; indexing the model directly
# (model1['football']) is deprecated in gensim 3.x and removed in 4.0.0
print(model1.wv['football'])
[-1.40790280e-03  4.58865520e-03 -4.95769829e-03 -1.27252412e-03
  4.81374608e-03  2.77659670e-03 -3.98405176e-03  1.86388765e-03
 -3.97940027e-03  4.20716731e-03  4.15110635e-03 -5.57424966e-04
 -2.3193317e-03 -2.26494414e-03 -4.22752928e-03  3.89819825e-03
 -5.17438224e-04  2.30374443e-03  4.20636032e-03  4.20677802e-03
 -1.40399823e-03  2.67376262e-03  4.15059133e-03 -8.53536942e-04
  4.09730617e-03 -4.61114757e-03  2.81381537e-03  4.06840025e-03
 -2.21697940e-03  2.47436436e-03 -3.31063266e-03 -2.14591250e-03
 -2.03807699e-03 -4.26412933e-03 -1.11343696e-04  5.39611443e-04
  4.11271071e-03 -3.50002461e-04  4.34909156e-03 -3.14325118e-03
 -2.66004843e-03 -4.72667301e-03 -6.80707395e-04 -6.37957186e-04
  9.92335379e-04  5.06919576e-04 -2.30332976e-03  4.67868708e-03
  2.58262083e-03 -4.42665629e-03 -4.33384068e-03  2.00493122e-03
  3.40585801e-04  4.51424671e-03 -2.24930048e-03 -4.74246824e-03
 -4.26648092e-03 -2.76884600e-03 -3.83922178e-03 -3.57130519e-03
  3.80852376e-04  2.10830034e-03  3.99174780e-04 -2.54857983e-03
 -1.73696945e-03 -2.79853819e-03 -3.59335751e-03  1.93190842e-03
  4.62259306e-03  1.84291916e-03  3.57032637e-03  2.30754865e-03
 -4.00394667e-03  1.34957826e-03 -4.16501053e-03 -4.11755871e-03
 -3.26831010e-03  1.22129067e-03 -6.88223168e-04  2.95645348e-03
 -1.37853972e-03 -2.04168772e-03 -2.96842307e-03  8.23099457e-04
  2.57009082e-03  1.67869462e-03  8.10760757e-05 -4.97947959e-03
  1.55272824e-03 -3.07091884e-03 -2.56623537e-03 -1.66870246e-03
 -1.00509136e-03  5.10989048e-05 -1.95662351e-03  1.54431339e-03
 -1.09352660e-03  7.61516392e-04 -8.73727666e-04  6.75187970e-04]

Step 6 - Save the model that we have created

model1.save('model1.bin')

Step 7 - load the model

new_model1 = word2vec.Word2Vec.load('model1.bin')
print(new_model1)
Word2Vec(vocab=8, size=100, alpha=0.025)
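
Once a model is trained or loaded, a common way to use the embeddings is to rank words by how similar their vectors are. The sketch below shows the idea with cosine similarity in plain Python. The small `embeddings` table is a hypothetical stand-in for the model's word vectors, with made-up values, and `most_similar` here is an illustrative helper rather than gensim's code. On a real model, gensim offers comparable lookups through `model1.wv.most_similar('football')` and `model1.wv.similarity('football', 'play')`.

```python
import math

# Hypothetical 3-dimensional vectors (made-up values, NOT model output).
embeddings = {
    "football": [0.9, 0.1, 0.0],
    "play": [0.7, 0.3, 0.1],
    "loves": [0.1, 0.2, 0.9],
    "also": [0.0, 0.9, 0.2],
}


def cosine(u, v):
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(word, embeddings, topn=2):
    """Return the `topn` other words ranked by cosine similarity to `word`."""
    query = embeddings[word]
    scored = [
        (other, cosine(query, vector))
        for other, vector in embeddings.items()
        if other != word
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:topn]


neighbours = most_similar("football", embeddings)
print([word for word, _ in neighbours])
```

Note that a model trained on only two sentences, as in this recipe, has seen too little text for its similarities to be meaningful; the ranking becomes useful once the model is trained on a much larger corpus.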
