What are embeddings and how to use them?

This recipe explains what embeddings are and how to use them.


Recipe Objective

Embeddings translate large, sparse vectors into a lower-dimensional space that preserves semantic relationships. Word embedding is a technique in which the individual words of a language are represented as real-valued vectors in that lower-dimensional space; in other words, embeddings are distributed representations of text in an n-dimensional space. Technically, a word embedding is a mapping from words to vectors of real numbers, learned with a neural network, a probabilistic model, or dimension reduction on a word co-occurrence matrix. It is both a language modeling and a feature learning technique. This recipe performs the mapping with a neural network, using gensim's Word2Vec.
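To make the "sparse vector" and "word co-occurrence matrix" ideas concrete, here is a minimal pure-Python sketch that counts co-occurrences in the two sample sentences used later in this recipe. The window size of 2 and the helper names `row` and `cosine` are assumptions made for illustration; the sketch stops at the raw counts and does not perform the dimension reduction that would turn them into dense embeddings.

```python
from collections import defaultdict
from math import sqrt

sentences = ["jack wants to play football", "Heena also loves to play football"]
tokenized = [s.split() for s in sentences]

# Vocabulary in order of first appearance
vocab = list(dict.fromkeys(w for sent in tokenized for w in sent))

# Count how often two words appear within `window` positions of each other
window = 2  # assumed context size, chosen only for this illustration
cooc = defaultdict(lambda: defaultdict(int))
for sent in tokenized:
    for i, word in enumerate(sent):
        start, stop = max(0, i - window), min(len(sent), i + window + 1)
        for j in range(start, stop):
            if j != i:
                cooc[word][sent[j]] += 1

def row(word):
    """Co-occurrence counts of `word` as a vector over the whole vocabulary."""
    return [cooc[word][other] for other in vocab]

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

print(row("wants"))                          # mostly zeros: a sparse vector
print(cosine(row("wants"), row("loves")))    # similarity from shared context words
```

Each row has one entry per vocabulary word, so its length grows with the vocabulary. An embedding replaces such a row with a short dense vector of fixed length.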

Step 1 - Import the necessary libraries

import pandas as pd
from gensim.models import word2vec

Step 2 - Take a Sample Text

text1 = ["jack wants to play football","Heena also loves to play football"]

Step 3 - Split the text and create a model for it

tokenized_sentences = [sentence.split() for sentence in text1]
model1 = word2vec.Word2Vec(tokenized_sentences, min_count=1)

Step 4 - Summarize vocabulary

words = list(model1.wv.vocab)  # in gensim 4.x, use model1.wv.key_to_index instead
print(words)
['jack', 'wants', 'to', 'play', 'football', 'Heena', 'also', 'loves']

As the output shows, each word is listed only once: repeated words such as 'to', 'play' and 'football' appear a single time, so the vocabulary contains only the unique words of the sample text.
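The same vocabulary can be reproduced without gensim, which also shows why Step 3 passes min_count=1. The sketch below is an illustration and not gensim's internal code: it counts the tokens and keeps those that occur at least min_count times. gensim's default min_count is 5, a threshold that no word in this tiny corpus would reach.

```python
from collections import Counter

text1 = ["jack wants to play football", "Heena also loves to play football"]
tokenized_sentences = [sentence.split() for sentence in text1]

# Count every token across both sentences (case-sensitive: split() keeps case)
counts = Counter(token for sentence in tokenized_sentences for token in sentence)

min_count = 1  # same threshold passed to Word2Vec in Step 3
vocabulary = [word for word, n in counts.items() if n >= min_count]

# With the library default of 5, nothing in this small sample would be kept
kept_with_default = [word for word, n in counts.items() if n >= 5]

print(counts["football"])   # occurs once in each sentence
print(vocabulary)
print(kept_with_default)
```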

Step 5 - Access vector for one word

[-1.40790280e-03  4.58865520e-03 -4.95769829e-03 -1.27252412e-03
  4.81374608e-03  2.77659670e-03 -3.98405176e-03  1.86388765e-03
 -3.97940027e-03  4.20716731e-03  4.15110635e-03 -5.57424966e-04
 -2.3193…317e-03 -2.26494414e-03 -4.22752928e-03  3.89819825e-03
 -5.17438224e-04  2.30374443e-03  4.20636032e-03  4.20677802e-03
 -1.40399823e-03  2.67376262e-03  4.15059133e-03 -8.53536942e-04
  4.09730617e-03 -4.61114757e-03  2.81381537e-03  4.06840025e-03
 -2.21697940e-03  2.47436436e-03 -3.31063266e-03 -2.14591250e-03
 -2.03807699e-03 -4.26412933e-03 -1.11343696e-04  5.39611443e-04
  4.11271071e-03 -3.50002461e-04  4.34909156e-03 -3.14325118e-03
 -2.66004843e-03 -4.72667301e-03 -6.80707395e-04 -6.37957186e-04
  9.92335379e-04  5.06919576e-04 -2.30332976e-03  4.67868708e-03
  2.58262083e-03 -4.42665629e-03 -4.33384068e-03  2.00493122e-03
  3.40585801e-04  4.51424671e-03 -2.24930048e-03 -4.74246824e-03
 -4.26648092e-03 -2.76884600e-03 -3.83922178e-03 -3.57130519e-03
  3.80852376e-04  2.10830034e-03  3.99174780e-04 -2.54857983e-03
 -1.73696945e-03 -2.79853819e-03 -3.59335751e-03  1.93190842e-03
  4.62259306e-03  1.84291916e-03  3.57032637e-03  2.30754865e-03
 -4.00394667e-03  1.34957826e-03 -4.16501053e-03 -4.11755871e-03
 -3.26831010e-03  1.22129067e-03 -6.88223168e-04  2.95645348e-03
 -1.37853972e-03 -2.04168772e-03 -2.96842307e-03  8.23099457e-04
  2.57009082e-03  1.67869462e-03  8.10760757e-05 -4.97947959e-03
  1.55272824e-03 -3.07091884e-03 -2.56623537e-03 -1.66870246e-03
 -1.00509136e-03  5.10989048e-05 -1.95662351e-03  1.54431339e-03
 -1.09352660e-03  7.61516392e-04 -8.73727666e-04  6.75187970e-04]
/usr/local/lib/python3.6/dist-packages/ipykernel_launcher.py:1: DeprecationWarning: Call to deprecated `__getitem__` (Method will be removed in 4.0.0, use self.wv.__getitem__() instead).
  """Entry point for launching an IPython kernel.

Step 6 - Save the model that we have created

model1.save('model1.bin')

Step 7 - load the model

new_model1 = word2vec.Word2Vec.load('model1.bin')
print(new_model1)
Word2Vec(vocab=8, size=100, alpha=0.025)
