What are embeddings and how to use them?

This recipe explains what embeddings are and how to use them.

Recipe Objective

Embeddings translate large sparse vectors into a lower-dimensional space that preserves semantic relationships. Word embedding is a technique in which the individual words of a language are represented as real-valued vectors in that lower-dimensional space; in other words, each word gets a distributed representation in an n-dimensional space. Technically, it is a mapping from words to vectors of real numbers, obtained with a neural network, a probabilistic model, or dimensionality reduction on a word co-occurrence matrix. It is both a language modeling and a feature learning technique. This recipe performs the mapping with a neural network, using Word2Vec from gensim.
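
To make the contrast between sparse and dense representations concrete, here is a minimal sketch in plain Python. The two-dimensional embedding values are hand-picked for illustration and are not learned by any model; the names `one_hot`, `embedding_table`, and `cosine` are hypothetical helpers, not part of gensim.

```python
import math

# Toy vocabulary; each word gets an integer id.
vocab = ["jack", "wants", "to", "play", "football"]
word_to_id = {word: i for i, word in enumerate(vocab)}


def one_hot(word):
    """Sparse representation: one dimension per vocabulary word."""
    vec = [0.0] * len(vocab)
    vec[word_to_id[word]] = 1.0
    return vec


# Hypothetical 2-dimensional embedding table (hand-picked, not learned).
embedding_table = {
    "jack": [0.9, 0.1],
    "wants": [0.2, 0.7],
    "to": [0.1, 0.1],
    "play": [0.3, 0.9],
    "football": [0.4, 0.8],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# One-hot vectors of two different words are always orthogonal.
sparse_sim = cosine(one_hot("play"), one_hot("football"))

# Dense vectors can place related words close to each other.
dense_sim = cosine(embedding_table["play"], embedding_table["football"])

print(len(one_hot("play")), len(embedding_table["play"]))
print(sparse_sim, dense_sim)
```

The one-hot vectors grow with the vocabulary and carry no notion of similarity, while the dense vectors stay short and can encode it.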

Step 1 - Import the necessary libraries

import pandas as pd
from gensim.models import word2vec

Step 2 - Take a Sample Text

text1 = ["jack wants to play football","Heena also loves to play football"]

Step 3 - Split the text and create a model for it

tokenized_sentences = [sentence.split() for sentence in text1]
model1 = word2vec.Word2Vec(tokenized_sentences, min_count=1)
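
The `min_count` argument tells Word2Vec to ignore words whose total frequency is below the given threshold, so `min_count=1` keeps every word of this tiny corpus. The sketch below imitates only that frequency filter in plain Python; `kept_words` is a hypothetical helper and does not reflect gensim's internals.

```python
from collections import Counter

text1 = ["jack wants to play football", "Heena also loves to play football"]
tokenized_sentences = [sentence.split() for sentence in text1]

# Count how often each token occurs across all sentences.
counts = Counter(word for sentence in tokenized_sentences for word in sentence)


def kept_words(min_count):
    """Hypothetical helper: tokens whose total frequency is at least min_count."""
    return [word for word, freq in counts.items() if freq >= min_count]


print(kept_words(1))  # every distinct token survives
print(kept_words(2))  # only tokens that occur at least twice
```

With a threshold of 2, only the tokens shared by both sentences would remain, which is why a sample this small needs `min_count=1`.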

Step 4 - Summarize vocabulary

words = list(model1.wv.vocab)  # gensim 3.x; in gensim 4.x use model1.wv.index_to_key
print(words)
['jack', 'wants', 'to', 'play', 'football', 'Heena', 'also', 'loves']

Here we can see that repeated words are printed only once: the vocabulary contains only the unique words of the sample text.

Step 5 - Access vector for one word

[-1.40790280e-03  4.58865520e-03 -4.95769829e-03 -1.27252412e-03
  4.81374608e-03  2.77659670e-03 -3.98405176e-03  1.86388765e-03
 -3.97940027e-03  4.20716731e-03  4.15110635e-03 -5.57424966e-04
 -2.3193317e-03 -2.26494414e-03 -4.22752928e-03  3.89819825e-03
 -5.17438224e-04  2.30374443e-03  4.20636032e-03  4.20677802e-03
 -1.40399823e-03  2.67376262e-03  4.15059133e-03 -8.53536942e-04
  4.09730617e-03 -4.61114757e-03  2.81381537e-03  4.06840025e-03
 -2.21697940e-03  2.47436436e-03 -3.31063266e-03 -2.14591250e-03
 -2.03807699e-03 -4.26412933e-03 -1.11343696e-04  5.39611443e-04
  4.11271071e-03 -3.50002461e-04  4.34909156e-03 -3.14325118e-03
 -2.66004843e-03 -4.72667301e-03 -6.80707395e-04 -6.37957186e-04
  9.92335379e-04  5.06919576e-04 -2.30332976e-03  4.67868708e-03
  2.58262083e-03 -4.42665629e-03 -4.33384068e-03  2.00493122e-03
  3.40585801e-04  4.51424671e-03 -2.24930048e-03 -4.74246824e-03
 -4.26648092e-03 -2.76884600e-03 -3.83922178e-03 -3.57130519e-03
  3.80852376e-04  2.10830034e-03  3.99174780e-04 -2.54857983e-03
 -1.73696945e-03 -2.79853819e-03 -3.59335751e-03  1.93190842e-03
  4.62259306e-03  1.84291916e-03  3.57032637e-03  2.30754865e-03
 -4.00394667e-03  1.34957826e-03 -4.16501053e-03 -4.11755871e-03
 -3.26831010e-03  1.22129067e-03 -6.88223168e-04  2.95645348e-03
 -1.37853972e-03 -2.04168772e-03 -2.96842307e-03  8.23099457e-04
  2.57009082e-03  1.67869462e-03  8.10760757e-05 -4.97947959e-03
  1.55272824e-03 -3.07091884e-03 -2.56623537e-03 -1.66870246e-03
 -1.00509136e-03  5.10989048e-05 -1.95662351e-03  1.54431339e-03
 -1.09352660e-03  7.61516392e-04 -8.73727666e-04  6.75187970e-04]
/usr/local/lib/python3.6/dist-packages/ipykernel_launcher.py:1: DeprecationWarning: Call to deprecated `__getitem__` (Method will be removed in 4.0.0, use self.wv.__getitem__() instead).
  """Entry point for launching an IPython kernel.

Step 6 - Save the model that we have created


Step 7 - Load the model

new_model1 = word2vec.Word2Vec.load('model1.bin')
print(new_model1)
Word2Vec(vocab=8, size=100, alpha=0.025)
