What are embeddings in NLP and how to use them

This recipe explains what embeddings are in NLP and how to use them.

Recipe Objective

What are embeddings and how are they used? Embeddings translate large, sparse vectors into a lower-dimensional space that preserves semantic relationships. Word embedding is a technique in which individual words of a language are represented as real-valued vectors in that lower-dimensional space; in other words, they are distributed representations of text in an n-dimensional space. Technically, it is a mapping of words to vectors of real numbers, learned with a neural network, a probabilistic model, or dimensionality reduction on a word co-occurrence matrix. It is a language modeling and feature learning technique. This recipe uses Word2Vec from gensim, which learns the mapping with a shallow neural network.
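To make the contrast between a sparse vector and a dense embedding concrete, here is a minimal pure-Python sketch. The five-word vocabulary, the 3-dimensional `embedding_table`, and its values are made up for illustration; a trained model would learn such values from data rather than have them written by hand.

```python
# Toy illustration: a one-hot (sparse) vector versus a dense embedding.
# The vocabulary and the embedding values below are hypothetical.
vocab = ["jack", "wants", "to", "play", "football"]
word_to_index = {word: i for i, word in enumerate(vocab)}


def one_hot(word):
    """Return a vector of vocabulary length containing a single 1."""
    vector = [0] * len(vocab)
    vector[word_to_index[word]] = 1
    return vector


# Hypothetical embedding table: one 3-dimensional row per vocabulary word.
embedding_table = [
    [0.2, -0.1, 0.4],   # jack
    [0.0, 0.3, -0.2],   # wants
    [0.1, 0.1, 0.1],    # to
    [-0.3, 0.5, 0.2],   # play
    [-0.2, 0.6, 0.3],   # football
]


def embed(word):
    """Multiply the one-hot vector by the table; the result equals a row lookup."""
    hot = one_hot(word)
    dims = len(embedding_table[0])
    return [
        sum(hot[i] * embedding_table[i][d] for i in range(len(vocab)))
        for d in range(dims)
    ]


sparse = one_hot("football")
dense = embed("football")
print(sparse)  # length equals the vocabulary size
print(dense)   # length equals the chosen embedding dimension
```

The sparse vector grows with the vocabulary, while the dense vector keeps a fixed, much smaller size no matter how many words are added.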

Step 1 - Import the necessary libraries

import pandas as pd
from gensim.models import word2vec

Step 2 - Take a Sample Text

text1 = ["jack wants to play football","Heena also loves to play football"]

Step 3 - Split the text and create a model for it

tokenized_sentences = [sentence.split() for sentence in text1]
model1 = word2vec.Word2Vec(tokenized_sentences, min_count=1)

Step 4 - Summarize vocabulary

# wv.vocab exists in gensim 3.x; gensim 4.x removed it, use model1.wv.index_to_key there
words = list(model1.wv.vocab)
print(words)

['jack', 'wants', 'to', 'play', 'football', 'Heena', 'also', 'loves']

As the output shows, repeated words appear only once: the vocabulary contains only the unique words of the sample text.
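The vocabulary above, and the effect of the `min_count` argument from Step 3, can be reproduced with a small pure-Python sketch. This is only an approximation of the filtering idea, not gensim's internal code: the helper `vocabulary` is a hypothetical name, and the order in which gensim reports words can differ between versions.

```python
from collections import Counter

text1 = ["jack wants to play football", "Heena also loves to play football"]
tokenized_sentences = [sentence.split() for sentence in text1]

# Count how often each token occurs across all sentences.
counts = Counter(token for sentence in tokenized_sentences for token in sentence)


def vocabulary(min_count):
    """Unique tokens, in first-seen order, that occur at least min_count times."""
    return [word for word in counts if counts[word] >= min_count]


print(vocabulary(1))  # every unique word is kept
print(vocabulary(2))  # only words that occur at least twice are kept
```

With `min_count=1` all eight unique words survive, which is why the recipe uses that value on such a tiny corpus; a higher threshold would drop every word that occurs only once.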

Step 5 - Access vector for one word

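# Read the vector through model1.wv; indexing the model directly (model1['football']) is deprecated
print(model1.wv['football'])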

[-1.40790280e-03  4.58865520e-03 -4.95769829e-03 -1.27252412e-03
  4.81374608e-03  2.77659670e-03 -3.98405176e-03  1.86388765e-03
 -3.97940027e-03  4.20716731e-03  4.15110635e-03 -5.57424966e-04
 -2.3193317e-03 -2.26494414e-03 -4.22752928e-03  3.89819825e-03
 -5.17438224e-04  2.30374443e-03  4.20636032e-03  4.20677802e-03
 -1.40399823e-03  2.67376262e-03  4.15059133e-03 -8.53536942e-04
  4.09730617e-03 -4.61114757e-03  2.81381537e-03  4.06840025e-03
 -2.21697940e-03  2.47436436e-03 -3.31063266e-03 -2.14591250e-03
 -2.03807699e-03 -4.26412933e-03 -1.11343696e-04  5.39611443e-04
  4.11271071e-03 -3.50002461e-04  4.34909156e-03 -3.14325118e-03
 -2.66004843e-03 -4.72667301e-03 -6.80707395e-04 -6.37957186e-04
  9.92335379e-04  5.06919576e-04 -2.30332976e-03  4.67868708e-03
  2.58262083e-03 -4.42665629e-03 -4.33384068e-03  2.00493122e-03
  3.40585801e-04  4.51424671e-03 -2.24930048e-03 -4.74246824e-03
 -4.26648092e-03 -2.76884600e-03 -3.83922178e-03 -3.57130519e-03
  3.80852376e-04  2.10830034e-03  3.99174780e-04 -2.54857983e-03
 -1.73696945e-03 -2.79853819e-03 -3.59335751e-03  1.93190842e-03
  4.62259306e-03  1.84291916e-03  3.57032637e-03  2.30754865e-03
 -4.00394667e-03  1.34957826e-03 -4.16501053e-03 -4.11755871e-03
 -3.26831010e-03  1.22129067e-03 -6.88223168e-04  2.95645348e-03
 -1.37853972e-03 -2.04168772e-03 -2.96842307e-03  8.23099457e-04
  2.57009082e-03  1.67869462e-03  8.10760757e-05 -4.97947959e-03
  1.55272824e-03 -3.07091884e-03 -2.56623537e-03 -1.66870246e-03
 -1.00509136e-03  5.10989048e-05 -1.95662351e-03  1.54431339e-03
 -1.09352660e-03  7.61516392e-04 -8.73727666e-04  6.75187970e-04]
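A single vector like the one printed above is mostly useful in comparison with other vectors, and a common measure for that is cosine similarity. The sketch below uses short, hand-written vectors as stand-ins for learned embeddings; the words `soccer` and `banana` and all the numbers are hypothetical and do not come from the model trained in this recipe.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical 4-dimensional vectors standing in for learned embeddings.
football = [0.8, 0.1, 0.3, 0.0]
soccer = [0.7, 0.2, 0.4, 0.1]
banana = [-0.1, 0.9, -0.5, 0.2]

print(cosine_similarity(football, soccer))  # close to 1: similar direction
print(cosine_similarity(football, banana))  # much lower: different direction
```

The same function would accept vectors read from the model, and gensim also offers `model1.wv.similarity(word_a, word_b)` for this purpose. With only two training sentences, the similarities this recipe's model produces should not be expected to be meaningful.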

Step 6 - Save the model that we have created

model1.save('model1.bin')

Step 7 - Load the model

new_model1 = word2vec.Word2Vec.load('model1.bin')
print(new_model1)

Word2Vec(vocab=8, size=100, alpha=0.025)
