Explain Doc2Vec model in Gensim

In this recipe, we'll learn to use gensim to create a Doc2Vec model, which generates a vectorized representation of a group of words taken as a whole.
Last Updated: 10 Aug 2022

Recipe Objective: Explain the Doc2Vec model in Gensim

The Doc2Vec model is used to generate a vectorized representation of a group of words taken as a whole. It does more than just average the vectors of the words in the sentence. We will use the text8 dataset, which can be downloaded through gensim.downloader, to build document vectors with Doc2Vec as follows:

#importing required libraries
import gensim
import gensim.downloader as api

#downloading the dataset
dataset = api.load("text8")
data = [d for d in dataset]

#creating tagged documents using models.doc2vec.TaggedDocument()
def tagged_doc(list_of_list_of_words):
    for i, list_of_words in enumerate(list_of_list_of_words):
        yield gensim.models.doc2vec.TaggedDocument(list_of_words, [i])

training_data = list(tagged_doc(data))

#printing the first tagged document
print(training_data[:1])

#initialising the model
dv_model = gensim.models.doc2vec.Doc2Vec(vector_size=40, min_count=2, epochs=30)

#building the vocabulary
dv_model.build_vocab(training_data)

#training the Doc2Vec model
dv_model.train(training_data, total_examples=dv_model.corpus_count, epochs=dv_model.epochs)

#analysing the output
print(dv_model.infer_vector(['describe', 'modern', 'era', 'revolution', 'repudiated']))

Output:
[-2.1419777e-01 -3.4295085e-01 -3.1674471e-01  7.9905950e-02
  1.1792209e-01 -5.5660107e-03  7.0156835e-02 -8.0916628e-02
 -3.0582789e-01 -2.4863353e-01  9.2477903e-02 -3.0935228e-02
 -5.2634442e-01 -3.7851343e-01 -7.9936698e-02  1.3879079e-01
  3.0395445e-01  4.3877283e-01 -4.4444799e-01  2.6140922e-01
 -1.3938751e-02  2.5438294e-01  6.6719547e-02  3.8132364e-01
 -1.8118909e-01 -2.3382125e-02 -3.1091588e-02 -2.3327848e-01
 -1.6785687e-01 -3.4823459e-01  9.0288207e-02 -1.7410168e-02
 -2.2582319e-01 -1.3211270e-01 -4.8467633e-01 -1.8533233e-01
  2.6937298e-02 -3.9798447e-01 -9.2203647e-02  2.9851799e-07]

By feeding a list of words to the trained model, we can infer a vector for any piece of text. The infer_vector() function performs this inference, and the resulting vector can then be compared to other vectors using cosine similarity.
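As a rough illustration of the comparison step, the sketch below computes cosine similarity in plain Python. The function name cosine_similarity and the two short vectors are hypothetical stand-ins, not output from the model above; in practice the inputs would be two vectors returned by infer_vector().

```python
import math

def cosine_similarity(vec_a, vec_b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(vec_a) != len(vec_b):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(vec_a, vec_b))
    norm_a = math.sqrt(sum(a * a for a in vec_a))
    norm_b = math.sqrt(sum(b * b for b in vec_b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Hypothetical stand-ins for two inferred vectors (shortened to 4 dimensions)
vector_a = [0.2, -0.1, 0.4, 0.3]
vector_b = [0.1, -0.2, 0.5, 0.1]
print(cosine_similarity(vector_a, vector_b))
```

A value close to 1 suggests the two texts point in a similar direction in the vector space, a value near 0 suggests little relation, and a negative value suggests opposite directions.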
It's worth noting that infer_vector() expects a list of string tokens rather than a raw string, and these tokens should be produced the same way as the words property of the original training document objects.
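One possible way to prepare a raw string before inference is sketched below. It assumes the training text consists of lowercase words separated by spaces, which is how text8 is commonly described; the helper name to_tokens is hypothetical, and the regular expression only approximates that cleaning.

```python
import re

def to_tokens(text):
    """Lowercase the text, turn every run of non-letters into a space, and split."""
    cleaned = re.sub(r"[^a-z]+", " ", text.lower())
    return cleaned.split()

# The resulting list can be passed to infer_vector() in place of a hand-written one
tokens = to_tokens("Describe the modern era: Revolution, repudiated!")
print(tokens)
# ['describe', 'the', 'modern', 'era', 'revolution', 'repudiated']
```

If the training documents were tokenized differently (for example, keeping digits or punctuation), the same rules should be applied here instead.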
Because the underlying training and inference algorithms are iterative approximations with inherent randomization, repeated inferences of the same text will yield slightly different vectors.
