How to create a dictionary from a list of sentences using Gensim

In this recipe, we will learn how to create a Gensim dictionary from a list of sentences by first converting each sentence into a list of words.

Recipe Objective: How to create a dictionary from a list of sentences using Gensim?

In this example, we will build a dictionary from a list of sentences. When we have several sentences, each sentence must first be converted into a list of words (tokens), and a list comprehension is one of the most common ways to do this. Let us look at the code.


#importing required libraries
import gensim
from gensim import corpora
from pprint import pprint

#creating a sample corpus for demonstration purposes
txt_corpus = ["This is sample document",
              "Collection of documents make a corpus",
              "You can vectorize your corpus for a mathematically convenient representation of a document"]

#tokenisation: split each sentence into a list of words
tokens = [[token for token in sentence.split()] for sentence in txt_corpus]

#creating a dictionary
gensim_dictionary = corpora.Dictionary(tokens)

#displaying contents of the dictionary
print("The dictionary has: " +str(len(gensim_dictionary)) + " tokens")
for k, v in gensim_dictionary.token2id.items():
    print(f'{k:{15}} {v:{10}}')

Output:
The dictionary has: 18 tokens
This                     0
document                 1
is                       2
sample                   3
Collection               4
a                        5
corpus                   6
documents                7
make                     8
of                       9
You                     10
can                     11
convenient              12
for                     13
mathematically          14
representation          15
vectorize               16
your                    17

This is how we create a dictionary from a list of sentences using Gensim.
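
Once the dictionary is built, it is typically used to turn tokenized documents into bag-of-words vectors. The short sketch below is a minimal illustration (it assumes the gensim_dictionary and tokens variables from the code above) showing how doc2bow maps each document to a list of (token_id, token_count) pairs.

#converting each tokenized document into a bag-of-words representation,
#i.e. a list of (token_id, token_count) tuples, using the dictionary built above
bow_corpus = [gensim_dictionary.doc2bow(doc) for doc in tokens]
pprint(bow_corpus)

#an unseen document can be vectorized the same way; words that are not
#already in the dictionary (here, "new") are simply ignored by doc2bow
new_doc = "This is a new sample document".split()
print(gensim_dictionary.doc2bow(new_doc))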

