How to visualize the topic keywords in Gensim

In this recipe, we will learn how to create an LDA model and then visualize its topic keywords using the pyLDAvis package in Python.
Last Updated: 27 Jul 2022


Recipe Objective: How to visualize the topic keywords in Gensim?

First, create or load an LDA model as we did in the previous recipe, following the steps below. The final lines of the code pass the trained model to pyLDAvis.

#importing required libraries
import re
import numpy as np
import pandas as pd
from pprint import pprint
import gensim
import gensim.corpora as corpora
from gensim.utils import simple_preprocess
from nltk.corpus import stopwords
from gensim.models import CoherenceModel
import spacy
import pyLDAvis
import pyLDAvis.gensim_models
import matplotlib.pyplot as plt
import nltk

nltk.download('stopwords')
nlp = spacy.load('en_core_web_sm', disable=['parser', 'ner'])

#importing the Stopwords to use them
stop_words = stopwords.words('english')
stop_words.extend(['from', 'subject', 're', 'edu', 'use', 'for'])

#downloading the data
from sklearn.datasets import fetch_20newsgroups
newsgroups_train = fetch_20newsgroups(subset='train')
data = newsgroups_train.data

#removing e-mail addresses, collapsing whitespace and dropping single quotes
data = [re.sub(r'\S*@\S*\s?', '', sent) for sent in data]
data = [re.sub(r'\s+', ' ', sent) for sent in data]
data = [re.sub(r"'", "", sent) for sent in data]

#cleaning the text
def tokeniz(sentences):
    for sentence in sentences:
        yield gensim.utils.simple_preprocess(str(sentence), deacc=True)

processed_data = list(tokeniz(data))

#Building Bigram & Trigram Models
bigram = gensim.models.Phrases(processed_data, min_count=5, threshold=100)
trigram = gensim.models.Phrases(bigram[processed_data], threshold=100)
bigram_mod = gensim.models.phrases.Phraser(bigram)
trigram_mod = gensim.models.phrases.Phraser(trigram)

#function to filter out stopwords
def remove_stopwords(texts):
    return [[word for word in simple_preprocess(str(doc)) if word not in stop_words] for doc in texts]

#function to create bigrams
def create_bigrams(texts):
    return [bigram_mod[doc] for doc in texts]

#function to create trigrams (the original was missing the return statement)
def create_trigrams(texts):
    return [trigram_mod[bigram_mod[doc]] for doc in texts]

#function for lemmatization
def lemmatize(texts, allowed_postags=['NOUN', 'ADJ', 'VERB']):
    texts_op = []
    for sent in texts:
        doc = nlp(" ".join(sent))
        texts_op.append([token.lemma_ for token in doc if token.pos_ in allowed_postags])
    return texts_op

#removing stopwords, creating bigrams and lemmatizing the text
data_wo_stopwords = remove_stopwords(processed_data)
data_bigrams = create_bigrams(data_wo_stopwords)
data_lemmatized = lemmatize(data_bigrams, allowed_postags=['NOUN', 'ADJ', 'VERB'])

#printing the lemmatized data
print(data_lemmatized[:3])

#creating a dictionary
gensim_dictionary = corpora.Dictionary(data_lemmatized)
texts = data_lemmatized

#building a corpus for the topic model
gensim_corpus = [gensim_dictionary.doc2bow(text) for text in texts]

#printing the corpus we created above.
print(gensim_corpus[:3])

#we can print the words with their frequencies.
pprint([[(gensim_dictionary[token_id], freq) for token_id, freq in cp] for cp in gensim_corpus[:4]])

#creating the LDA model
lda_model = gensim.models.ldamodel.LdaModel(
    corpus=gensim_corpus,
    id2word=gensim_dictionary,
    num_topics=20,
    random_state=100,
    update_every=1,
    chunksize=100,
    passes=10,
    alpha='auto',
    per_word_topics=True
)

#visualizing the topic keywords (enable_notebook renders the output inline in Jupyter)
pyLDAvis.enable_notebook()
viz = pyLDAvis.gensim_models.prepare(lda_model, gensim_corpus, gensim_dictionary)
viz
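
The corpus passed to pyLDAvis above is a list of bag-of-words documents, each a list of (token id, count) pairs. The sketch below illustrates that format in plain Python on a made-up two-document example. The helper names build_vocab and to_bow are hypothetical, and ids here are assigned in first-seen order, which may differ from the ids gensim's Dictionary assigns.

```python
from collections import Counter

def build_vocab(docs):
    """Map each distinct token to an integer id (first-seen order; gensim's ids may differ)."""
    vocab = {}
    for doc in docs:
        for token in doc:
            vocab.setdefault(token, len(vocab))
    return vocab

def to_bow(doc, vocab):
    """Return (token_id, count) pairs sorted by id, skipping tokens not in the vocabulary."""
    counts = Counter(vocab[token] for token in doc if token in vocab)
    return sorted(counts.items())

# Toy, made-up documents standing in for data_lemmatized
toy_docs = [["car", "engine", "car"], ["engine", "oil"]]
toy_vocab = build_vocab(toy_docs)
toy_corpus = [to_bow(doc, toy_vocab) for doc in toy_docs]
print(toy_corpus)
```

Reading the pairs back through the vocabulary gives the word-frequency view that the recipe prints for the first few documents.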

The pyLDAvis package renders an interactive view of the LDA model (lda_model) we built above. Each bubble in the left-hand panel represents a topic; the larger the bubble, the more prevalent that topic is in the corpus. A good topic model tends to produce fairly large, non-overlapping bubbles spread across the chart rather than clustered in one area.
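
To make the idea of bubble size concrete, here is a minimal sketch of how a topic's prevalence can be estimated: weight each document's topic distribution by its length, sum per topic, and normalize. This is an illustration on made-up numbers, assuming prevalence means the estimated share of tokens belonging to each topic; the function name topic_prevalence is hypothetical and is not part of pyLDAvis.

```python
def topic_prevalence(doc_topic, doc_lengths):
    """Estimate each topic's share of all tokens in the corpus.

    doc_topic: one row per document, each row a topic probability distribution.
    doc_lengths: number of tokens in each document.
    """
    num_topics = len(doc_topic[0])
    token_mass = [0.0] * num_topics
    for dist, length in zip(doc_topic, doc_lengths):
        for k, prob in enumerate(dist):
            # A document of `length` tokens contributes prob * length tokens to topic k
            token_mass[k] += prob * length
    total = sum(token_mass)
    return [mass / total for mass in token_mass]

# Toy, made-up values: three documents, two topics
toy_doc_topic = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
toy_doc_lengths = [100, 50, 50]
shares = topic_prevalence(toy_doc_topic, toy_doc_lengths)
print(shares)
```

In this toy case the first topic would get the larger bubble, because the longest document is dominated by it.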
