How to view topics in LDA topic model in Gensim

In this recipe, we will first create an LDA model using the Gensim library in Python and then walk through the steps to view the topics in the model.
Last Updated: 27 Jul 2022


Recipe Objective: How to view topics in the LDA topic model in Gensim?

First, create or load an LDA model, as we did in the previous recipe, by following the steps below:

# importing required libraries
import re
import numpy as np
import pandas as pd
from pprint import pprint
import gensim
import gensim.corpora as corpora
from gensim.utils import simple_preprocess
from nltk.corpus import stopwords
from gensim.models import CoherenceModel
import spacy
import pyLDAvis
import pyLDAvis.gensim_models
import matplotlib.pyplot as plt
import nltk

nltk.download('stopwords')
nlp = spacy.load('en_core_web_sm', disable=['parser', 'ner'])

# loading the English stopwords and extending them with corpus-specific words
stop_words = stopwords.words('english')
stop_words.extend(['from', 'subject', 're', 'edu', 'use', 'for'])

# downloading the data
from sklearn.datasets import fetch_20newsgroups
newsgroups_train = fetch_20newsgroups(subset='train')
data = newsgroups_train.data

# cleaning the text: remove e-mail addresses, collapse whitespace, drop single quotes
data = [re.sub(r'\S*@\S*\s?', '', sent) for sent in data]
data = [re.sub(r'\s+', ' ', sent) for sent in data]
data = [re.sub(r"'", "", sent) for sent in data]

# tokenizing the text
def tokeniz(sentences):
    for sentence in sentences:
        yield gensim.utils.simple_preprocess(str(sentence), deacc=True)

processed_data = list(tokeniz(data))

# building bigram and trigram models
bigram = gensim.models.Phrases(processed_data, min_count=5, threshold=100)
trigram = gensim.models.Phrases(bigram[processed_data], threshold=100)
bigram_mod = gensim.models.phrases.Phraser(bigram)
trigram_mod = gensim.models.phrases.Phraser(trigram)

# function to filter out stopwords
def remove_stopwords(texts):
    return [[word for word in simple_preprocess(str(doc)) if word not in stop_words]
            for doc in texts]

# function to create bigrams
def create_bigrams(texts):
    return [bigram_mod[doc] for doc in texts]

# function to create trigrams (the original omitted the return statement)
def create_trigrams(texts):
    return [trigram_mod[bigram_mod[doc]] for doc in texts]

# function for lemmatization
def lemmatize(texts, allowed_postags=['NOUN', 'ADJ', 'VERB']):
    texts_op = []
    for sent in texts:
        doc = nlp(" ".join(sent))
        texts_op.append([token.lemma_ for token in doc if token.pos_ in allowed_postags])
    return texts_op

# removing stopwords, creating bigrams and lemmatizing the text
data_wo_stopwords = remove_stopwords(processed_data)
data_bigrams = create_bigrams(data_wo_stopwords)
data_lemmatized = lemmatize(data_bigrams, allowed_postags=['NOUN', 'ADJ', 'VERB'])

# printing the lemmatized data
print(data_lemmatized[:3])

# creating a dictionary
gensim_dictionary = corpora.Dictionary(data_lemmatized)
texts = data_lemmatized

# building a corpus for the topic model
gensim_corpus = [gensim_dictionary.doc2bow(text) for text in texts]

# printing the corpus we created above
print(gensim_corpus[:3])

# printing the words with their frequencies
pprint([[(gensim_dictionary[word_id], freq) for word_id, freq in cp]
        for cp in gensim_corpus[:4]])

# creating the LDA model
lda_model = gensim.models.ldamodel.LdaModel(
    corpus=gensim_corpus,
    id2word=gensim_dictionary,
    num_topics=20,
    random_state=100,
    update_every=1,
    chunksize=100,
    passes=10,
    alpha='auto',
    per_word_topics=True
)

# viewing topics
pprint(lda_model.print_topics())

Output:
[(0,
  '0.017*"year" + 0.017*"new" + 0.015*"make" + 0.011*"work" + 0.011*"number" + '
  '0.010*"will" + 0.010*"use" + 0.010*"may" + 0.009*"high" + 0.009*"large"'),
 (1,
  '0.047*"line" + 0.046*"would" + 0.042*"write" + 0.027*"article" + '
  '0.025*"know" + 0.024*"be" + 0.024*"go" + 0.022*"get" + 0.020*"think" + '
  '0.018*"good"'),
 (2,
  '0.038*"man" + 0.016*"straight" + 0.015*"male" + 0.015*"homosexual" + '
  '0.014*"sex" + 0.014*"marriage" + 0.013*"helmet" + 0.013*"gay" + '
  '0.013*"mirror" + 0.012*"creation"'),
 (3,
  '0.030*"mail" + 0.029*"include" + 0.027*"send" + 0.023*"post" + 0.020*"list" '
  '+ 0.020*"source" + 0.018*"information" + 0.017*"address" + 0.016*"email" + '
  '0.015*"book"'),
 (4,
  '0.072*"car" + 0.024*"drug" + 0.023*"distribution_usa" + 0.020*"drive" + '
  '0.019*"model" + 0.018*"engine" + 0.013*"insist" + 0.012*"road" + '
  '0.012*"dealer" + 0.012*"buy"'),
 (5,
  '0.032*"power" + 0.023*"light" + 0.018*"cut" + 0.017*"lebanese" + '
  '0.015*"notice" + 0.014*"bus" + 0.012*"route" + 0.011*"cool" + '
  '0.011*"external" + 0.010*"master"'),
 (6,
  '0.025*"kill" + 0.016*"people" + 0.015*"child" + 0.014*"attack" + '
  '0.013*"say" + 0.013*"death" + 0.013*"war" + 0.012*"soldier" + '
  '0.010*"murder" + 0.010*"village"'),
 (7,
  '0.129*"ax" + 0.109*"max" + 0.051*"bike" + 0.025*"di_di" + 0.019*"ride" + '
  '0.018*"rider" + 0.016*"dog" + 0.008*"biker" + 0.008*"cub" + 0.007*"dare"'),
 (8,
  '0.049*"report" + 0.024*"slave" + 0.020*"brain" + 0.016*"mount" + '
  '0.015*"medium" + 0.014*"laugh" + 0.014*"reference" + 0.014*"beat" + '
  '0.012*"tumor" + 0.012*"mine"'),
 (9,
  '0.038*"key" + 0.016*"system" + 0.015*"use" + 0.014*"test" + 0.014*"entry" + '
  '0.013*"technology" + 0.012*"public" + 0.011*"provide" + 0.011*"encryption" '
  '+ 0.011*"phone"'),
 (10,
  '0.022*"say" + 0.022*"believe" + 0.021*"faith" + 0.016*"religion" + '
  '0.014*"people" + 0.013*"truth" + 0.013*"atheist" + 0.012*"belief" + '
  '0.010*"church" + 0.010*"man"'),
 (11,
  '0.056*"game" + 0.041*"year" + 0.038*"team" + 0.031*"play" + 0.031*"player" '
  '+ 0.017*"run" + 0.017*"field" + 0.017*"score" + 0.016*"division" + '
  '0.014*"last"'),
 (12,
  '0.038*"people" + 0.024*"state" + 0.019*"right" + 0.017*"law" + 0.014*"gun" '
  '+ 0.012*"government" + 0.011*"would" + 0.011*"case" + 0.010*"person" + '
  '0.010*"god"'),
 (13,
  '0.044*"space" + 0.027*"speed" + 0.020*"device" + 0.017*"scsi" + '
  '0.016*"design" + 0.016*"performance" + 0.015*"launch" + 0.014*"compare" + '
  '0.014*"datum" + 0.012*"orbit"'),
 (14,
  '0.022*"reason" + 0.021*"evidence" + 0.018*"may" + 0.014*"point" + '
  '0.014*"claim" + 0.013*"sense" + 0.012*"exist" + 0.010*"question" + '
  '0.010*"make" + 0.009*"must"'),
 (15,
  '0.037*"file" + 0.034*"program" + 0.034*"window" + 0.021*"use" + '
  '0.020*"image" + 0.019*"set" + 0.018*"problem" + 0.015*"version" + '
  '0.015*"solution" + 0.015*"screen"'),
 (16,
  '0.027*"team" + 0.023*"season" + 0.021*"fan" + 0.017*"wing" + 0.016*"trade" '
  '+ 0.015*"box" + 0.014*"playoff" + 0.013*"play" + 0.012*"pen" + 0.012*"cop"'),
 (17,
  '0.027*"pin" + 0.026*"israeli" + 0.025*"suggest" + 0.017*"period" + '
  '0.015*"lead" + 0.015*"greek" + 0.013*"peace" + 0.013*"pro" + '
  '0.012*"examine" + 0.012*"position"'),
 (18,
  '0.035*"sale" + 0.018*"item" + 0.016*"food" + 0.014*"research" + '
  '0.013*"doctor" + 0.013*"cd" + 0.013*"diagnosis" + 0.012*"pain" + '
  '0.011*"treatment" + 0.011*"body"'),
 (19,
  '0.042*"drive" + 0.033*"system" + 0.028*"card" + 0.022*"software" + '
  '0.021*"thank" + 0.020*"computer" + 0.019*"use" + 0.019*"machine" + '
  '0.018*"bit" + 0.015*"color"')]
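
Each entry in the output above is a (topic id, string) pair, and the string packs the words and weights together as weight*"word" terms joined by " + ". If you want to work with those numbers, the string can be split back into pairs. The sketch below is one possible way to do that with a regular expression; the helper name parse_topic_string is hypothetical and is not part of Gensim, and the sample string is copied from the first terms of topic 4 above.

```python
import re

def parse_topic_string(topic_str):
    """Split a print_topics()-style string into (word, weight) pairs.

    Hypothetical helper for illustration; it is not a Gensim function.
    """
    # Each term looks like 0.072*"car": capture the weight, then the quoted word.
    pairs = re.findall(r'([0-9.]+)\*"([^"]+)"', topic_str)
    return [(word, float(weight)) for weight, word in pairs]

# First four terms of topic 4 from the output above.
topic_4 = '0.072*"car" + 0.024*"drug" + 0.023*"distribution_usa" + 0.020*"drive"'
parsed = parse_topic_string(topic_4)

for word, weight in parsed:
    print(word, weight)
```

Parsing the string is mainly useful when only the printed output is available; with the model object in hand, asking Gensim for the pairs directly is usually simpler.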

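The last line of the code calls print_topics() on the LDA model (lda_model) created above. It returns one (topic id, string) pair per topic, and each string lists the topic's ten highest-weighted words together with their weights, as shown in the output.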
