Topic modelling using Kmeans clustering to group customer reviews

In this Kmeans clustering machine learning project, you will perform topic modelling in order to group customer reviews based on recurring patterns.


Each project comes with 2-5 hours of micro-videos explaining the solution.


What you will learn

Introduction to Topic Modeling
Introduction to NLTK
Exploring textual data
Using regex
Cleaning textual data
Transforming unstructured data to structured data
Vectorizers: choosing between tf-idf and count vectorizers
Unsupervised Machine Learning
Understanding Kmeans
Clustering tweets
Identifying optimal number of clusters
Measuring homogeneity within clusters
Visualizing with word clouds
Labeling data
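The regex and cleaning steps listed above can be sketched with Python's standard `re` module. This is a minimal sketch, not the project's own code: the patterns and the helper names `clean_tweet` and `extract_special` are hypothetical, and they assume that URLs and @mentions are noise while hashtag words carry topic information.

```python
import re

# Hypothetical noise patterns for tweets; the project's own regexes may differ.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@(\w+)")
HASHTAG_RE = re.compile(r"#(\w+)")
NON_LETTER_RE = re.compile(r"[^a-z\s]")
SPACES_RE = re.compile(r"\s+")


def extract_special(text: str) -> dict:
    """Collect hashtags and @mentions from the raw tweet before cleaning removes them."""
    return {
        "hashtags": HASHTAG_RE.findall(text),
        "mentions": MENTION_RE.findall(text),
    }


def clean_tweet(text: str) -> str:
    """Lower-case a tweet, drop URLs and @mentions, keep hashtag words, strip the rest."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = HASHTAG_RE.sub(r"\1", text)  # "#fail" -> "fail"
    text = NON_LETTER_RE.sub(" ", text)  # anything that is not a-z or whitespace
    return SPACES_RE.sub(" ", text).strip()


tweet = "@AcmeSupport My order is LATE again!!! #fail https://t.co/xyz"
print(extract_special(tweet))
print(clean_tweet(tweet))
```

Hashtags and mentions are pulled out first because the cleaning step deliberately discards the `@` handle and the `#` symbol.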

Project Description

Topic modelling is a method for finding groups of words (i.e. topics) that best represent the information in a collection of text documents. It can also be thought of as a form of text mining: a way to obtain recurring patterns of words in textual data. The topics identified are valuable data points that help a business decide where to focus its efforts to improve its products or services.

In this project we will use an unsupervised technique, Kmeans, to cluster (group) reviews and identify the main topics or ideas in a large body of text. The approach applies to any kind of textual review. In this series we focus on Twitter data, which is closer to real-world conditions and more complex than reviews collected through review sites or survey forms.
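A minimal sketch of a tf-idf plus Kmeans pipeline for grouping reviews, assuming scikit-learn's `TfidfVectorizer` and `KMeans`. The six reviews are invented for illustration, and `n_init=10` and `random_state=42` are arbitrary choices rather than the project's settings.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical, already-cleaned reviews covering two themes.
reviews = [
    "battery drains too fast",
    "the battery drains overnight",
    "battery drains after update",
    "delivery was late again",
    "late delivery and damaged box",
    "delivery arrived late",
]

# Turn text into L2-normalised tf-idf vectors, dropping common English stop words.
vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(reviews)

# Kmeans with 2 centroids; the best of 10 random restarts (lowest inertia) is kept.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)

for label, review in sorted(zip(labels, reviews)):
    print(label, review)
```

On this toy data the battery reviews and the delivery reviews should land in separate clusters; which numeric label each cluster receives is arbitrary.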

Topic modelling provides us with methods to organize, understand and summarize large collections of textual information. It helps in:

  • Discovering hidden topical patterns that are present across the collection
  • Annotating documents according to these topics
  • Using these annotations to organize, search and summarize texts

Curriculum for This Mini Project

Introduction to Topic Modeling
Introduction to NLTK
Loading and Exploring Twitter Data
Cleaning the Data with Pattern Removal
Tokenizing and Identifying Special Instances of Tweets
Vectorizer Part 2
Understanding Kmeans
Clustering with 8 Centroids
Clustering with 2 Centroids
Clustering with 2 Centroids: Word Clouds
A Generic Function for Cluster Homogeneity: Finding the Optimal Number of Clusters
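For finding the optimal number of clusters, one option is to fit Kmeans for several values of k and compare inertia and silhouette scores. This sketch swaps in scikit-learn's `silhouette_score` as the selection criterion, so the project's own homogeneity function may work differently. scikit-learn's `homogeneity_score` needs reference labels, so `hand_labels` below stands in for a hypothetical hand-labelled sample, and `evaluate_k` is a hypothetical helper.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import homogeneity_score, silhouette_score

# Hypothetical reviews and hypothetical hand labels for the same reviews.
reviews = [
    "battery drains too fast",
    "the battery drains overnight",
    "battery drains after update",
    "delivery was late again",
    "late delivery and damaged box",
    "delivery arrived late",
]
hand_labels = ["battery"] * 3 + ["delivery"] * 3

X = TfidfVectorizer(stop_words="english").fit_transform(reviews)


def evaluate_k(X, k):
    """Fit Kmeans with k clusters and return (labels, inertia, silhouette)."""
    km = KMeans(n_clusters=k, n_init=10, random_state=42)
    labels = km.fit_predict(X)
    return labels, km.inertia_, silhouette_score(X, labels)


scores = {}
for k in range(2, 5):  # silhouette is defined for 2 <= k <= n_samples - 1
    labels, inertia, sil = evaluate_k(X, k)
    scores[k] = (labels, sil)
    print(f"k={k} inertia={inertia:.3f} silhouette={sil:.3f}")

# Pick the k with the highest mean silhouette, then check purity against the hand labels.
best_k = max(scores, key=lambda k: scores[k][1])
best_labels = scores[best_k][0]
print("best k:", best_k)
print("homogeneity vs hand labels:", homogeneity_score(hand_labels, best_labels))
```

Silhouette needs no labels, so it can rank candidate values of k on the full dataset; homogeneity (1.0 means every cluster contains a single hand-labelled class) is then a check on whatever labelled subset is available. Inertia always falls as k grows, which is why it is printed for an elbow plot rather than maximised directly.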