Topic modelling using Kmeans clustering to group customer reviews

In this Kmeans clustering machine learning project, you will perform topic modelling in order to group customer reviews based on recurring patterns.


Each project comes with 2-5 hours of micro-videos explaining the solution.

Code & Dataset

Get access to 50+ solved projects with iPython notebooks and datasets.

Project Experience

Add project experience to your LinkedIn/GitHub profiles.

What you will learn

Introduction to Topic Modeling
Introduction to NLTK
Exploring textual data
Using regex
Cleaning textual data
Transforming unstructured data to structured data
Vectorizers - choosing between TF-IDF and count vectorizers
Unsupervised Machine Learning
Understanding Kmeans
Clustering tweets
Identifying optimal number of clusters
Homogeneity of data
Visualizing with word clouds
Labeling data
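The cleaning steps in the list above (using regex, removing patterns, spotting special instances such as @mentions and #hashtags) can be sketched in plain Python. This is a minimal illustration, not the course's code. The regular expressions, the small `STOP_WORDS` set, and the helper names `clean_tweet`, `tokenize`, and `special_instances` are all hypothetical. A real pipeline would more likely use NLTK's tokenizers and its English stop-word list.

```python
import re

# Hypothetical, deliberately tiny stop-word list for illustration only;
# NLTK provides a fuller one via nltk.corpus.stopwords.words("english").
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "and", "or", "to", "of",
              "in", "on", "for", "it", "this", "that", "i", "my", "with", "so"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")   # links carry no topic signal
MENTION_RE = re.compile(r"@\w+")                # @user handles
HASHTAG_RE = re.compile(r"#(\w+)")              # captures the word after '#'
NON_ALPHA_RE = re.compile(r"[^a-z\s]")          # digits, punctuation, emoji


def clean_tweet(text):
    """Lower-case a tweet and strip URLs, @mentions, and non-letter characters.

    Hashtags keep their word (#refund -> refund) because the tag itself
    often names the topic.
    """
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = HASHTAG_RE.sub(r"\1", text)
    return NON_ALPHA_RE.sub(" ", text)


def tokenize(text):
    """Split cleaned text on whitespace, dropping stop words and 1-letter tokens."""
    return [tok for tok in clean_tweet(text).split()
            if tok not in STOP_WORDS and len(tok) > 1]


def special_instances(text):
    """Return the @mentions and hashtag words found in the raw tweet."""
    return MENTION_RE.findall(text), HASHTAG_RE.findall(text)


if __name__ == "__main__":
    tweet = "@AcmeSupport my order arrived late AGAIN!!! #refund https://t.co/xyz"
    print(tokenize(tweet))           # ['order', 'arrived', 'late', 'again', 'refund']
    print(special_instances(tweet))  # (['@AcmeSupport'], ['refund'])
```

Keeping the hashtag word while dropping the `#` is a judgment call: on Twitter, tags such as `#refund` often name the very topic the clusters should surface.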

Project Description

Topic modelling is a method for finding groups of words (i.e. topics) in a collection of documents that best represent the information in that collection. It can also be thought of as a form of text mining: a way to find recurring patterns of words in textual data. The topics identified are valuable data points that help a business decide where to focus its efforts to improve its products or services.
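Before any recurring pattern can be found, each document has to become a row of numbers. The sketch below contrasts the two vectorizers this project chooses between: raw term counts and TF-IDF. It is written from scratch in standard-library Python for illustration. The function names are hypothetical, and the smoothed idf formula is assumed to match the default of scikit-learn's `TfidfTransformer` (`smooth_idf=True`, followed by L2 row normalisation).

```python
import math
from collections import Counter


def build_vocabulary(docs):
    """Sorted list of every distinct token across the tokenized documents."""
    return sorted({tok for doc in docs for tok in doc})


def count_vectors(docs, vocab):
    """Raw term counts: one row per document, one column per vocabulary term."""
    rows = []
    for doc in docs:
        counts = Counter(doc)
        rows.append([counts[term] for term in vocab])
    return rows


def tfidf_vectors(docs, vocab):
    """Counts re-weighted by idf = ln((1 + n) / (1 + df)) + 1, then L2-normalised.

    Terms found in many documents (high df) get a smaller idf, so words that
    appear everywhere contribute less to each row.
    """
    n = len(docs)
    df = {term: sum(1 for doc in docs if term in doc) for term in vocab}
    idf = {term: math.log((1 + n) / (1 + df[term])) + 1 for term in vocab}
    rows = []
    for counts in count_vectors(docs, vocab):
        row = [c * idf[term] for c, term in zip(counts, vocab)]
        norm = math.sqrt(sum(v * v for v in row)) or 1.0  # guard empty documents
        rows.append([v / norm for v in row])
    return rows


if __name__ == "__main__":
    docs = [["late", "delivery"], ["late", "refund", "refund"], ["great", "service"]]
    vocab = build_vocabulary(docs)
    print(vocab)
    print(count_vectors(docs, vocab))
    for row in tfidf_vectors(docs, vocab):
        print([round(v, 3) for v in row])
```

In the first document, `delivery` outweighs `late` under TF-IDF even though both occur once, because `late` also appears in another review. That down-weighting of words shared by many documents is usually why TF-IDF is preferred when clustering by topic.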

In this project, we will use an unsupervised technique, Kmeans, to cluster reviews and identify the main topics or ideas in a large body of text. The approach applies to any textual reviews. In this series, we focus on Twitter data, which is closer to the real world and more complex than reviews collected through review or survey forms.
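At its core, Kmeans alternates two steps: assign every vector to its nearest centroid, then move each centroid to the mean of the vectors assigned to it, repeating until the assignments stop changing. A minimal pure-Python sketch of that loop (Lloyd's algorithm) follows. It assumes the reviews have already been turned into numeric vectors; the `kmeans` helper and its seeded random initialisation are illustrative assumptions. scikit-learn's `KMeans` adds k-means++ seeding and multiple restarts (`n_init`) on top of the same idea.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid_of(points):
    """Coordinate-wise mean of a non-empty list of vectors."""
    return [sum(coord) / len(points) for coord in zip(*points)]


def kmeans(vectors, k, max_iter=100, seed=0):
    """Lloyd's algorithm; returns (labels, centroids).

    Initial centroids are k input vectors picked at random with a fixed seed,
    so runs are repeatable.
    """
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each vector joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: squared_distance(v, centroids[c]))
                      for v in vectors]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # an emptied cluster keeps its previous centroid
                centroids[c] = centroid_of(members)
    return labels, centroids


if __name__ == "__main__":
    points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
    labels, centroids = kmeans(points, 2)
    print(labels)
    print(centroids)
```

With TF-IDF rows as `vectors`, each resulting cluster is a candidate topic; the toy 2-D points simply make the two groups easy to see.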

Topic modelling provides us with methods to organize, understand and summarize large collections of textual information. It helps in:

  • Discovering hidden topical patterns that are present across the collection
  • Annotating documents according to these topics
  • Using these annotations to organize, search and summarize texts
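One simple way to annotate documents with topics after clustering is to treat each cluster's most frequent tokens as its topic words. The sketch below assumes tokenized reviews plus the cluster labels Kmeans produced; `top_terms_per_cluster` and `annotate` are hypothetical helpers. The same per-cluster frequencies are what a word cloud sizes its words by (for example, the `wordcloud` package's `WordCloud.generate_from_frequencies`).

```python
from collections import Counter, defaultdict


def top_terms_per_cluster(docs, labels, n=3):
    """Map each cluster label to its n most frequent tokens (its topic words).

    Ties keep the order in which tokens were first seen, as Counter.most_common does.
    """
    counters = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        counters[label].update(tokens)
    return {label: [term for term, _ in counter.most_common(n)]
            for label, counter in sorted(counters.items())}


def annotate(docs, labels, n=3):
    """Pair each document's text with the topic words of its assigned cluster."""
    topics = top_terms_per_cluster(docs, labels, n)
    return [(" ".join(tokens), topics[label]) for tokens, label in zip(docs, labels)]


if __name__ == "__main__":
    docs = [["late", "delivery", "refund"], ["late", "refund"],
            ["great", "service"], ["great", "staff", "service"]]
    labels = [0, 0, 1, 1]
    for text, topic in annotate(docs, labels, n=2):
        print(f"{text!r:30} -> {topic}")
```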

Similar Projects

Text data requires special preparation before you can use it in any machine learning project. In this ML project, you will learn how to apply machine learning models to build classifiers and how to make sense of textual data.

In this project, we are going to predict how capable each applicant is of repaying a loan.

There are different time series forecasting methods for forecasting stock prices, demand, etc. In this machine learning project, you will learn which forecasting method to use when, and how to apply it, through a time series forecasting example.

Curriculum For This Mini Project

Introduction to Topic Modeling
Introduction To NLTK
Loading And Exploring Twitter Data
Cleaning The Data With Pattern Removal
Tokenize And Identify Special Instances Of Tweets
Vectorizer Part 2
Understanding Kmeans
Clustering With 8 Centroids
Clustering With 2 Centroids
Clustering With 2 Centroids - Word Clouds
Generic Function For Homogeneity In Clusters - Finding The Optimal Cluster Number
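The course's own homogeneity function is not reproduced here, so the sketch below substitutes a common heuristic for choosing the number of clusters: run Kmeans for several values of k and track the within-cluster sum of squared distances (what scikit-learn's `KMeans` exposes as `inertia_`). The score tends to fall as k grows, so one looks for the "elbow" where adding clusters stops paying off. The helpers `inertia` and `inertia_by_k` are hypothetical, and a plain seeded Lloyd's loop stands in for a library implementation.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid_of(points):
    """Coordinate-wise mean of a non-empty list of vectors."""
    return [sum(coord) / len(points) for coord in zip(*points)]


def kmeans(vectors, k, max_iter=100, seed=0):
    """Plain Lloyd's algorithm with a seeded random start; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: squared_distance(v, centroids[c]))
                      for v in vectors]
        if new_labels == labels:
            break  # converged: no vector changed cluster
        labels = new_labels
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = centroid_of(members)
    return labels, centroids


def inertia(vectors, labels, centroids):
    """Within-cluster sum of squared distances to each vector's own centroid."""
    return sum(squared_distance(v, centroids[lab]) for v, lab in zip(vectors, labels))


def inertia_by_k(vectors, k_values, seed=0):
    """Fit Kmeans once per k and record the resulting inertia."""
    scores = {}
    for k in k_values:
        labels, centroids = kmeans(vectors, k, seed=seed)
        scores[k] = inertia(vectors, labels, centroids)
    return scores


if __name__ == "__main__":
    points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
    for k, score in inertia_by_k(points, [1, 2, 3]).items():
        print(k, round(score, 2))
```

On the toy points the score drops from about 302.67 at k = 1 to about 2.67 at k = 2, so the elbow sits at two clusters. On real tweet vectors the curve is usually much smoother, which is why the choice of cluster count remains a judgment call.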