Topic modelling using Kmeans clustering to group customer reviews

In this K-means clustering machine learning project, you will perform topic modelling to group customer reviews based on recurring patterns.

What will you learn

Introduction to Topic Modeling
Introduction to NLTK
Exploring textual data
Using regex
Cleaning textual data
Transforming unstructured data to structured data
Vectorizers - choosing between TF-IDF and count vectorizers
Unsupervised Machine Learning
Understanding Kmeans
Clustering tweets
Identifying optimal number of clusters
Homogeneity of data
Visualizing with word clouds
Labeling data
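
As a rough illustration of the regex cleaning step listed above, here is a minimal sketch that assumes tweets arrive as plain strings. The patterns, the helper names `clean_tweet` and `tokenize`, and the tiny stopword set are hypothetical choices for this example, not the project's code; the project itself uses NLTK, which provides a fuller stopword list and proper tokenizers.

```python
import re

# Hypothetical cleaning patterns for tweets (assumed choices, not the project's).
URL = re.compile(r"https?://\S+|www\.\S+")   # links
MENTION = re.compile(r"@\w+")                # @user handles
NON_LETTER = re.compile(r"[^a-z\s]")         # digits, punctuation, '#', emoji (ASCII-only)
WHITESPACE = re.compile(r"\s+")


def clean_tweet(text):
    """Lowercase a tweet and strip URLs, mentions and non-letter characters."""
    text = text.lower()
    text = URL.sub(" ", text)        # remove URLs first, since they may contain '@'
    text = MENTION.sub(" ", text)
    text = NON_LETTER.sub(" ", text)
    return WHITESPACE.sub(" ", text).strip()


def tokenize(text, stopwords):
    """Split a cleaned tweet on whitespace, dropping stopwords and 1-letter tokens."""
    return [tok for tok in text.split() if tok not in stopwords and len(tok) > 1]


if __name__ == "__main__":
    stopwords = {"the", "is", "a", "and", "to", "my", "it"}  # toy list; NLTK's is larger
    raw = "@AcmeSupport my order #1234 is LATE again!! https://t.co/xyz"
    cleaned = clean_tweet(raw)
    print(cleaned)                        # my order is late again
    print(tokenize(cleaned, stopwords))   # ['order', 'late', 'again']
```

Note that stripping `#` keeps hashtag words ("#delivery" becomes "delivery"), which is usually what you want for topic words.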

Project Description

Topic modelling is a method for finding groups of words (i.e. topics) that best represent the information in a collection of text documents. It can also be thought of as a form of text mining: a way to obtain recurring patterns of words in textual data. The identified topics are key data points that help a business decide where to focus its efforts to improve its products or services.

In this project we will use an unsupervised technique, K-means, to cluster reviews and identify the main topics and ideas in a sea of text. The approach applies to any textual reviews. In this series, we will focus on Twitter data, which is closer to real-world data and more complex than reviews obtained from review or survey forms.

Topic modelling provides us with methods to organize, understand and summarize large collections of textual information. It helps in:

  • Discovering hidden topical patterns that are present across the collection
  • Annotating documents according to these topics
  • Using these annotations to organize, search and summarize texts
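
Once each review has a cluster label, annotating, searching and summarizing reduce to simple grouping. This standard-library sketch assumes the labels were already produced (for example by K-means); `annotate`, `search` and `summarize` are hypothetical helper names, and the per-topic word counts are the kind of frequencies a word cloud visualizes.

```python
from collections import Counter, defaultdict


def annotate(docs, labels):
    """Attach a topic label to each document."""
    return [{"text": doc, "topic": int(label)} for doc, label in zip(docs, labels)]


def search(annotated, topic):
    """Return every document assigned to the given topic."""
    return [row["text"] for row in annotated if row["topic"] == topic]


def summarize(annotated, top_n=3):
    """Most frequent words per topic: the counts a word cloud would draw."""
    counts = defaultdict(Counter)
    for row in annotated:
        counts[row["topic"]].update(row["text"].split())
    return {topic: [w for w, _ in c.most_common(top_n)] for topic, c in sorted(counts.items())}


if __name__ == "__main__":
    docs = ["late delivery", "delivery lost", "battery great", "battery dies fast"]
    labels = [0, 0, 1, 1]  # e.g. cluster labels produced by K-means
    rows = annotate(docs, labels)
    print(search(rows, 1))            # ['battery great', 'battery dies fast']
    print(summarize(rows, top_n=1))   # {0: ['delivery'], 1: ['battery']}
```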

Curriculum For This Mini Project

Introduction to Topic Modeling
Introduction To NLTK
Loading And Exploring Twitter Data
Cleaning The Data With Pattern Removal
Tokenize And Identify Special Instances Of Tweets
Vectorizer Part 2
Understanding Kmeans
Clustering With 8 Centroids
Clustering With 2 Centroids
Clustering With 2 Centroids - Word Clouds
Generic Function For Cluster Homogeneity - Finding The Optimal Number Of Clusters
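
The curriculum compares 8 and 2 centroids before choosing a cluster count. Because scikit-learn's `homogeneity_score` needs ground-truth labels, which raw tweets do not have, this sketch swaps in two label-free criteria rather than reproducing the project's homogeneity function: K-means inertia (for an elbow plot) and the silhouette score. It assumes scikit-learn is installed; `score_k` and the sample tweets are hypothetical.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import silhouette_score


def score_k(docs, k_values, seed=0):
    """Fit K-means for each candidate k and report inertia and silhouette score.

    silhouette_score requires 2 <= k <= len(docs) - 1.
    """
    X = TfidfVectorizer(stop_words="english").fit_transform(docs)
    scores = {}
    for k in k_values:
        km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X)
        scores[k] = {
            "inertia": km.inertia_,                         # within-cluster sum of squared distances
            "silhouette": silhouette_score(X, km.labels_),  # in [-1, 1], higher is better
        }
    return scores


if __name__ == "__main__":
    tweets = [
        "delivery late again, package still not here",
        "late delivery, package lost in transit",
        "courier lost my package, delivery refund please",
        "great battery life on this phone",
        "phone battery lasts all day, great screen",
        "battery drains fast, phone gets hot",
    ]
    scores = score_k(tweets, range(2, 5))
    for k, s in scores.items():
        print(f"k={k}: inertia={s['inertia']:.3f} silhouette={s['silhouette']:.3f}")
    best_k = max(scores, key=lambda k: scores[k]["silhouette"])
    print("best k by silhouette:", best_k)
```

Inertia generally falls as k grows, so it is read as an "elbow" where the drop flattens, while the silhouette score can be maximized directly.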
