Topic modelling using K-means clustering to group customer reviews

In this K-means clustering machine learning project, you will perform topic modelling to group customer reviews based on recurring patterns.


What you will learn

Introduction to Topic Modelling
Introduction to NLTK
Exploring textual data
Using regex
Cleaning textual data
Transforming unstructured data to structured data
Vectorizer - choosing between tf-idf and count vectorizer
Unsupervised Machine Learning
Understanding K-means
Clustering tweets
Identifying optimal number of clusters
Homogeneity of data
Visualizing with word clouds
Labeling data

Project Description

Topic modelling is a method for finding groups of words (i.e. topics) that best represent the information in a collection of text documents. It can also be thought of as a form of text mining: a way to extract recurring patterns of words from textual data. The topics identified help a business decide where to focus its efforts when improving its products or services.

In this project we will use an unsupervised technique, K-means, to cluster reviews and identify the main topics in a sea of text. The approach applies to any textual reviews. In this series we will focus on Twitter data, which is closer to real-world text and more complex than reviews collected through review or survey forms.
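
To make the idea concrete, here is a minimal sketch of the pipeline in plain Python: clean the text with regex, turn each review into a tf-idf vector, then group the vectors with K-means. It is an illustration only, not the project's actual code. The sample reviews, the helper names (clean, tfidf_vectors, kmeans) and the fixed starting centroids are all assumptions made for the example. In practice a library such as scikit-learn (TfidfVectorizer and KMeans) would normally replace the hand-written parts.

```python
import math
import re
from collections import Counter


def clean(text):
    """Lowercase, drop URLs and @mentions, and keep alphabetic tokens only."""
    text = re.sub(r"https?://\S+|@\w+", " ", text.lower())
    return re.findall(r"[a-z]+", text)


def tfidf_vectors(docs):
    """Return (vocabulary, L2-normalised tf-idf vectors), one vector per document."""
    tokenized = [clean(d) for d in docs]
    vocab = sorted({w for tokens in tokenized for w in tokens})
    n_docs = len(docs)
    doc_freq = Counter(w for tokens in tokenized for w in set(tokens))
    # Smoothed inverse document frequency (one common variant).
    idf = {w: math.log((1 + n_docs) / (1 + doc_freq[w])) + 1 for w in vocab}
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vec = [tf[w] * idf[w] for w in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vocab, vectors


def kmeans(vectors, init_indices, max_iter=20):
    """Plain K-means; centroids start at the documents listed in init_indices."""
    centroids = [list(vectors[i]) for i in init_indices]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each vector goes to its nearest centroid.
        new_labels = [
            min(
                range(len(centroids)),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centroids[c])),
            )
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical sample reviews, invented for this sketch.
reviews = [
    "Battery drains fast, battery life is terrible @support",
    "Terrible battery, drains in two hours",
    "Delivery was late and the package arrived damaged http://t.co/x",
    "Late delivery, damaged package",
]

vocab, vectors = tfidf_vectors(reviews)
labels = kmeans(vectors, init_indices=[0, 2])
for label, review in zip(labels, reviews):
    print(label, review)
```

The starting centroids are fixed here so the result is reproducible. Real implementations usually pick them at random (or with a scheme such as k-means++) and rerun several times.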

Topic modelling provides us with methods to organize, understand and summarize large collections of textual information. It helps in:

  • Discovering hidden topical patterns that are present across the collection
  • Annotating documents according to these topics
  • Using these annotations to organize, search and summarize texts
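
The annotation step in the list above can be sketched as follows: once every document has a cluster label, the most frequent words in each cluster serve as a rough description of its topic. This is a simplified illustration. The tokenized documents, the cluster labels, the tiny stopword list and the helper name top_terms are all assumptions made for the example.

```python
from collections import Counter

# Tiny illustrative stopword list; a real project would use a fuller one (e.g. from NLTK).
STOPWORDS = {"the", "is", "and", "was", "in", "a"}


def top_terms(token_lists, labels, n=3):
    """Return the n most frequent non-stopword terms for each cluster label."""
    counts = {}
    for tokens, label in zip(token_lists, labels):
        counts.setdefault(label, Counter()).update(
            t for t in tokens if t not in STOPWORDS
        )
    # Sort by descending count, then alphabetically so ties are deterministic.
    return {
        label: [w for w, _ in sorted(c.items(), key=lambda kv: (-kv[1], kv[0]))[:n]]
        for label, c in counts.items()
    }


# Hypothetical tokenized reviews and the cluster labels assigned to them.
docs = [
    ["battery", "drains", "fast"],
    ["terrible", "battery", "life"],
    ["late", "delivery", "damaged", "package"],
    ["delivery", "was", "late"],
]
labels = [0, 0, 1, 1]

topics = top_terms(docs, labels, n=2)
# Annotate each document with the top terms of its cluster.
annotations = [topics[label] for label in labels]
for tokens, topic in zip(docs, annotations):
    print(" ".join(tokens), "->", topic)
```

The same counts can also feed a word cloud, which gives a quicker visual read of each cluster than a ranked list.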

Curriculum For This Mini Project

