How to do KMeans Clustering in Python?

This recipe helps you do KMeans Clustering in Python

Recipe Objective

Have you ever tried clustering your data with K-means?

This recipe is a short example of how we can do KMeans Clustering in Python.

Step 1 - Import the library

from sklearn import datasets
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

Here we have imported the modules we need: datasets, StandardScaler and KMeans from scikit-learn, along with pandas, seaborn and matplotlib. We will understand the use of each one later, when it appears in the code snippets.
For now, just have a look at these imports.

Step 2 - Set up the Data for clustering

Here we have used datasets to load the inbuilt iris dataset and we have created object X and made a dataframe. We have plotted a heat map of correlation between the features.

iris = datasets.load_iris()
X = iris.data
data = pd.DataFrame(X)

cor = data.corr()

fig = plt.figure(figsize=(12,10))
sns.heatmap(cor, square=True)
plt.show()
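The heat map above labels the features 0 to 3 because the DataFrame is built without column names. As a minimal sketch, assuming you want readable labels, the dataset's own feature_names can be passed as columns. The names data_named and cor_named are hypothetical and chosen so that they do not overwrite the data and cor objects used in this recipe.

```python
from sklearn import datasets
import pandas as pd

iris = datasets.load_iris()

# Label the columns with the dataset's feature names instead of 0..3.
# data_named / cor_named are illustrative names, not part of the recipe.
data_named = pd.DataFrame(iris.data, columns=iris.feature_names)

# Pairwise Pearson correlation between the four features
cor_named = data_named.corr()
print(cor_named.round(2))
```

Passing cor_named to sns.heatmap in place of cor should give the same plot with named axes.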

Step 3 - Model and its Score

Here, first we have used StandardScaler to standardise the data such that each feature has a mean of zero and a standard deviation of 1. We are using KMeans with n_clusters equal to 3 as the machine learning model to fit the data.

scaler = StandardScaler()
X_std = scaler.fit_transform(X)

clt = KMeans(n_clusters=3)
model = clt.fit(X_std)

Now we have assigned each sample to a cluster by passing X_std to the fitted model, and stored the result in a new Cluster column.

# predict() reuses the fitted model; fit_predict() would fit it a second time
clusters = pd.DataFrame(model.predict(X_std))
data["Cluster"] = clusters

Here we have plotted the first two features such that data points of a cluster have the same colour.

fig = plt.figure(figsize=(12,10))
ax = fig.add_subplot(111)
scatter = ax.scatter(data[0], data[1], c=data["Cluster"], s=50)
ax.set_title("KMeans Clustering")
ax.set_xlabel("X0")
ax.set_ylabel("X1")
plt.colorbar(scatter)
plt.show()

The output is a scatter plot of X0 against X1, coloured by cluster.
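The recipe fits the model but does not compute a score for it. The sketch below is one possible way to do that, not part of the original recipe: it checks that the standardised features have mean 0 and standard deviation 1, then reports the inertia (within-cluster sum of squares) and the silhouette score for several values of n_clusters. The range 2 to 6, n_init=10 and random_state=0 are assumptions made here so that the run is repeatable.

```python
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

X = datasets.load_iris().data
X_std = StandardScaler().fit_transform(X)

# Sanity check: each standardised feature should have mean ~0 and std ~1
col_means = X_std.mean(axis=0)
col_stds = X_std.std(axis=0)
print("means:", col_means.round(6), "stds:", col_stds.round(6))

# Score a few candidate cluster counts (the range 2..6 is an assumption)
scores = {}
for k in range(2, 7):
    # random_state and n_init are fixed only to make the sketch repeatable
    km = KMeans(n_clusters=k, n_init=10, random_state=0)
    labels = km.fit_predict(X_std)
    scores[k] = (km.inertia_, silhouette_score(X_std, labels))
    print(f"k={k}  inertia={scores[k][0]:.2f}  silhouette={scores[k][1]:.3f}")
```

Lower inertia means tighter clusters, but it tends to fall as k grows, so it is usually read together with the silhouette score, which ranges from -1 to 1 and is higher for better-separated clusters.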

