make_blobs() function for clustering

Gagan Preet

As I understand it, the trouble is with the way you are trying to plot. Here is a code snippet that might help. Go through it, and if anything is unclear, I can explain it.

from itertools import combinations

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from sklearn.datasets import make_blobs
# Jupyter/IPython only: render figures inside the notebook.
%matplotlib inline

n_features = 5
n_samples = 500
n_centers = 4

# Pass `centers` by keyword; newer scikit-learn releases make it keyword-only.
X, y = make_blobs(n_samples=n_samples, n_features=n_features,
                  centers=n_centers)

# One column per feature, plus the blob label used for coloring.
data = pd.DataFrame(X, columns=['Feature {}'.format(i)
                                for i in range(n_features)])
data['cluster'] = y

# Scatter every pair of features and color the points by cluster label.
for i, j in combinations(range(n_features), 2):
    print('Plotting Feature {} vs Feature {}'.format(i, j))
    sns.lmplot(x='Feature {}'.format(i),
               y='Feature {}'.format(j),
               hue='cluster',
               data=data,
               fit_reg=False)
    plt.show()
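The loop above opens a separate figure for each pair of features: with 5 features that is 5 × 4 / 2 = 10 figures. A minimal sketch, assuming seaborn 0.9 or newer (which provides `sns.scatterplot`), that draws every pair into one grid instead. `plot_pairs_grid` is a hypothetical helper name, not part of any library.

```python
from itertools import combinations
import math

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from sklearn.datasets import make_blobs


def plot_pairs_grid(data, features, hue="cluster", n_cols=5):
    """Scatter every pair of `features` on one figure, colored by `hue`."""
    pairs = list(combinations(features, 2))
    n_rows = math.ceil(len(pairs) / n_cols)
    fig, axes = plt.subplots(n_rows, n_cols,
                             figsize=(3 * n_cols, 3 * n_rows), squeeze=False)
    flat_axes = axes.ravel()
    for ax, (fx, fy) in zip(flat_axes, pairs):
        sns.scatterplot(x=fx, y=fy, hue=hue, data=data, ax=ax, legend=False)
        ax.set_title("{} vs {}".format(fx, fy))
    # Hide any leftover empty panels when the pairs don't fill the last row.
    for ax in flat_axes[len(pairs):]:
        ax.set_visible(False)
    fig.tight_layout()
    return fig, pairs


if __name__ == "__main__":
    X, y = make_blobs(n_samples=500, n_features=5, centers=4, random_state=0)
    features = ["Feature {}".format(i) for i in range(X.shape[1])]
    data = pd.DataFrame(X, columns=features)
    data["cluster"] = y
    fig, pairs = plot_pairs_grid(data, features)
    print(len(pairs))  # 10 pairs for 5 features
    plt.show()
```

One figure with small panels is easier to compare side by side than ten separate plots, at the cost of smaller markers.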

Oct 31 2018 12:45 PM

Khushbakht

Thanks GPS. I understand it. In the above example we plot each feature against every other feature and use the y value (the cluster label) as the color.

I was trying to replicate our class example, where we plotted x against y like this:

import seaborn as sns
# clustering_data is presumably the (X, y) tuple returned by make_blobs
sns.regplot(x=clustering_data[0][:, 0], y=clustering_data[0][:, 1], fit_reg=False)

But I guess that worked because we only had two features, whereas my example has five.
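For the two-feature case from class, a single scatter plot shows all of the data. A minimal sketch, assuming `clustering_data` was the `(X, y)` tuple returned by `make_blobs` (the class code isn't shown here); `plot_two_features` is a hypothetical name. It passes `x=` and `y=` by keyword, which newer seaborn releases require.

```python
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import make_blobs


def plot_two_features(X):
    """Plot column 0 of X against column 1, as in the class example."""
    fig, ax = plt.subplots()
    sns.regplot(x=X[:, 0], y=X[:, 1], fit_reg=False, ax=ax)
    ax.set_xlabel("Feature 0")
    ax.set_ylabel("Feature 1")
    return ax


if __name__ == "__main__":
    # Assumed stand-in for the class's clustering_data: an (X, y) tuple.
    clustering_data = make_blobs(n_samples=300, n_features=2, centers=3,
                                 random_state=0)
    X = clustering_data[0]
    print(X.shape)  # (300, 2): exactly one x column and one y column
    plot_two_features(X)
    plt.show()
```

With five features, `X[:, 0]` vs `X[:, 1]` would still run, but it would show only two of the five dimensions.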

So is it correct to assume that:

If we have two variables, one x and the other y, we plot them against each other (as we did in class).
If we have multiple features and one target variable, we plot the features against each other and use y as the color.

What I don't understand is how we decide which plotting method to use: regplot or lmplot?

Best,

KB

Nov 01 2018 07:10 AM

Gagan Preet

The short answer is:

Plots are two-dimensional, so only two features can go on the axes at a time, which is why we use multiple plots. You can use the color and size of the markers to encode a few more values, but the plot quickly becomes hard to read. This is why we use a scatter matrix to understand the relationships between all the features and the target.

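regplot vs lmplot: these serve a very similar purpose, and it's OK to use either one. lmplot has the advantage that it lets you use the hue (color) parameter to distinguish between classes. lmplot builds on regplot through a FacetGrid, so it expects a DataFrame and column names, while regplot also accepts plain arrays.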
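A minimal sketch of the practical difference, assuming a DataFrame like `data` from the first snippet (the helper names are hypothetical). `sns.regplot` draws on a single matplotlib Axes and returns it, and it has no `hue` parameter, so coloring by class takes one call per class. `sns.lmplot` creates its own figure, returns a `FacetGrid`, and handles `hue` directly.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from sklearn.datasets import make_blobs


def colored_regplot(data, x, y, hue="cluster"):
    """regplot has no hue parameter: draw one scatter layer per class."""
    fig, ax = plt.subplots()
    for label, group in data.groupby(hue):
        # Each call picks the next color from the Axes color cycle.
        sns.regplot(x=x, y=y, data=group, fit_reg=False, ax=ax,
                    label="{} {}".format(hue, label))
    ax.legend()
    return ax


def colored_lmplot(data, x, y, hue="cluster"):
    """lmplot accepts hue directly and returns a FacetGrid."""
    return sns.lmplot(x=x, y=y, hue=hue, data=data, fit_reg=False)


if __name__ == "__main__":
    X, y = make_blobs(n_samples=500, n_features=5, centers=4, random_state=0)
    data = pd.DataFrame(X, columns=["Feature {}".format(i) for i in range(5)])
    data["cluster"] = y
    ax = colored_regplot(data, "Feature 0", "Feature 1")
    grid = colored_lmplot(data, "Feature 0", "Feature 1")
    plt.show()
```

Because regplot draws into an Axes you pass in, it fits into your own subplot layouts; lmplot is the quicker choice when you just want a hue-colored figure.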

Nov 01 2018 12:52 PM

Khushbakht

Okay. This is helpful. Thanks GPS!

Nov 02 2018 05:36 AM
