Checking for collinearity among the attributes of a dataset is one of the most important steps in data preprocessing. A good way to understand the correlation among the features is to create a scatter plot for each pair of attributes.
This recipe is a short example of how to draw a matrix of scatter plots using pandas. Let's get started.
import pandas as pd
import seaborn as sb
Let's pause and look at these imports. Pandas is used for loading and manipulating tabular data in DataFrames, and its plotting module provides scatter_matrix. Seaborn is used here only to load the sample dataset.
df = sb.load_dataset('tips')
Here we have loaded the tips dataset from the seaborn library.
Now our dataset is ready.
Using scatter_matrix from pandas.plotting, we plot 3 columns of the dataset against each other.
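The plotting call itself can be sketched as follows. This is a minimal sketch, not necessarily the exact call used in the recipe: we assume the 3 columns are the numeric columns of the tips dataset (total_bill, tip and size), and we build a small stand-in DataFrame with made-up values so the snippet runs on its own. The df loaded above can be passed in its place.

```python
import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix

# Stand-in data with made-up values (assumption for illustration only);
# in the recipe, df comes from sb.load_dataset('tips').
rng = np.random.default_rng(0)
total_bill = rng.uniform(5, 50, size=50)
df = pd.DataFrame({
    "total_bill": total_bill,
    "tip": 0.15 * total_bill + rng.normal(0, 1, size=50),
    "size": rng.integers(1, 7, size=50),
})

# Assumed column choice: the three numeric columns of the tips dataset.
columns = ["total_bill", "tip", "size"]

# Draw one scatter plot per pair of columns, with histograms on the diagonal.
# The call returns a 2-D array of matplotlib axes, one per cell of the grid.
axes = scatter_matrix(df[columns], figsize=(8, 8), diagonal="hist")
```

In a notebook the figure is rendered automatically; in a plain script, calling matplotlib.pyplot.show() afterwards displays it.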
Once we run the above code snippet, the scatter matrix is displayed. The results can also be seen in the accompanying IPython notebook.
We can see the scatter matrix for the 3 columns. Similarly, we can plot other columns to check how strongly they are correlated.
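To back up what the plots suggest with numbers, the pairwise correlation coefficients can be computed for whichever columns we want to check. The sketch below is an assumption on our part rather than a step from the recipe: it again uses a stand-in DataFrame with made-up values, picks every numeric column with select_dtypes, and calls DataFrame.corr.

```python
import numpy as np
import pandas as pd

# Stand-in data with made-up values (assumption for illustration only);
# replace with the df loaded from sb.load_dataset('tips').
rng = np.random.default_rng(0)
total_bill = rng.uniform(5, 50, size=50)
df = pd.DataFrame({
    "total_bill": total_bill,
    "tip": 0.15 * total_bill + rng.normal(0, 1, size=50),
    "size": rng.integers(1, 7, size=50),
})

# Keep only numeric columns, since correlation is undefined for categories.
numeric_df = df.select_dtypes(include="number")

# Pearson correlation for every pair of numeric columns.
corr = numeric_df.corr()
print(corr.round(2))
```

Values close to 1 or -1 point to strongly related columns, which is what a tight diagonal band in the corresponding scatter plot looks like.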