This recipe helps you find outliers in Python
A few values in a dataset may be outliers: data values that fall outside the usual range of the data, i.e. values that are unusually small or unusually large. They can affect a model very badly, so we often need to find and remove them.
So this is a recipe for how we can find outliers in Python.
from sklearn.covariance import EllipticEnvelope
from sklearn.datasets import make_blobs
We have imported EllipticEnvelope and make_blobs, which are needed.
We create a dataset using make_blobs, and we will detect the outliers in it.
X, _ = make_blobs(n_samples = 100,
                  n_features = 20,
                  centers = 7,
                  cluster_std = 1.1,
                  shuffle = True,
                  random_state = 42)
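As a quick sanity check on the dataset, the short sketch below rebuilds it with the same arguments and prints its shape. With 100 samples and 20 features, X should be a 100 x 20 array.

```python
from sklearn.datasets import make_blobs

# Same arguments as in the recipe: 100 samples, 20 features, 7 cluster centers.
X, _ = make_blobs(n_samples=100,
                  n_features=20,
                  centers=7,
                  cluster_std=1.1,
                  shuffle=True,
                  random_state=42)

# One row per sample, one column per feature.
print(X.shape)  # (100, 20)
```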
We fit EllipticEnvelope with the contamination parameter, which sets the proportion of the data expected to be outliers (here 0.1, i.e. 10%). Note that predict does not return the data with the outliers removed; it returns one label per sample, 1 for an inlier and -1 for an outlier.
outlier_detector = EllipticEnvelope(contamination=.1)
outlier_detector.fit(X)
print(X)
print(outlier_detector.predict(X))
The output is the feature matrix X followed by the predicted labels (1 for an inlier, -1 for an outlier):
[[ 4.93252797 7.68541287 -3.97876821 ... 4.52684633 -3.24863123
9.41974416]
[-9.3234536 4.59276437 -4.39779468 ... -7.09597087 8.20227193
2.26134033]
[-8.7338198 3.08658417 -3.49905765 ... -6.82385124 8.775862
1.38825176]
...
[-2.83969517 -6.07980264 6.47763993 ... -9.36607752 -2.57352093
-9.39410402]
[-2.1671993 10.63717797 5.58330442 ... 0.50898027 -1.25365592
-5.02572796]
[ 7.21074034 9.28156979 -3.54240715 ... 3.89782083 -3.2259812
11.03335594]]
[ 1 -1 1 -1 1 1 -1 1 1 1 1 1 1 -1 1 1 1 1 1 1 1 1 1 1
1 1 -1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 -1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 -1 1 1 1 1 1 1 1 1
1 1 -1 1 1 1 1 1 1 1 1 -1 1 1 1 1 1 -1 1 1 1 1 1 1
1 1 1 1]
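The labels only flag the outliers; to actually drop them, the labels can be used as a boolean mask over the rows of X. The following is a minimal sketch of that step, not part of the original recipe. The names labels, inlier_mask, X_clean and outlier_idx are illustrative, and random_state=0 on the detector is an assumption added only to make the fit repeatable.

```python
import numpy as np
from sklearn.covariance import EllipticEnvelope
from sklearn.datasets import make_blobs

# Rebuild the dataset from the recipe.
X, _ = make_blobs(n_samples=100, n_features=20, centers=7,
                  cluster_std=1.1, shuffle=True, random_state=42)

# random_state=0 is an assumption for reproducibility, not in the original.
outlier_detector = EllipticEnvelope(contamination=0.1, random_state=0)
outlier_detector.fit(X)

# predict returns 1 for inliers and -1 for outliers.
labels = outlier_detector.predict(X)

# Keep only the rows labelled as inliers.
inlier_mask = labels == 1
X_clean = X[inlier_mask]

# Row indices of the samples flagged as outliers.
outlier_idx = np.where(~inlier_mask)[0]

print("rows kept:", X_clean.shape[0])
print("rows flagged as outliers:", outlier_idx.size)
print("outlier row indices:", outlier_idx)
```

With contamination=0.1 and 100 samples, roughly 10 rows should be flagged, which matches the ten -1 labels in the output above.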