# How to deal with imbalanced classes with downsampling in Python?

This recipe helps you deal with imbalanced classes by downsampling the majority class in Python.

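This data science Python source code does the following:

1. Imports the necessary libraries and the iris data from sklearn's datasets module.
2. Uses NumPy's `where` function to recode the target and to find the indices of each class.
3. Downsamples the majority class to match the size of the minority class, balancing the data.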
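The recipe relies on two different forms of `np.where`, which are easy to confuse. The short sketch below illustrates both on a small made-up label array (the array `labels` is an assumption for illustration, not part of the recipe's data).

```python
import numpy as np

# Illustrative labels only; not taken from the iris data
labels = np.array([0, 0, 1, 2, 1])

# Three-argument form: element-wise choice, here recoding to a binary target
binary = np.where(labels == 0, 0, 1)

# One-argument form: returns a tuple of index arrays (one per dimension),
# so [0] extracts the indices along the first axis
zero_idx = np.where(labels == 0)[0]

print(binary)    # [0 0 1 1 1]
print(zero_idx)  # [0 1]
```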

```
## How to deal with imbalanced classes with downsampling in Python
def Kickstarter_Example_32():
    print()
    print(format('How to deal with imbalanced classes with downsampling in Python', '*^82'))

    import warnings
    warnings.filterwarnings("ignore")

    # Load libraries
    import numpy as np
    from sklearn.datasets import load_iris

    # Load iris data
    iris = load_iris()

    # Create feature matrix and target vector
    X = iris.data
    y = iris.target

    # Make the iris dataset imbalanced by removing the first 40 observations
    # (all of class 0), which leaves 10 of class 0 and 100 of the other classes
    X = X[40:, :]
    y = y[40:]

    # Create binary target vector: 0 if class 0, otherwise 1
    y = np.where((y == 0), 0, 1)

    # Look at the imbalanced target vector
    print(); print("Look at the imbalanced target vector:\n", y)

    # Downsample the majority class to match the minority class
    # Indices of each class's observations
    i_class0 = np.where(y == 0)[0]
    i_class1 = np.where(y == 1)[0]

    # Number of observations in each class
    n_class0 = len(i_class0); print(); print("n_class0: ", n_class0)
    n_class1 = len(i_class1); print(); print("n_class1: ", n_class1)

    # Randomly sample, without replacement, as many class 1 observations
    # as there are class 0 observations
    i_class1_downsampled = np.random.choice(i_class1, size=n_class0, replace=False)

    # Join class 0's target vector with the downsampled class 1's target vector
    print(); print(np.hstack((y[i_class0], y[i_class1_downsampled])))

Kickstarter_Example_32()
```
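The recipe above prints only the balanced target vector. In practice the same indices usually have to be applied to the feature matrix as well, so that `X` and `y` stay aligned. The following is a minimal sketch of that idea, not part of the original recipe: the helper name `downsample_majority`, the `seed` parameter, and the synthetic data (built with the same 10 versus 100 split as the trimmed iris data) are all assumptions made for illustration.

```python
import numpy as np


def downsample_majority(X, y, seed=0):
    """Hypothetical helper: downsample every class to the size of the smallest one."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    n_min = counts.min()

    keep = []
    for c in classes:
        idx = np.flatnonzero(y == c)
        # Draw n_min row indices from this class without replacement
        keep.append(rng.choice(idx, size=n_min, replace=False))

    # Sort so the retained rows keep their original order
    keep = np.sort(np.concatenate(keep))
    return X[keep], y[keep]


# Synthetic stand-in data: 10 observations of class 0, 100 of class 1, 4 features
data_rng = np.random.default_rng(42)
X = data_rng.normal(size=(110, 4))
y = np.array([0] * 10 + [1] * 100)

X_bal, y_bal = downsample_majority(X, y)
print(X_bal.shape, np.bincount(y_bal))  # (20, 4) [10 10]
```

Passing a fixed `seed` makes the sampled subset reproducible between runs. Downsampling discards majority-class rows, so it suits cases where the majority class is large enough that losing some of it is acceptable.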
