How to split train test data using sklearn and python?

How to split train test data using sklearn and python?

How to split train test data using sklearn and python?

This recipe helps you split train test data using sklearn and python

Recipe Objective

To train and test the data we need two different sets of data. The test set which works as completely new set of data for model and use to predict the output. But if we have a fix set of dataset provided then how to generate this test and train data.

So this is the recipe on how we can split train test data using sklearn and python.

Step 1 - Import the library

from sklearn import datasets from sklearn.model_selection import train_test_split

We have only imported pandas which is needed.

Step 2 - Setting up the Data

We have imported an inbuilt wine dataset to use test_train_split. We have stored data in X and target in y. We have aslo printed the shape of the data. wine = datasets.load_wine() X = print(X.shape) y = print(y.shape)

Step 3 - Splitting the Data

So now we are using test_train_split to split the data. We have passed test_size as 0.33 which means 33% of data will be in the test part and rest will be in train part. Parameter random_state signifies the random splitting of data into the two parts. Finally we have printed the shape of test and train data. dX_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42) print(X_train.shape) print(X_test.shape) print(y_train.shape) print(y_test.shape) So the output comes

(178, 13)


(119, 13)

(59, 13)



Download Materials

Relevant Projects

Predict Macro Economic Trends using Kaggle Financial Dataset
In this machine learning project, you will uncover the predictive value in an uncertain world by using various artificial intelligence, machine learning, advanced regression and feature transformation techniques.

Build an Image Classifier for Plant Species Identification
In this machine learning project, we will use binary leaf images and extracted features, including shape, margin, and texture to accurately identify plant species using different benchmark classification techniques.

Predict Churn for a Telecom company using Logistic Regression
Machine Learning Project in R- Predict the customer churn of telecom sector and find out the key drivers that lead to churn. Learn how the logistic regression model using R can be used to identify the customer churn in telecom dataset.

Data Science Project on Wine Quality Prediction in R
In this R data science project, we will explore wine dataset to assess red wine quality. The objective of this data science project is to explore which chemical properties will influence the quality of red wines.

Expedia Hotel Recommendations Data Science Project
In this data science project, you will contextualize customer data and predict the likelihood a customer will stay at 100 different hotel groups.

Machine Learning or Predictive Models in IoT - Energy Prediction Use Case
In this machine learning and IoT project, we are going to test out the experimental data using various predictive models and train the models and break the energy usage.

Data Science Project-TalkingData AdTracking Fraud Detection
Machine Learning Project in R-Detect fraudulent click traffic for mobile app ads using R data science programming language.

NLP and Deep Learning For Fake News Classification in Python
In this project you will use Python to implement various machine learning methods( RNN, LSTM, GRU) for fake news classification.

Walmart Sales Forecasting Data Science Project
Data Science Project in R-Predict the sales for each department using historical markdown data from the Walmart dataset containing data of 45 Walmart stores.

House Price Prediction Project using Machine Learning
Use the Zillow dataset to follow a test-driven approach and build a regression machine learning model to predict the price of the house based on other variables.