How to split train test data using sklearn and python?

This recipe helps you split train test data using sklearn and python

Recipe Objective

To train and test the data we need two different sets of data. The test set which works as completely new set of data for model and use to predict the output. But if we have a fix set of dataset provided then how to generate this test and train data.

So this is the recipe on how we can split train test data using sklearn and python.

Master the Art of Data Cleaning in Machine Learning

Step 1 - Import the library

from sklearn import datasets from sklearn.model_selection import train_test_split

We have only imported pandas which is needed.

Step 2 - Setting up the Data

We have imported an inbuilt wine dataset to use test_train_split. We have stored data in X and target in y. We have aslo printed the shape of the data. wine = datasets.load_wine() X = wine.data print(X.shape) y = wine.target print(y.shape)

Step 3 - Splitting the Data

So now we are using test_train_split to split the data. We have passed test_size as 0.33 which means 33% of data will be in the test part and rest will be in train part. Parameter random_state signifies the random splitting of data into the two parts. Finally we have printed the shape of test and train data. dX_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42) print(X_train.shape) print(X_test.shape) print(y_train.shape) print(y_test.shape) So the output comes

(178, 13)

(178,)

(119, 13)

(59, 13)

(119,)

(59,)

Download Materials

What Users are saying..

profile image

Anand Kumpatla

Sr Data Scientist @ Doubleslash Software Solutions Pvt Ltd
linkedin profile url

ProjectPro is a unique platform and helps many people in the industry to solve real-life problems with a step-by-step walkthrough of projects. A platform with some fantastic resources to gain... Read More

Relevant Projects

Build OCR from Scratch Python using YOLO and Tesseract
In this deep learning project, you will learn how to build your custom OCR (optical character recognition) from scratch by using Google Tesseract and YOLO to read the text from any images.

Build a Face Recognition System in Python using FaceNet
In this deep learning project, you will build your own face recognition system in Python using OpenCV and FaceNet by extracting features from an image of a person's face.

ML Model Deployment on AWS for Customer Churn Prediction
MLOps Project-Deploy Machine Learning Model to Production Python on AWS for Customer Churn Prediction

AWS MLOps Project to Deploy a Classification Model [Banking]
In this AWS MLOps project, you will learn how to deploy a classification model using Flask on AWS.

Time Series Analysis with Facebook Prophet Python and Cesium
Time Series Analysis Project - Use the Facebook Prophet and Cesium Open Source Library for Time Series Forecasting in Python

Loan Eligibility Prediction using Gradient Boosting Classifier
This data science in python project predicts if a loan should be given to an applicant or not. We predict if the customer is eligible for loan based on several factors like credit score and past history.

NLP Project to Build a Resume Parser in Python using Spacy
Use the popular Spacy NLP python library for OCR and text classification to build a Resume Parser in Python.

Build a Customer Churn Prediction Model using Decision Trees
Develop a customer churn prediction model using decision tree machine learning algorithms and data science on streaming service data.

Linear Regression Model Project in Python for Beginners Part 1
Machine Learning Linear Regression Project in Python to build a simple linear regression model and master the fundamentals of regression for beginners.

AWS MLOps Project for ARCH and GARCH Time Series Models
Build and deploy ARCH and GARCH time series forecasting models in Python on AWS .