How to split train test data using sklearn and python?
DATA MUNGING DATA CLEANING PYTHON MACHINE LEARNING RECIPES PANDAS CHEATSHEET     ALL TAGS

How to split train test data using sklearn and python?

How to split train test data using sklearn and python?

This recipe helps you split train test data using sklearn and python

3

Recipe Objective

To train and test the data we need two different sets of data. The test set which works as completely new set of data for model and use to predict the output. But if we have a fix set of dataset provided then how to generate this test and train data.

So this is the recipe on how we can split train test data using sklearn and python.

Step 1 - Import the library

from sklearn import datasets from sklearn.model_selection import train_test_split

We have only imported pandas which is needed.

Step 2 - Setting up the Data

We have imported an inbuilt wine dataset to use test_train_split. We have stored data in X and target in y. We have aslo printed the shape of the data. wine = datasets.load_wine() X = wine.data print(X.shape) y = wine.target print(y.shape)

Step 3 - Splitting the Data

So now we are using test_train_split to split the data. We have passed test_size as 0.33 which means 33% of data will be in the test part and rest will be in train part. Parameter random_state signifies the random splitting of data into the two parts. Finally we have printed the shape of test and train data. dX_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42) print(X_train.shape) print(X_test.shape) print(y_train.shape) print(y_test.shape) So the output comes


(178, 13)

(178,)

(119, 13)

(59, 13)

(119,)

(59,)

Relevant Projects

Customer Churn Prediction Analysis using Ensemble Techniques
In this machine learning churn project, we implement a churn prediction model in python using ensemble techniques.

Human Activity Recognition Using Smartphones Data Set
In this deep learning project, you will build a classification system where to precisely identify human fitness activities.

Mercari Price Suggestion Challenge Data Science Project
Data Science Project in Python- Build a machine learning algorithm that automatically suggests the right product prices.

Loan Eligibility Prediction using Gradient Boosting Classifier
This data science in python project predicts if a loan should be given to an applicant or not. We predict if the customer is eligible for loan based on several factors like credit score and past history.

Data Science Project in Python on BigMart Sales Prediction
The goal of this data science project is to build a predictive model and find out the sales of each product at a given Big Mart store.

Natural language processing Chatbot application using NLTK for text classification
In this NLP AI application, we build the core conversational engine for a chatbot. We use the popular NLTK text classification library to achieve this.

Data Science Project - Instacart Market Basket Analysis
Data Science Project - Build a recommendation engine which will predict the products to be purchased by an Instacart consumer again.

Customer Market Basket Analysis using Apriori and Fpgrowth algorithms
In this data science project, you will learn how to perform market basket analysis with the application of Apriori and FP growth algorithms based on the concept of association rule learning.

Machine Learning project for Retail Price Optimization
In this machine learning pricing project, we implement a retail price optimization algorithm using regression trees. This is one of the first steps to building a dynamic pricing model.

Ensemble Machine Learning Project - All State Insurance Claims Severity Prediction
In this ensemble machine learning project, we will predict what kind of claims an insurance company will get. This is implemented in python using ensemble machine learning algorithms.