To train and evaluate a model we need two separate sets of data. The test set acts as completely new data for the model and is used to check how well it predicts the output. But if we are given a single fixed dataset, how do we generate the train and test sets from it?
So this is the recipe on how we can split a dataset into train and test sets using sklearn and Python.
from sklearn import datasets
from sklearn.model_selection import train_test_split
We have imported only the datasets module and the train_test_split function from sklearn, which is all this recipe needs.
We have loaded the inbuilt wine dataset to use with train_test_split. We have stored the data in X and the target in y. We have also printed the shape of the data.
wine = datasets.load_wine()
X = wine.data
y = wine.target
print(X.shape, y.shape)
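Before splitting, it can help to check what was actually loaded. The following is only a minimal sketch of such a check; the variable names n_features and class_counts are illustrative and not part of the recipe.

```python
import numpy as np
from sklearn import datasets

# Load the inbuilt wine dataset, as in the recipe.
wine = datasets.load_wine()
X, y = wine.data, wine.target

# Number of feature columns, taken from the dataset's own metadata.
n_features = len(wine.feature_names)

# Number of samples per class label (hypothetical helper variable).
class_counts = np.bincount(y)

print(X.shape, y.shape)
print(n_features)
print(class_counts)
```

Knowing how many samples each class has is useful later when judging whether the test part is representative of the whole dataset.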
So now we are using train_test_split to split the data. We have passed test_size as 0.33, which means about 33% of the data will be in the test part and the rest will be in the train part. The random_state parameter seeds the random shuffling that is applied before the split, so passing the same value gives the same split on every run. Finally we have printed the shape of the train and test data.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
print(X_train.shape, X_test.shape)
print(y_train.shape, y_test.shape)
So the output comes as:
(178, 13) (178,)
(119, 13) (59, 13)
(119,) (59,)
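The shapes above can be checked by hand, and the role of random_state can be verified directly. The sketch below is a rough illustration, not part of the original recipe: it assumes that a fractional test_size is rounded up to a whole number of samples (which matches the 59 test rows seen here), and names such as a_test, b_test and expected_test are made up for the example.

```python
import math

import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split

# return_X_y=True returns the data and target arrays directly.
X, y = datasets.load_wine(return_X_y=True)

# Two calls with the same random_state should produce the same partition.
a_train, a_test, _, _ = train_test_split(X, y, test_size=0.33, random_state=42)
b_train, b_test, _, _ = train_test_split(X, y, test_size=0.33, random_state=42)
same_split = np.array_equal(a_test, b_test) and np.array_equal(a_train, b_train)

# Assumption: the test count is the fraction of samples, rounded up;
# the train part receives whatever is left.
n_samples = X.shape[0]
expected_test = math.ceil(0.33 * n_samples)
expected_train = n_samples - expected_test

print(same_split)
print(expected_test, expected_train)
print(a_test.shape[0], a_train.shape[0])
```

With 178 samples, 0.33 * 178 is 58.74, so the test part gets 59 rows and the train part gets the remaining 119, which agrees with the printed shapes.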