How to randomly sample a Pandas DataFrame?

This recipe helps you randomly sample a Pandas DataFrame

Recipe Objective

While working on a dataset we sometimes need to randomly select fixed or random number of rows for some test. So how to select random rows.

This data science python source code does the following:
1. Creates data dictionary
2. Converts dictionary into pandas dataframe
3. Randomly selects subsets from datasample.

So this is the recipe on How we can randomly sample a Pandas DataFrame.

Step 1 - Import the library

import pandas as pd import numpy as np

We have only imported pandas and numpy which is needed.

Step 2 - Setting up the Data

We have created a dictionary of data and passed it in pd.DataFrame to make a dataframe with columns 'first_name', 'last_name', 'age', 'Comedy_Score' and 'Rating_Score'. raw_data = {'first_name': ['Sheldon', 'Raj', 'Leonard', 'Howard', 'Amy'], 'last_name': ['Copper', 'Koothrappali', 'Hofstadter', 'Wolowitz', 'Fowler'], 'age': [42, 38, 36, 41, 35], 'Comedy_Score': [9, 7, 8, 8, 5], 'Rating_Score': [25, 25, 49, 62, 70]} df = pd.DataFrame(raw_data, columns = ['first_name', 'last_name', 'age', 'Comedy_Score', 'Rating_Score']) print(df)

Step 3 - Selecting random subsets

We can select random subsets of rows by df.take and passing random permutation of number from the length of df. We have done this twice for 2 and 4 samples to select. print(df.take(np.random.permutation(len(df))[:2])) print(df.take(np.random.permutation(len(df))[:4])) df1 = df.sample(3) print(df1) So the output comes as

  first_name     last_name  age  Comedy_Score  Rating_Score
0    Sheldon        Copper   42             9            25
1        Raj  Koothrappali   38             7            25
2    Leonard    Hofstadter   36             8            49
3     Howard      Wolowitz   41             8            62
4        Amy        Fowler   35             5            70

  first_name     last_name  age  Comedy_Score  Rating_Score
0    Sheldon        Copper   42             9            25
1        Raj  Koothrappali   38             7            25

  first_name     last_name  age  Comedy_Score  Rating_Score
2    Leonard    Hofstadter   36             8            49
4        Amy        Fowler   35             5            70
1        Raj  Koothrappali   38             7            25
3     Howard      Wolowitz   41             8            62

  first_name     last_name  age  Comedy_Score  Rating_Score
1        Raj  Koothrappali   38             7            25
0    Sheldon        Copper   42             9            25
3     Howard      Wolowitz   41             8            62

Download Materials

What Users are saying..

profile image

Ray han

Tech Leader | Stanford / Yale University
linkedin profile url

I think that they are fantastic. I attended Yale and Stanford and have worked at Honeywell,Oracle, and Arthur Andersen(Accenture) in the US. I have taken Big Data and Hadoop,NoSQL, Spark, Hadoop... Read More

Relevant Projects

Multi-Class Text Classification with Deep Learning using BERT
In this deep learning project, you will implement one of the most popular state of the art Transformer models, BERT for Multi-Class Text Classification

Build Piecewise and Spline Regression Models in Python
In this Regression Project, you will learn how to build a piecewise and spline regression model from scratch in Python to predict the points scored by a sports team.

Time Series Project to Build a Multiple Linear Regression Model
Learn to build a Multiple linear regression model in Python on Time Series Data

NLP and Deep Learning For Fake News Classification in Python
In this project you will use Python to implement various machine learning methods( RNN, LSTM, GRU) for fake news classification.

Build CNN Image Classification Models for Real Time Prediction
Image Classification Project to build a CNN model in Python that can classify images into social security cards, driving licenses, and other key identity information.

Recommender System Machine Learning Project for Beginners-2
Recommender System Machine Learning Project for Beginners Part 2- Learn how to build a recommender system for market basket analysis using association rule mining.

Insurance Pricing Forecast Using XGBoost Regressor
In this project, we are going to talk about insurance forecast by using linear and xgboost regression techniques.

NLP Project for Beginners on Text Processing and Classification
This Project Explains the Basic Text Preprocessing and How to Build a Classification Model in Python

Isolation Forest Model and LOF for Anomaly Detection in Python
Credit Card Fraud Detection Project - Build an Isolation Forest Model and Local Outlier Factor (LOF) in Python to identify fraudulent credit card transactions.

NLP Project to Build a Resume Parser in Python using Spacy
Use the popular Spacy NLP python library for OCR and text classification to build a Resume Parser in Python.