How to randomly sample a Pandas DataFrame?

This recipe helps you randomly sample a Pandas DataFrame

Recipe Objective

While working on a dataset we sometimes need to randomly select fixed or random number of rows for some test. So how to select random rows.

This data science python source code does the following:
1. Creates data dictionary
2. Converts dictionary into pandas dataframe
3. Randomly selects subsets from datasample.

So this is the recipe on How we can randomly sample a Pandas DataFrame.

Step 1 - Import the library

import pandas as pd import numpy as np

We have only imported pandas and numpy which is needed.

Step 2 - Setting up the Data

We have created a dictionary of data and passed it in pd.DataFrame to make a dataframe with columns 'first_name', 'last_name', 'age', 'Comedy_Score' and 'Rating_Score'. raw_data = {'first_name': ['Sheldon', 'Raj', 'Leonard', 'Howard', 'Amy'], 'last_name': ['Copper', 'Koothrappali', 'Hofstadter', 'Wolowitz', 'Fowler'], 'age': [42, 38, 36, 41, 35], 'Comedy_Score': [9, 7, 8, 8, 5], 'Rating_Score': [25, 25, 49, 62, 70]} df = pd.DataFrame(raw_data, columns = ['first_name', 'last_name', 'age', 'Comedy_Score', 'Rating_Score']) print(df)

Step 3 - Selecting random subsets

We can select random subsets of rows by df.take and passing random permutation of number from the length of df. We have done this twice for 2 and 4 samples to select. print(df.take(np.random.permutation(len(df))[:2])) print(df.take(np.random.permutation(len(df))[:4])) df1 = df.sample(3) print(df1) So the output comes as

  first_name     last_name  age  Comedy_Score  Rating_Score
0    Sheldon        Copper   42             9            25
1        Raj  Koothrappali   38             7            25
2    Leonard    Hofstadter   36             8            49
3     Howard      Wolowitz   41             8            62
4        Amy        Fowler   35             5            70

  first_name     last_name  age  Comedy_Score  Rating_Score
0    Sheldon        Copper   42             9            25
1        Raj  Koothrappali   38             7            25

  first_name     last_name  age  Comedy_Score  Rating_Score
2    Leonard    Hofstadter   36             8            49
4        Amy        Fowler   35             5            70
1        Raj  Koothrappali   38             7            25
3     Howard      Wolowitz   41             8            62

  first_name     last_name  age  Comedy_Score  Rating_Score
1        Raj  Koothrappali   38             7            25
0    Sheldon        Copper   42             9            25
3     Howard      Wolowitz   41             8            62

Download Materials

What Users are saying..

profile image

Ed Godalle

Director Data Analytics at EY / EY Tech
linkedin profile url

I am the Director of Data Analytics with over 10+ years of IT experience. I have a background in SQL, Python, and Big Data working with Accenture, IBM, and Infosys. I am looking to enhance my skills... Read More

Relevant Projects

Hands-On Approach to Master PyTorch Tensors with Examples
In this deep learning project, you will learn how to perform various operations on the building block of PyTorch : Tensors.

Build an Image Segmentation Model using Amazon SageMaker
In this Machine Learning Project, you will learn to implement the UNet Architecture and build an Image Segmentation Model using Amazon SageMaker

Multi-Class Text Classification with Deep Learning using BERT
In this deep learning project, you will implement one of the most popular state of the art Transformer models, BERT for Multi-Class Text Classification

End-to-End Snowflake Healthcare Analytics Project on AWS-2
In this AWS Snowflake project, you will build an end to end retraining pipeline by checking Data and Model Drift and learn how to redeploy the model if needed

Build OCR from Scratch Python using YOLO and Tesseract
In this deep learning project, you will learn how to build your custom OCR (optical character recognition) from scratch by using Google Tesseract and YOLO to read the text from any images.

MLOps AWS Project on Topic Modeling using Gunicorn Flask
In this project we will see the end-to-end machine learning development process to design, build and manage reproducible, testable, and evolvable machine learning models by using AWS

Deep Learning Project for Beginners with Source Code Part 1
Learn to implement deep neural networks in Python .

Predict Churn for a Telecom company using Logistic Regression
Machine Learning Project in R- Predict the customer churn of telecom sector and find out the key drivers that lead to churn. Learn how the logistic regression model using R can be used to identify the customer churn in telecom dataset.

Personalized Medicine: Redefining Cancer Treatment
In this Personalized Medicine Machine Learning Project you will learn to classify genetic mutations on the basis of medical literature into 9 classes.

MLOps Project to Build Search Relevancy Algorithm with SBERT
In this MLOps SBERT project you will learn to build and deploy an accurate and scalable search algorithm on AWS using SBERT and ANNOY to enhance search relevancy in news articles.