How to randomly sample a Pandas DataFrame?

This recipe helps you randomly sample a Pandas DataFrame
Last Updated: 02 Jun 2022

Recipe Objective

While working with a dataset, we sometimes need to randomly select a fixed or random number of rows, for example to run a quick test on a subset. This recipe shows how to select random rows from a DataFrame.

This data science Python source code does the following:
1. Creates a data dictionary
2. Converts the dictionary into a pandas DataFrame
3. Randomly selects subsets of rows from the DataFrame

So this is the recipe on how we can randomly sample a Pandas DataFrame.

Step 1 - Import the library

import pandas as pd
import numpy as np

We have imported only pandas and numpy, which are all this recipe needs.

Step 2 - Setting up the Data

We have created a dictionary of data and passed it to pd.DataFrame to make a dataframe with the columns 'first_name', 'last_name', 'age', 'Comedy_Score' and 'Rating_Score'.

raw_data = {'first_name': ['Sheldon', 'Raj', 'Leonard', 'Howard', 'Amy'],
            'last_name': ['Copper', 'Koothrappali', 'Hofstadter', 'Wolowitz', 'Fowler'],
            'age': [42, 38, 36, 41, 35],
            'Comedy_Score': [9, 7, 8, 8, 5],
            'Rating_Score': [25, 25, 49, 62, 70]}
df = pd.DataFrame(raw_data, columns = ['first_name', 'last_name', 'age', 'Comedy_Score', 'Rating_Score'])
print(df)

Step 3 - Selecting random subsets

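We can select a random subset of rows by calling df.take with the first few entries of a random permutation of the row positions, np.random.permutation(len(df)). We have done this twice, to select 2 and then 4 rows. We have also used df.sample(3), which selects 3 random rows directly.

print(df.take(np.random.permutation(len(df))[:2]))
print(df.take(np.random.permutation(len(df))[:4]))

df1 = df.sample(3)
print(df1)

The output follows: first the full dataframe printed in Step 2, then the 2-row, 4-row and 3-row samples. Because the selection is random, the sampled rows will differ from run to run.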

  first_name     last_name  age  Comedy_Score  Rating_Score
0    Sheldon        Copper   42             9            25
1        Raj  Koothrappali   38             7            25
2    Leonard    Hofstadter   36             8            49
3     Howard      Wolowitz   41             8            62
4        Amy        Fowler   35             5            70

  first_name     last_name  age  Comedy_Score  Rating_Score
0    Sheldon        Copper   42             9            25
1        Raj  Koothrappali   38             7            25

  first_name     last_name  age  Comedy_Score  Rating_Score
2    Leonard    Hofstadter   36             8            49
4        Amy        Fowler   35             5            70
1        Raj  Koothrappali   38             7            25
3     Howard      Wolowitz   41             8            62

  first_name     last_name  age  Comedy_Score  Rating_Score
1        Raj  Koothrappali   38             7            25
0    Sheldon        Copper   42             9            25
3     Howard      Wolowitz   41             8            62
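
Since the samples above change on every run, it can help to fix a seed when a result needs to be repeatable. The following is a minimal sketch, not part of the original recipe: it uses the random_state and frac parameters of df.sample and a seeded NumPy generator for the take approach. The seed values and the shortened two-column dataframe are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Shortened version of the recipe's data (assumption: two columns are enough here).
df = pd.DataFrame({
    'first_name': ['Sheldon', 'Raj', 'Leonard', 'Howard', 'Amy'],
    'age': [42, 38, 36, 41, 35],
})

# Fixed seed: both calls return the same 3 rows.
sample_a = df.sample(n=3, random_state=42)
sample_b = df.sample(n=3, random_state=42)

# Sample a fraction of rows instead of a count: 40% of 5 rows is 2 rows.
sample_frac = df.sample(frac=0.4, random_state=0)

# Seeded counterpart of the take/permutation approach from Step 3.
rng = np.random.default_rng(0)
sample_take = df.take(rng.permutation(len(df))[:2])

print(sample_a)
print(sample_frac)
print(sample_take)
```

Both approaches sample without replacement, so no row appears twice in a single sample; df.sample also accepts replace=True if repeated rows are acceptable.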
