
How to randomly sample a Pandas DataFrame?

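This recipe shows two ways to draw a random sample of rows from a Pandas DataFrame: taking rows at shuffled positions with NumPy, and calling the DataFrame's own sample method. Because the rows are drawn at random, the sampled output will differ from run to run.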
In [1]:
## How to randomly sample a Pandas DataFrame
# load libraries
import numpy as np
import pandas as pd

def Kickstarter_Example_99():
    print()
    print(format('randomly sample a Pandas DataFrame','*^82'))
    # Create dataframe
    raw_data = {'first_name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy'],
                'last_name': ['Miller', 'Jacobson', 'Ali', 'Milner', 'Cooze'],
                'age': [42, 52, 36, 24, 73],
                'preTestScore': [4, 24, 31, 2, 3],
                'postTestScore': [25, 94, 57, 62, 70]}
    df = pd.DataFrame(raw_data, columns = ['first_name', 'last_name', 'age',
                                           'preTestScore', 'postTestScore'])
    print(); print(df)
    # Select a random subset of 2 rows without replacement:
    # shuffle the row positions, keep the first 2, and take those rows
    print(); print(df.take(np.random.permutation(len(df))[:2]))
    # Select a random subset of 4 rows without replacement
    print(); print(df.take(np.random.permutation(len(df))[:4]))
    # Draw a random sample of 3 rows (without replacement by default)
    df1 = df.sample(3)
    print(); print(df1)
Kickstarter_Example_99()
************************randomly sample a Pandas DataFrame************************

  first_name last_name  age  preTestScore  postTestScore
0      Jason    Miller   42             4             25
1      Molly  Jacobson   52            24             94
2       Tina       Ali   36            31             57
3       Jake    Milner   24             2             62
4        Amy     Cooze   73             3             70

  first_name last_name  age  preTestScore  postTestScore
1      Molly  Jacobson   52            24             94
3       Jake    Milner   24             2             62

  first_name last_name  age  preTestScore  postTestScore
0      Jason    Miller   42             4             25
2       Tina       Ali   36            31             57
1      Molly  Jacobson   52            24             94
3       Jake    Milner   24             2             62

  first_name last_name  age  preTestScore  postTestScore
4        Amy     Cooze   73             3             70
1      Molly  Jacobson   52            24             94
2       Tina       Ali   36            31             57
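
The permutation-based subsets above change on every run. As a minimal sketch of how to make them repeatable, the snippet below draws the shuffled positions from a seeded NumPy Generator. The seed value 0 and the two-column frame are illustrative assumptions, not part of the original recipe.

```python
import numpy as np
import pandas as pd

# Small illustrative frame (a cut-down version of the recipe's data)
df = pd.DataFrame({
    'first_name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy'],
    'age': [42, 52, 36, 24, 73],
})

# A seeded Generator makes the shuffle repeatable (the seed 0 is arbitrary)
rng = np.random.default_rng(0)

# Shuffle the row positions 0..len(df)-1 and keep the first 2;
# a permutation never repeats a position, so this samples without replacement
positions = rng.permutation(len(df))[:2]
subset = df.take(positions)

print(subset)
```

Rerunning the script with the same seed should select the same two rows; changing or dropping the seed gives a fresh draw.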

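The call df.sample(3) in the recipe uses only the row count. The sketch below illustrates a few other commonly used parameters of DataFrame.sample: n, frac, replace, and random_state. The seed 42 and the variable names are arbitrary choices for illustration.

```python
import pandas as pd

# Small illustrative frame (a cut-down version of the recipe's data)
df = pd.DataFrame({
    'first_name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy'],
    'age': [42, 52, 36, 24, 73],
})

# Fixed number of rows; random_state makes the draw repeatable
three_rows = df.sample(n=3, random_state=42)

# A fraction of the rows instead of a count (60% of 5 rows is 3 rows)
sixty_percent = df.sample(frac=0.6, random_state=42)

# With replacement the same row may be drawn more than once,
# so n is allowed to exceed the number of rows
bootstrap = df.sample(n=8, replace=True, random_state=42)

# Shuffle every row, then renumber the index from 0
shuffled = df.sample(frac=1, random_state=42).reset_index(drop=True)

print(three_rows)
print(sixty_percent)
print(bootstrap)
print(shuffled)
```

Passing both n and frac in the same call raises an error, so pick whichever matches how the sample size is specified.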
