How to do string munging in Pandas?

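This recipe shows how to do string munging in pandas: testing a column for a substring, pulling out regular-expression matches, and checking whether each value matches a pattern.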

## How to do string munging in Pandas
def Snippet_110():
    print(format('How to do string munging in Pandas','*^82'))

    # Load libraries
    import pandas as pd
    import numpy as np
    import re

    # Create dataframe
    raw_data = {'first_name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy'],
                'last_name': ['Miller', 'Jacobson', 'Ali', 'Milner', 'Cooze'],
                'email': ['jas203@gmail.com', 'momomolly@gmail.com', np.nan,
                          'battler@milner.com', 'Ames1234@yahoo.com'],
                'preTestScore': [4, 24, 31, 2, 3],
                'postTestScore': [25, 94, 57, 62, 70]}

    df = pd.DataFrame(raw_data, columns = ['first_name', 'last_name', 'email',
                                           'preTestScore', 'postTestScore'])
    print(); print(df)

    # Which strings in the email column contain 'gmail'?
    print(); print(df['email'].str.contains('gmail'))

    # Create a regular expression pattern that breaks apart emails
    pattern = r'([A-Z0-9._%+-]+)@([A-Z0-9.-]+)\.([A-Z]{2,4})'

    # Find every occurrence of the pattern in each email
    print(); print(df['email'].str.findall(pattern, flags=re.IGNORECASE))

    # Check whether each email matches the pattern from its start (boolean Series)
    matches = df['email'].str.match(pattern, flags=re.IGNORECASE)
    print(); print(matches)

Snippet_110()

************************How to do string munging in Pandas************************

  first_name last_name                email  preTestScore  postTestScore
0      Jason    Miller     jas203@gmail.com             4             25
1      Molly  Jacobson  momomolly@gmail.com            24             94
2       Tina       Ali                  NaN            31             57
3       Jake    Milner   battler@milner.com             2             62
4        Amy     Cooze   Ames1234@yahoo.com             3             70

0     True
1     True
2      NaN
3    False
4    False
Name: email, dtype: object
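
The result above keeps NaN for the missing email, so its dtype is object and it cannot be used directly as a boolean filter. A minimal sketch of one way around this, assuming the addresses shown in the printed output and using the `na` parameter of `str.contains`:

```python
import numpy as np
import pandas as pd

# Addresses taken from the printed output of this recipe
emails = pd.Series(['jas203@gmail.com', 'momomolly@gmail.com', np.nan,
                    'battler@milner.com', 'Ames1234@yahoo.com'], name='email')

# na=False treats the missing email as "does not contain gmail"
is_gmail = emails.str.contains('gmail', na=False)

# The mask is now purely boolean and can index the Series directly
gmail_only = emails[is_gmail]
print(gmail_only)
```

Whether a missing value should count as False is a judgment call for your data; `na=True` or dropping the rows first are equally valid choices.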

0       [(jas203, gmail, com)]
1    [(momomolly, gmail, com)]
2                          NaN
3     [(battler, milner, com)]
4     [(Ames1234, yahoo, com)]
Name: email, dtype: object
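
`str.findall` returns a list of tuples per row, which is awkward to work with further. If you want the pieces as separate columns, `str.extract` is one option. The sketch below is not part of the original recipe; the group names `user`, `domain`, and `tld` are illustrative choices.

```python
import re
import numpy as np
import pandas as pd

# Addresses taken from the printed output of this recipe
emails = pd.Series(['jas203@gmail.com', 'momomolly@gmail.com', np.nan,
                    'battler@milner.com', 'Ames1234@yahoo.com'], name='email')

# Same pattern as the recipe, with named groups (names are illustrative)
pattern = (r'(?P<user>[A-Z0-9._%+-]+)@'
           r'(?P<domain>[A-Z0-9.-]+)\.'
           r'(?P<tld>[A-Z]{2,4})')

# extract() returns the first match per row as a DataFrame,
# one column per group; rows with a missing email stay missing
parts = emails.str.extract(pattern, flags=re.IGNORECASE)
print(parts)
```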

0    True
1    True
2     NaN
3    True
4    True
Name: email, dtype: object
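
Note that `str.match` only requires the pattern to match at the start of each string, so a value with trailing text after a valid address still yields True. If the whole string must be an address, `str.fullmatch` (available in pandas 1.1 and later) may be closer to what you want. A small sketch with made-up sample strings:

```python
import re
import pandas as pd

pattern = r'([A-Z0-9._%+-]+)@([A-Z0-9.-]+)\.([A-Z]{2,4})'

# Hypothetical samples: a clean address, one with trailing text,
# and one with leading text
samples = pd.Series(['jas203@gmail.com',
                     'jas203@gmail.com trailing text',
                     'contact: jas203@gmail.com'])

# match() anchors only at the start of the string
starts_with = samples.str.match(pattern, flags=re.IGNORECASE)

# fullmatch() requires the entire string to match
whole = samples.str.fullmatch(pattern, flags=re.IGNORECASE)

print(pd.DataFrame({'sample': samples, 'match': starts_with, 'fullmatch': whole}))
```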
