How to do string munging in Pandas?

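This recipe shows how to do string munging in Pandas: checking for substrings, finding regular-expression matches, and testing strings against a pattern, using a column of email addresses as the example.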

In [1]:
## How to do string munging in Pandas
def Snippet_110():
    print()
    print(format('How to do string munging in Pandas','*^82'))

    import warnings
    warnings.filterwarnings("ignore")

    # load libraries
    import pandas as pd
    import numpy as np
    import re

    # Create dataframe
    raw_data = {'first_name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy'],
                'last_name': ['Miller', 'Jacobson', 'Ali', 'Milner', 'Cooze'],
                'email': ['jas203@gmail.com', 'momomolly@gmail.com', np.nan,
                          'battler@milner.com', 'Ames1234@yahoo.com'],
                'preTestScore': [4, 24, 31, 2, 3],
                'postTestScore': [25, 94, 57, 62, 70]}

    df = pd.DataFrame(raw_data, columns = ['first_name', 'last_name', 'email',
                                           'preTestScore', 'postTestScore'])
    print(); print(df)

    # Which strings in the email column contain 'gmail'? (missing values stay NaN)
    print(); print(df['email'].str.contains('gmail'))

    # Create a regular expression pattern that splits an email into
    # three groups: user name, domain, and top-level domain
    pattern = r'([A-Z0-9._%+-]+)@([A-Z0-9.-]+)\.([A-Z]{2,4})'

    # Find every occurrence of the pattern in each email (case-insensitive);
    # each row becomes a list of (user, domain, tld) tuples
    print(); print(df['email'].str.findall(pattern, flags=re.IGNORECASE))

    # Test whether each email matches the pattern from its start;
    # str.match returns booleans (NaN for missing values), not the matched parts
    matches = df['email'].str.match(pattern, flags=re.IGNORECASE)
    print(); print(matches)

Snippet_110()
************************How to do string munging in Pandas************************

  first_name last_name                email  preTestScore  postTestScore
0      Jason    Miller     jas203@gmail.com             4             25
1      Molly  Jacobson  momomolly@gmail.com            24             94
2       Tina       Ali                  NaN            31             57
3       Jake    Milner   battler@milner.com             2             62
4        Amy     Cooze   Ames1234@yahoo.com             3             70

0     True
1     True
2      NaN
3    False
4    False
Name: email, dtype: object

0       [(jas203, gmail, com)]
1    [(momomolly, gmail, com)]
2                          NaN
3     [(battler, milner, com)]
4     [(Ames1234, yahoo, com)]
Name: email, dtype: object

0    True
1    True
2     NaN
3    True
4    True
Name: email, dtype: object
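
Note that `str.match` above only reports whether each email fits the pattern. If the goal is to get the email elements themselves as columns, one possible approach is `str.extract`. The sketch below is a minimal illustration, not part of the original recipe: the group names `user`, `domain`, and `tld` are assumptions chosen for this example, and `na=False` is used so that the missing email counts as "no match" instead of NaN.

```python
import re

import numpy as np
import pandas as pd

# Same email values as in the recipe above
emails = pd.Series(['jas203@gmail.com', 'momomolly@gmail.com', np.nan,
                    'battler@milner.com', 'Ames1234@yahoo.com'], name='email')

# Same pattern as above, with named groups (the names are illustrative assumptions)
pattern = r'(?P<user>[A-Z0-9._%+-]+)@(?P<domain>[A-Z0-9.-]+)\.(?P<tld>[A-Z]{2,4})'

# str.extract returns a DataFrame with one column per named group;
# rows that are missing or do not match are filled with NaN
parts = emails.str.extract(pattern, flags=re.IGNORECASE)
print(parts)

# na=False turns the missing email into False, giving a plain boolean mask
is_gmail = emails.str.contains('gmail', na=False)
print(emails[is_gmail])
```

Because the mask contains no NaN, it can be used directly for boolean indexing, as in the last line.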
