How to do string munging in Pandas?
DATA MUNGING DATA CLEANING PYTHON MACHINE LEARNING RECIPES PANDAS CHEATSHEET     ALL TAGS

How to do string munging in Pandas?

How to do string munging in Pandas?

This recipe helps you do string munging in Pandas

0

Recipe Objective

Have you ever tried string munging? That is selecting a part of string of making another string form the available strings in the dataframe.

So this is the recipe on how we can do string munging in Pandas.

Step 1 - Import the library

import pandas as pd import numpy as np import re as re

We have only imported pandas, numpy and re which is needed.

Step 2 - Creating DataFrame

We have created a dictionary and passed it through pd.DataFrame to create a Dataframe raw_data = {"first_name": ["Jason", "Molly", "Tina", "Jake", "Amy"], "last_name": ["Miller", "Jacobson", "Ali", "Milner", "Cooze"], "email": ["jas203@gmail.com", "momomolly@gmail.com", np.NAN, "battler@milner.com", "Ames1234@yahoo.com"]} df = pd.DataFrame(raw_data, columns = ["first_name", "last_name", "email"]) print(); print(df)

Step 3 - Applying Different Munging Operation

Lets say, first we want to check that if in feature "email" which string contains "gmail". print(df["email"].str.contains("gmail")) Lets say, we want to seperate the email into parts such that the characters before "@" becomes one string and after and before "." becomes one. At last the remaining becomes the one string. pattern = "([A-Z0-9._%+-]+)@([A-Z0-9.-]+)\.([A-Z]{2,4})" print(df["email"].str.findall(pattern, flags=re.IGNORECASE)) So the output comes as

  first_name last_name                email  preTestScore  postTestScore
0      Jason    Miller     jas203@gmail.com             4             25
1      Molly  Jacobson  momomolly@gmail.com            24             94
2       Tina       Ali                  NaN            31             57
3       Jake    Milner   battler@milner.com             2             62
4        Amy     Cooze   Ames1234@yahoo.com             3             70

0     True
1     True
2      NaN
3    False
4    False
Name: email, dtype: object

0       [(jas203, gmail, com)]
1    [(momomolly, gmail, com)]
2                          NaN
3     [(battler, milner, com)]
4     [(Ames1234, yahoo, com)]
Name: email, dtype: object

0    True
1    True
2     NaN
3    True
4    True
Name: email, dtype: object

Relevant Projects

Customer Churn Prediction Analysis using Ensemble Techniques
In this machine learning churn project, we implement a churn prediction model in python using ensemble techniques.

Data Science Project-TalkingData AdTracking Fraud Detection
Machine Learning Project in R-Detect fraudulent click traffic for mobile app ads using R data science programming language.

Mercari Price Suggestion Challenge Data Science Project
Data Science Project in Python- Build a machine learning algorithm that automatically suggests the right product prices.

Data Science Project in Python on BigMart Sales Prediction
The goal of this data science project is to build a predictive model and find out the sales of each product at a given Big Mart store.

Forecast Inventory demand using historical sales data in R
In this machine learning project, you will develop a machine learning model to accurately forecast inventory demand based on historical sales data.

Zillow’s Home Value Prediction (Zestimate)
Data Science Project in R -Build a machine learning algorithm to predict the future sale prices of homes.

Solving Multiple Classification use cases Using H2O
In this project, we are going to talk about H2O and functionality in terms of building Machine Learning models.

Time Series Forecasting with LSTM Neural Network Python
Deep Learning Project- Learn to apply deep learning paradigm to forecast univariate time series data.

Natural language processing Chatbot application using NLTK for text classification
In this NLP AI application, we build the core conversational engine for a chatbot. We use the popular NLTK text classification library to achieve this.

Choosing the right Time Series Forecasting Methods
There are different time series forecasting methods to forecast stock price, demand etc. In this machine learning project, you will learn to determine which forecasting method to be used when and how to apply with time series forecasting example.