How to do string munging in Pandas?

This recipe helps you do string munging in Pandas.

Recipe Objective

Have you ever tried string munging? That is, selecting part of a string, or making a new string from the strings available in a DataFrame.

So this is the recipe on how we can do string munging in Pandas.
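
As a quick illustration of those two ideas, here is a minimal sketch. The two-row frame and the variable names are made up for this example and are not part of the recipe's data.

```python
import pandas as pd

# Hypothetical two-row frame, used only for this illustration
people = pd.DataFrame({"first_name": ["Jason", "Molly"],
                       "last_name": ["Miller", "Jacobson"]})

# Select part of a string: the first three characters of each first name
short_name = people["first_name"].str[:3]

# Make a new string from existing ones: first and last name joined by a space
full_name = people["first_name"].str.cat(people["last_name"], sep=" ")

print(short_name.tolist())
print(full_name.tolist())
```

Both operations go through the .str accessor, which applies the string method to every element of the column.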


Step 1 - Import the library

import pandas as pd
import numpy as np
import re

We have imported only pandas, numpy and re, which are all we need.

Step 2 - Creating DataFrame

We have created a dictionary and passed it to pd.DataFrame to create a DataFrame.

raw_data = {"first_name": ["Jason", "Molly", "Tina", "Jake", "Amy"],
            "last_name": ["Miller", "Jacobson", "Ali", "Milner", "Cooze"],
            # np.nan marks a missing email address
            "email": ["jas203@gmail.com", "momomolly@gmail.com", np.nan, "battler@milner.com", "Ames1234@yahoo.com"]}
df = pd.DataFrame(raw_data, columns=["first_name", "last_name", "email"])
print()
print(df)

Step 3 - Applying Different Munging Operations

Let's say we first want to check which strings in the "email" column contain "gmail".

print(df["email"].str.contains("gmail"))

Next, let's say we want to separate each email into parts, so that the characters before "@" become one string, the characters between "@" and "." become another, and whatever remains becomes the last. Each pair of parentheses in the pattern captures one of these parts.

pattern = r"([A-Z0-9._%+-]+)@([A-Z0-9.-]+)\.([A-Z]{2,4})"
print(df["email"].str.findall(pattern, flags=re.IGNORECASE))

So the output comes as:

  first_name last_name                email
0      Jason    Miller     jas203@gmail.com
1      Molly  Jacobson  momomolly@gmail.com
2       Tina       Ali                  NaN
3       Jake    Milner   battler@milner.com
4        Amy     Cooze   Ames1234@yahoo.com

0     True
1     True
2      NaN
3    False
4    False
Name: email, dtype: object
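
The result above is what str.contains returned: the missing email in row 2 stays NaN rather than becoming True or False, which can get in the way when the result is used to filter rows. A minimal sketch of one way around that, assuming missing emails should simply count as non-matches, uses the na parameter of str.contains. The names emails, is_gmail and gmail_only are illustrative.

```python
import numpy as np
import pandas as pd

# Same "email" values as in Step 2
emails = pd.Series(["jas203@gmail.com", "momomolly@gmail.com", np.nan,
                    "battler@milner.com", "Ames1234@yahoo.com"], name="email")

# na=False reports missing values as False instead of NaN,
# so the result is a plain True/False mask
is_gmail = emails.str.contains("gmail", na=False)

# Use the mask to keep only the rows whose email contains "gmail"
gmail_only = emails[is_gmail]
print(gmail_only.tolist())
```

Whether a missing value should count as a non-match is a judgment call that depends on the data; na=True would do the opposite.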

0       [(jas203, gmail, com)]
1    [(momomolly, gmail, com)]
2                          NaN
3     [(battler, milner, com)]
4     [(Ames1234, yahoo, com)]
Name: email, dtype: object
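
The result above is what str.findall returned: a list of tuples for each row. If the goal is one column per part, str.extract may be more convenient. The sketch below reuses the same pattern with named groups; the group names user, domain and suffix are assumptions chosen for this example.

```python
import re

import numpy as np
import pandas as pd

# Same "email" values as in Step 2
emails = pd.Series(["jas203@gmail.com", "momomolly@gmail.com", np.nan,
                    "battler@milner.com", "Ames1234@yahoo.com"], name="email")

# Same pattern as in Step 3, with a name attached to each capture group
pattern = (r"(?P<user>[A-Z0-9._%+-]+)@"
           r"(?P<domain>[A-Z0-9.-]+)\."
           r"(?P<suffix>[A-Z]{2,4})")

# extract returns a DataFrame with one column per named group;
# rows with a missing email get missing values in every column
parts = emails.str.extract(pattern, flags=re.IGNORECASE)
print(parts)
```

Unlike findall, extract keeps only the first match in each string, which is enough here because each cell holds a single address.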

0    True
1    True
2     NaN
3    True
4    True
Name: email, dtype: object
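
This last result, True for every email that is present, is not produced by either statement shown in Step 3. One call that would give such a result, offered here as an assumption rather than as the recipe's own code, is str.match with the same pattern, which tests whether each string matches the pattern from its start.

```python
import re

import numpy as np
import pandas as pd

# Same "email" values as in Step 2
emails = pd.Series(["jas203@gmail.com", "momomolly@gmail.com", np.nan,
                    "battler@milner.com", "Ames1234@yahoo.com"], name="email")

# Same pattern as in Step 3
pattern = r"([A-Z0-9._%+-]+)@([A-Z0-9.-]+)\.([A-Z]{2,4})"

# match checks whether each string matches the pattern from its first character
matches = emails.str.match(pattern, flags=re.IGNORECASE)
print(matches)
```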
