How to scrape a table from a webpage using Beautiful Soup

This recipe shows how to scrape a table of data from a web page with Beautiful Soup and convert it into a pandas data frame.

Recipe Objective - How to scrape a table from a web page using Beautiful Soup?

Required Libraries:-

  1. Beautiful Soup (bs4) - Beautiful Soup (bs4) is a Python web scraping library for pulling data out of HTML and XML files.
  2. pandas - pandas is a Python library that provides fast and flexible data structures designed to work with "relational" or "labeled" data. It is a fundamental building block for exploratory data analysis and data manipulation of real-world data in Python.

To scrape the table, we will use the find(), find_all(), and select() methods, passing them the tags (or CSS selectors) used to build the table, such as table, thead, and th.
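The difference between the three methods is easiest to see on a small document. The sketch below is illustrative only: the HTML string, the "scores" id, and the variable names are assumptions made for the example and do not come from the page scraped in this recipe.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML used only for illustration.
html = """
<table id="scores">
  <thead><tr><th>Name</th><th>Score</th></tr></thead>
  <tbody>
    <tr><td>Ada</td><td>90</td></tr>
    <tr><td>Linus</td><td>85</td></tr>
  </tbody>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# find() returns the first matching tag, or None if nothing matches.
table = soup.find("table", id="scores")

# find_all() returns a list of every matching tag below the element.
body_rows = table.find("tbody").find_all("tr")

# select() takes a CSS selector and also returns a list of matches.
header_cells = soup.select("table#scores thead th")

headers = [th.get_text(strip=True) for th in header_cells]
first_row = [td.get_text(strip=True) for td in body_rows[0].find_all("td")]

print(headers)
print(first_row)
```

find() and find_all() match on tag names and attributes, while select() accepts any CSS selector, which is usually shorter when the table has an id or class.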

Steps to scrape the table from the web page:-

  1. Import necessary modules (bs4, pandas, requests).
  2. Load the HTML document, for example with requests.get().
  3. Pass the HTML document into the BeautifulSoup() constructor.
  4. Select the table element from the web page using the ".select()" method, e.g. soup.select('table#dataTablesFaculties')[0]
  5. After getting the table, convert it into a data frame using the pandas library, e.g. pd.read_html(StringIO(str(table)))[0] (wrapping the HTML in io.StringIO avoids the deprecation warning that pandas 2.1 and later raise for literal HTML strings).
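As a rough sketch, steps 3 to 5 can be wrapped in one helper. The function name table_to_dataframe, the sample HTML, and the "scores" id are hypothetical and exist only for this example; the HTML is inlined so the sketch runs without a network request. pd.read_html also needs a parser backend (lxml, or bs4 with html5lib) to be installed.

```python
from io import StringIO

import pandas as pd
from bs4 import BeautifulSoup


def table_to_dataframe(html, css_selector):
    """Return the first table matching css_selector as a DataFrame.

    Hypothetical helper written for this example.
    """
    soup = BeautifulSoup(html, "html.parser")
    matches = soup.select(css_selector)
    if not matches:
        raise ValueError(f"no table matches {css_selector!r}")
    # read_html returns a list of DataFrames, one per <table> it finds.
    return pd.read_html(StringIO(str(matches[0])))[0]


# Hypothetical HTML standing in for a downloaded page.
sample_html = """
<table id="scores">
  <thead><tr><th>Name</th><th>Score</th></tr></thead>
  <tbody>
    <tr><td>Ada</td><td>90</td></tr>
    <tr><td>Linus</td><td>85</td></tr>
  </tbody>
</table>
"""

df = table_to_dataframe(sample_html, "table#scores")
print(df)
```

Raising an error when the selector matches nothing gives a clearer message than the IndexError that indexing an empty select() result with [0] would produce.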

 


Code:-

import requests
from bs4 import BeautifulSoup as bs

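# load the projectpro webpage content
r = requests.get('https://www.projectpro.io/recipes')
r.raise_for_status()  # stop early if the request failed

# convert to beautiful soup, naming the parser explicitly
soup = bs(r.content, 'html.parser')

# print the parsed page with indentation
print(soup.prettify())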

Scraping the table:-


import pandas as pd

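# select() returns a list of matches; take the first table with this id
table = soup.select('table#dataTablesFaculties')[0]
# collect the header cells (<th>) from the table's <thead>
columns = table.find('thead').find_all('th')
columns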
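The columns list above holds Tag objects, not plain strings. A minimal sketch of turning them into column names follows, assuming a made-up table that reuses the 'dataTablesFaculties' id; the header names "Faculty" and "Department" are invented for the example. It uses select_one(), which returns None instead of raising when the table is missing.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML; only the table id is taken from the recipe.
html = """
<table id="dataTablesFaculties">
  <thead><tr><th> Faculty </th><th>Department</th></tr></thead>
  <tbody><tr><td>Example</td><td>Example</td></tr></tbody>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# select_one() returns the first match, or None when nothing matches.
table = soup.select_one("table#dataTablesFaculties")

if table is None:
    column_names = []
else:
    # get_text(strip=True) drops the tags and surrounding whitespace.
    column_names = [th.get_text(strip=True) for th in table.select("thead th")]

print(column_names)
```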

Converting the table to a data frame:-


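from io import StringIO
# pandas 2.1+ deprecates literal HTML strings, so wrap the HTML in StringIO
table_df = pd.read_html(StringIO(str(table)))[0]
table_df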
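pd.read_html is the shortest route, but the same data frame can be assembled by hand from the tags, which may help when a table needs custom cleaning. This is an alternative sketch, not the method used in the recipe; the HTML, the "scores" id, and the column names are assumptions for illustration.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical HTML used only for illustration.
html = """
<table id="scores">
  <thead><tr><th>Name</th><th>Score</th></tr></thead>
  <tbody>
    <tr><td>Ada</td><td>90</td></tr>
    <tr><td>Linus</td><td>85</td></tr>
  </tbody>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.select_one("table#scores")

# Header text becomes the column names.
headers = [th.get_text(strip=True) for th in table.select("thead th")]

# Each <tr> in the body becomes one list of cell strings.
rows = [
    [td.get_text(strip=True) for td in tr.find_all("td")]
    for tr in table.select("tbody tr")
]

manual_df = pd.DataFrame(rows, columns=headers)

# Cells are scraped as strings, so numeric columns need converting.
manual_df["Score"] = pd.to_numeric(manual_df["Score"])

print(manual_df)
```

Unlike pd.read_html, this version does not infer column types, so each numeric column has to be converted explicitly.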
