How to connect to MySQL using Python and import a CSV file into MySQL and create a table?

This recipe shows how to connect to MySQL from Python, import a CSV file into MySQL, and create a table from it.


Recipe Objective

In big data scenarios, we sometimes use MySQL as a relational database to store transactional data, then Sqoop the data into HDFS-based environments for further analysis to extract business insights. We interact with the MySQL database to store and retrieve data, and at times we create tables from CSV files.

System requirements :

  • Install the required Python modules as follows:
  • pip install mysql-connector-python
  • pip install pandas
  • MySQL Workbench with a running MySQL server
  • The code below can be run in a Jupyter notebook or any Python console

Step 1: Prepare the CSV File

To begin, prepare the CSV file that you'd like to import into MySQL. For example, I prepared a simple CSV file with the following data:

Note: the above employee CSV data is taken from the employee_data sample file.
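
The original screenshot of the sample data is not reproduced here. The rows below are a hypothetical illustration matching the columns used in the CREATE TABLE statement later in this recipe; all names and contact details are made up:

first_name,last_name,company_name,address,city,county,state,zip,phone1,phone2,email,web
Jane,Doe,Acme Corp,12 Main St,Springfield,Clark,OH,45501,555-0100,555-0101,jane.doe@example.com,http://example.com
John,Smith,Globex Inc,34 Oak Ave,Riverton,Fremont,WY,82501,555-0102,555-0103,john.smith@example.com,http://example.org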

Step 2: Import the CSV File into a DataFrame

Next, import the CSV file into Python using the pandas library. Here is the code that I used to import the CSV file and then create the DataFrame. You'll need to change the path to reflect the location where the CSV file is stored on your computer.

import pandas as pd

# Read the CSV file into a DataFrame; change the path to your file's location
empdata = pd.read_csv('C:\\Users\\XXXXX\\empdata.csv', index_col=False, delimiter=',')
empdata.head()

Output of the above code:

Step 3: Connect to MySQL using Python and create a Database

Create a connection object to connect to MySQL. The connect() constructor creates a connection to MySQL and returns a MySQLConnection object.

import mysql.connector as msql
from mysql.connector import Error

try:
    # Give your own username and password here
    conn = msql.connect(host='localhost', user='root', password='root@123')
    if conn.is_connected():
        cursor = conn.cursor()
        cursor.execute("CREATE DATABASE employee")
        print("Database is created")
except Error as e:
    print("Error while connecting to MySQL", e)

Note: if the connection fails, install the mysql-connector-python package by typing the following command: pip install mysql-connector-python

Output of the above code: running the code above creates an employee database in MySQL, as shown below.
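
If Workbench is not handy, one quick way to confirm that the database was created is to list the databases from the same connection. A minimal sketch, reusing the conn and cursor objects from the code above:

# List all databases visible to this connection to confirm 'employee' exists
cursor.execute("SHOW DATABASES")
for db in cursor.fetchall():
    print(db)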

Step 4: Create a Table and Import the CSV Data into the MySQL Table

We will create an employee_data table under the employee database and insert the records into MySQL with the Python code below.

import mysql.connector as msql
from mysql.connector import Error

try:
    conn = msql.connect(host='localhost', database='employee', user='root', password='root@123')
    if conn.is_connected():
        cursor = conn.cursor()
        cursor.execute("select database();")
        record = cursor.fetchone()
        print("You're connected to database: ", record)
        cursor.execute('DROP TABLE IF EXISTS employee_data;')
        print('Creating table....')
        # Pass the CREATE TABLE statement that you want to run here
        cursor.execute("CREATE TABLE employee_data(first_name varchar(255), last_name varchar(255), company_name varchar(255), address varchar(255), city varchar(255), county varchar(255), state varchar(255), zip int, phone1 varchar(255), phone2 varchar(255), email varchar(255), web varchar(255))")
        print("Table is created....")
        # Loop through the DataFrame; %s is the placeholder for each column value
        for i, row in empdata.iterrows():
            sql = "INSERT INTO employee.employee_data VALUES (%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s)"
            cursor.execute(sql, tuple(row))
            print("Record inserted")
        # The connection is not autocommitted by default, so commit to save the changes
        conn.commit()
except Error as e:
    print("Error while connecting to MySQL", e)

Output of the above code:

Output from workbench:
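
As a side note, inserting row by row with cursor.execute() can be slow for large files; mysql-connector-python also provides cursor.executemany(), which sends the rows in one call. A minimal sketch, assuming the same empdata DataFrame and open connection as above:

# Build one list of row tuples and insert them all at once
sql = "INSERT INTO employee.employee_data VALUES (%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s)"
cursor.executemany(sql, [tuple(row) for _, row in empdata.iterrows()])
conn.commit()
print(cursor.rowcount, "records inserted")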

Step 5: Query the Table

Query the table to make sure that our inserted data has been saved correctly.

# Execute the query
sql = "SELECT * FROM employee.employee_data"
cursor.execute(sql)

# Fetch all the records and print them one by one
result = cursor.fetchall()
for i in result:
    print(i)

Output of the Above code:

Output from workbench:
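
Finally, once you are done, it is good practice to release the database resources. A short sketch, assuming the conn and cursor objects from the steps above:

# Close the cursor and the connection when finished
if conn.is_connected():
    cursor.close()
    conn.close()
    print("MySQL connection is closed")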
