How to connect to MySQL using Python, import a CSV file into MySQL, and create a table?

This recipe shows how to connect to MySQL from Python, import a CSV file into MySQL, and create a table from it.

Recipe Objective

In big data scenarios, we sometimes use MySQL as a relational database to store transactional data, and then Sqoop the data into HDFS-based environments for further analysis to gain business insights. We interact with the MySQL database to store and retrieve data, and sometimes we create tables from CSV files.

System requirements:

  • Install the mysql-connector-python and pandas modules as follows:
  • pip install mysql-connector-python
  • pip install pandas
  • MySQL Workbench with a running MySQL server (available from the official MySQL downloads page)
  • The code below can be run in a Jupyter notebook or any Python console; a quick check that the connector is installed follows this list
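As a quick sanity check, you can confirm the connector is importable and see which version is installed (a minimal sketch; mysql-connector-python exposes its version string as mysql.connector.__version__):

import mysql.connector
# Prints the installed connector version, e.g. 8.0.x
print(mysql.connector.__version__)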

Step 1: Prepare the CSV File

To begin, prepare the CSV file that you'd like to import to MySQL. For example, I prepared a simple CSV file with the following data:
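The original screenshot of the sample data is not reproduced here, but an illustrative excerpt with the same columns used later in this recipe would look like this (all values below are made up):

first_name,last_name,company_name,address,city,county,state,zip,phone1,phone2,email,web
Jane,Doe,Acme Corp,12 Main St,Springfield,Clark,OH,45501,937-555-0101,937-555-0102,jane.doe@example.com,http://www.example.com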

Note: the above employee CSV data is taken from the employee_data sample file.

Step 2: Import the CSV File into a DataFrame

Next, import the CSV file into Python using the pandas library. Here is the code used to import the CSV file and create the DataFrame. You'll need to change the path to reflect the location where the CSV file is stored on your computer.

import pandas as pd

empdata = pd.read_csv('C:\\Users\\XXXXX\\empdata.csv', index_col=False, delimiter=',')
empdata.head()
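On Windows, a raw string avoids having to double the backslashes in the path; this is an equivalent way to write the same call:

# Raw string: the backslashes are not treated as escape characters
empdata = pd.read_csv(r'C:\Users\XXXXX\empdata.csv', index_col=False, delimiter=',')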

Output of the above code: empdata.head() displays the first five rows of the DataFrame.

Step 3: Connect to MySQL Using Python and Create a Database

Create a connection object to connect to MySQL. The connect() constructor creates a connection to the MySQL server and returns a MySQLConnection object.

import mysql.connector as msql
from mysql.connector import Error

try:
    # Give your own username and password here
    conn = msql.connect(host='localhost', user='root', password='root@123')
    if conn.is_connected():
        cursor = conn.cursor()
        cursor.execute("CREATE DATABASE employee")
        print("Database is created")
except Error as e:
    print("Error while connecting to MySQL", e)

Note: if you can't connect, install the mysql-connector-python package by typing the following command: pip install mysql-connector-python

Output of the above code: after running the code above, an employee database is created in MySQL, as shown below.
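One caveat: the script above raises a "database exists" error if it is run twice. A minimal variation, under the same connection settings as above, makes the step safe to re-run:

import mysql.connector as msql
from mysql.connector import Error

try:
    conn = msql.connect(host='localhost', user='root', password='root@123')
    if conn.is_connected():
        cursor = conn.cursor()
        # IF NOT EXISTS makes the statement a no-op when the database already exists
        cursor.execute("CREATE DATABASE IF NOT EXISTS employee")
        print("Database is ready")
except Error as e:
    print("Error while connecting to MySQL", e)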

Step 4: Create a Table and Import the CSV Data into the MySQL Table

We will create an employee_data table under the employee database and insert the records into MySQL with the Python code below.

import mysql.connector as msql
from mysql.connector import Error

try:
    conn = msql.connect(host='localhost', database='employee', user='root', password='root@123')
    if conn.is_connected():
        cursor = conn.cursor()
        cursor.execute("select database();")
        record = cursor.fetchone()
        print("You're connected to database: ", record)
        cursor.execute('DROP TABLE IF EXISTS employee_data;')
        print('Creating table....')
        # In the line below, pass the CREATE TABLE statement which you want to create
        cursor.execute("CREATE TABLE employee_data (first_name varchar(255), last_name varchar(255), company_name varchar(255), address varchar(255), city varchar(255), county varchar(255), state varchar(255), zip int, phone1 varchar(255), phone2 varchar(255), email varchar(255), web varchar(255))")
        print("Table is created....")
        # Loop through the DataFrame and insert each row
        for i, row in empdata.iterrows():
            # %s is the placeholder the connector uses for each value
            sql = "INSERT INTO employee.employee_data VALUES (%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s)"
            cursor.execute(sql, tuple(row))
            print("Record inserted")
        # The connection is not autocommitted by default, so we must commit to save our changes
        conn.commit()
except Error as e:
    print("Error while connecting to MySQL", e)
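Inserting row by row works, but for larger files a single executemany() call is usually faster because the connector can batch the inserts. A sketch under the same assumptions (the empdata DataFrame and the employee_data table from above):

# Build a list of tuples, one per DataFrame row, and insert them in one batch
sql = "INSERT INTO employee.employee_data VALUES (%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s)"
rows = [tuple(row) for _, row in empdata.iterrows()]
cursor.executemany(sql, rows)
conn.commit()
print(cursor.rowcount, "records inserted")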

Output of the above code:

Output from workbench:

Step 5: Query the Table

Query the table to make sure the inserted data has been saved correctly.

# Execute query
sql = "SELECT * FROM employee.employee_data"
cursor.execute(sql)

# Fetch all the records
result = cursor.fetchall()
for i in result:
    print(i)

Output of the Above code:

Output from workbench:
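Once you are done, it is good practice to release the database resources. Assuming the cursor and conn objects from the steps above are still open:

# Close the cursor and the connection when finished
if conn.is_connected():
    cursor.close()
    conn.close()
    print("MySQL connection is closed")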
