How to upload files into Google Drive using a Python script?


In this big data recipe, we will show how to upload files into Google Drive using a Python script.


Recipe Objective

In big data scenarios, extracting data from Google Drive into orchestration workflows, storing it in a data lake, and then running a series of operations such as data validation, cleaning, and transformation is a common way to gather business insights from the data. In this recipe, we illustrate how to manage files in Google Drive and use them in data-flow orchestration processes.


  • Install the pydrive Python module as follows: pip install pydrive
  • The code below can be run in a Jupyter notebook or any Python console.

Step 1: Import the libraries

from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive

Step 2: OAuth made easy

To set up authentication, follow the steps in the guide: Get Authentication for Google Service API.

Download client_secrets.json from the Google API Console, and OAuth 2.0 is done in two lines. You can customize the behavior of OAuth 2.0 in a single settings file, settings.yaml.

gauth = GoogleAuth()
drive = GoogleDrive(gauth)

The above steps together look as follows:

from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive

gauth = GoogleAuth()
drive = GoogleDrive(gauth)
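With just these two lines, PyDrive typically launches the OAuth consent flow in the browser the first time the drive object is used. If you prefer to trigger authentication explicitly and cache the credentials between runs, here is a minimal sketch; the cache file name mycreds.txt is an assumption for illustration, not part of the original recipe:

from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive

gauth = GoogleAuth()
gauth.LoadCredentialsFile("mycreds.txt")  # hypothetical credential cache file
if gauth.credentials is None:
    gauth.LocalWebserverAuth()  # opens a browser window for user consent
elif gauth.access_token_expired:
    gauth.Refresh()  # refresh the expired access token
else:
    gauth.Authorize()  # reuse the cached, still-valid credentials
gauth.SaveCredentialsFile("mycreds.txt")  # persist credentials for the next run

drive = GoogleDrive(gauth)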

Step 3: Upload files to your Google Drive

upload_file_list = ['1.jpg', '2.jpg']
for upload_file in upload_file_list:
    gfile = drive.CreateFile({'parents': [{'id': '1pzschX3uMbxU0lB5WZ6IlEEeAUE8MZ-t'}]})
    # Read the local file and set it as the content of this instance.
    gfile.SetContentFile(upload_file)
    gfile.Upload()  # Upload the file.

Output of the above code:

  • The above code uploads the two local files 1.jpg and 2.jpg to the Google Drive folder test/. To do that, the pydrive library creates two file instances in Google Drive and then reads and uploads the contents of the two local files to the corresponding folder.
  • Note that we need to provide the ID of the destination Google Drive folder. In this example, the test folder's ID is 1pzschX3uMbxU0lB5WZ6IlEEeAUE8MZ-t. You can get a Google Drive folder's ID from the browser.
  • For example, when we open the test folder in Google Drive, the browser shows an address of the form https://drive.google.com/drive/folders/<folder ID>. The corresponding ID for the folder is the part after the last / symbol, here 1cIMiqUDUNldxO6Nl-KVuS9SV-cWi9WLi.
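The same CreateFile/SetContentFile/Upload calls generalize beyond a fixed file list. As a hedged sketch, here is one way to upload every file in a local directory to the same Drive folder; the directory name images and the explicit title field are assumptions for illustration, and drive is the instance from Step 2:

import os

upload_dir = 'images'  # hypothetical local directory to upload
folder_id = '1pzschX3uMbxU0lB5WZ6IlEEeAUE8MZ-t'  # destination Drive folder ID

for name in os.listdir(upload_dir):
    gfile = drive.CreateFile({
        'title': name,  # set the file name shown in Google Drive explicitly
        'parents': [{'id': folder_id}],
    })
    gfile.SetContentFile(os.path.join(upload_dir, name))
    gfile.Upload()
    print('Uploaded {} to folder {}'.format(name, folder_id))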

Step 4: List files from Google Drive

We can also list all the files in a specific folder in Google Drive as follows:

file_list = drive.ListFile(
    {'q': "'{}' in parents and trashed=false".format('1cIMiqUDUNldxO6Nl-KVuS9SV-cWi9WLi')}
).GetList()
for file in file_list:
    print('title: %s, id: %s' % (file['title'], file['id']))

Output of the above code: the title and ID of each file in the folder are printed.
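The q string is a Google Drive API (v2) search query, so it can be narrowed further. For example, a hedged sketch that lists only the plain-text files in the same folder (drive is the instance from Step 2):

query = ("'{}' in parents and trashed=false "
         "and mimeType='text/plain'").format('1cIMiqUDUNldxO6Nl-KVuS9SV-cWi9WLi')
for file in drive.ListFile({'q': query}).GetList():
    print('title: %s, mimeType: %s' % (file['title'], file['mimeType']))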

Step 5: Download the files from Google Drive

We can also download files from Google Drive as follows. Note that the files must be listed first (as in Step 4) before they can be downloaded.

for i, file in enumerate(sorted(file_list, key=lambda x: x['title']), start=1):
    print('Downloading {} file from GDrive ({}/{})'.format(file['title'], i, len(file_list)))
    file.GetContentFile(file['title'])

Output of the above code: the files are downloaded from the specified folder. Note that the files are saved in the directory where the code is executed.
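GetContentFile accepts a full path, so the downloads can be redirected to a dedicated directory instead of the working directory. A small sketch, where the directory name gdrive_downloads is an assumption and file_list comes from Step 4:

import os

download_dir = 'gdrive_downloads'  # hypothetical target directory
os.makedirs(download_dir, exist_ok=True)

for file in file_list:
    # Write each file into download_dir instead of the current directory.
    file.GetContentFile(os.path.join(download_dir, file['title']))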

Step 6: Create a text file in Google Drive

We can also write a file directly to Google Drive using the following code:

# Create a GoogleDriveFile instance with title 'test.txt'.
file1 = drive.CreateFile({'parents': [{'id': '1cIMiqUDUNldxO6Nl-KVuS9SV-cWi9WLi'}],
                          'title': 'test.txt'})
# Set the content of the file from the given string.
file1.SetContentString('Hello World!')
file1.Upload()

Output of the above code: a test.txt file is created in Google Drive.
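Because file1 now carries the ID that Drive assigned to it, the same instance can be reused to overwrite the file: setting new content and calling Upload() again updates the existing file rather than creating a new one. A minimal sketch (the replacement string is an arbitrary example):

# Replace the content of the file created above; Drive keeps the same file ID.
file1.SetContentString('Hello World, again!')
file1.Upload()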

Step 7: Read the content of the text file directly from Google Drive

We can also read a file directly from Google Drive using the code below:

# Fetch the file by its ID and read its content as a string.
file2 = drive.CreateFile({'id': file1['id']})
print(file2.GetContentString())

Output of the above code: the content of the file, Hello World!, is printed.
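If only the file's title is known, Steps 4 and 7 can be combined: query the folder for that title, then read the first match. A hedged sketch using the same folder ID and the drive instance from Step 2:

query = ("'{}' in parents and trashed=false "
         "and title='test.txt'").format('1cIMiqUDUNldxO6Nl-KVuS9SV-cWi9WLi')
matches = drive.ListFile({'q': query}).GetList()
if matches:
    print(matches[0].GetContentString())  # prints: Hello World!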
