How to upload files into Google drive using python script?
BIG DATA RECIPES DATA CLEANING PYTHON DATA MUNGING MACHINE LEARNING RECIPES PANDAS CHEATSHEET     ALL TAGS

How to upload files into Google drive using python script?

How to upload files into Google drive using python script?

In this Big Data recipe, we will show how to upload files into Google drive using python script

1

Recipe Objective

In big data scenarios , extracting data from Google drive into orchestration workflows to initially store the data in data lakes followed by series of operations like data validation, cleaning , transformation , is widely used to gather business insights from the data .In this recipe, we are going to illustrate how we manage files from Gdrive and use them in data flow orchestration processes .

Prerequisite

  • Install the pydrive python module as follows : pip install pydrive
  • The below codes can be run in Jupyter notebook , or any python console

Step 1: Import the libraries

from pydrive.auth import GoogleAuth from pydrive.drive import GoogleDrive

Step 2: OAuth made easy

Follow the steps to Get Authentication for Google Service API in the below link: Get Authentication for Google Service API

Download client_secrets.json from Google API Console and OAuth2.0 is done in two lines. You can customize the behavior of OAuth2 in one settings file settings.yaml

gauth = GoogleAuth() drive = GoogleDrive(gauth)

Above steps together as follows :

from pydrive.auth import GoogleAuth from pydrive.drive import GoogleDrive gauth = GoogleAuth() drive = GoogleDrive(gauth)

Step 3 : Upload files to your Google Drive

upload_file_list = ['1.jpg', '2.jpg'] for upload_file in upload_file_list: gfile = drive.CreateFile({'parents': [{'id': '1pzschX3uMbxU0lB5WZ6IlEEeAUE8MZ-t'}]}) # Read file and set it as the content of this instance. gfile.SetContentFile(upload_file) gfile.Upload() # Upload the file.

Output of the above code:

  • The above code uploads my two local files 1.jpg and 2.jpg to my Google Drive folder test/. To do that, the pydrive library will create two files in Google Drive and then read and upload the two files to the corresponding folder.
  • Note that we need to provide the id of the corresponding Google Drive folder. In this example, the test folder's ID is 1pzschX3uMbxU0lB5WZ6IlEEeAUE8MZ-t. You can get the Google Drive folder ID from the browser.
  • For example: when we open the test folder in my Google Drive, the browser shows the address as https://drive.google.com/drive/folders/1cIMiqUDUNldxO6Nl-KVuS9SV-cWi9WLi. Then the corresponding ID for the test folder is the part after the last \ symbol, which is 1cIMiqUDUNldxO6Nl-KVuS9SV-cWi9WLi.

Step 4 : List out files from Google Drive

We can also list all files from the specific folder in the google drive as follows :

file_list = drive.ListFile({'q': "'{}' in parents and trashed=false".format('1cIMiqUDUNldxO6Nl-KVuS9SV-cWi9WLi')}).GetList() for file in file_list: print('title: %s, id: %s' % (file['title'], file['id']))

Output of the above code:

Step 5 : Download the files from Google Drive

We can also download the files from the google Drive as following. Note - after listing the files only we can download the file.

for i, file in enumerate(sorted(file_list, key = lambda x: x['title']), start=1): print('Downloading {} file from GDrive ({}/{})'.format(file['title'], i, len(file_list))) file.GetContentFile(file['title'])

Output of the Above code:

In the above snapshot files are downloaded from the specific folder , Note here files will download where the code will be executed.

Step 6 : Create the Text files in Google Drive

We can also write file directly to Google Drive using the following code:

# Create a GoogleDriveFile instance with title 'test.txt'. file1 = drive.CreateFile({'parents': [{'id': '1cIMiqUDUNldxO6Nl-KVuS9SV-cWi9WLi'}],'title': 'test.txt'}) # Set content of the file from the given string. file1.SetContentString('Hello World!') file1.Upload()

Output of the above code : test.txt file is created in google drive.

Step 7 : Read content of the text file directly from Google Drive

Also, we can read the file directly from Google Drive using the below code :

file2 = drive.CreateFile({'id': file1['id']}) file2.GetContentString('test.txt')

Output of the above code:

In the above the snapshot reading the content of the file as "Hello world"

Relevant Projects

Real-Time Log Processing in Kafka for Streaming Architecture
The goal of this apache kafka project is to process log entries from applications in real-time using Kafka for the streaming architecture in a microservice sense.

Ola Bike Rides Request Demand Forecast
Given big data at taxi service (ride-hailing) i.e. OLA, you will learn multi-step time series forecasting and clustering with Mini-Batch K-means Algorithm on geospatial data to predict future ride requests for a particular region at a given time.

Analyse Yelp Dataset with Spark & Parquet Format on Azure Databricks
In this Databricks Azure project, you will use Spark & Parquet file formats to analyse the Yelp reviews dataset. As part of this you will deploy Azure data factory, data pipelines and visualise the analysis.

Real-Time Log Processing using Spark Streaming Architecture
In this Spark project, we are going to bring processing to the speed layer of the lambda architecture which opens up capabilities to monitor application real time performance, measure real time comfort with applications and real time alert in case of security

Spark Project -Real-time data collection and Spark Streaming Aggregation
In this big data project, we will embark on real-time data collection and aggregation from a simulated real-time system using Spark Streaming.

Hadoop Project-Analysis of Yelp Dataset using Hadoop Hive
The goal of this hadoop project is to apply some data engineering principles to Yelp Dataset in the areas of processing, storage, and retrieval.

Online Hadoop Projects -Solving small file problem in Hadoop
In this hadoop project, we are going to be continuing the series on data engineering by discussing and implementing various ways to solve the hadoop small file problem.

Implementing Slow Changing Dimensions in a Data Warehouse using Hive and Spark
Hive Project- Understand the various types of SCDs and implement these slowly changing dimesnsion in Hadoop Hive and Spark.

Finding Unique URL's using Hadoop Hive
Hive Project -Learn to write a Hive program to find the first unique URL, given 'n' number of URL's.

GCP Data Ingestion with SQL using Google Cloud Dataflow
In this GCP Project, you will learn to build a data processing pipeline With Apache Beam, Dataflow & BigQuery on GCP using Yelp Dataset.