In big data scenarios, data is commonly extracted from Google Drive into orchestration workflows: it is first stored in a data lake and then passed through a series of operations such as validation, cleaning, and transformation to gather business insights. This recipe illustrates how to manage files in Google Drive and use them in data flow orchestration processes.
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
Follow the steps in the link below to get authentication for the Google Service API: Get Authentication for Google Service API
Download client_secrets.json from the Google API Console; OAuth 2.0 authentication is then done in two lines. You can customize the OAuth 2.0 behavior in a single settings file, settings.yaml.
gauth = GoogleAuth()
drive = GoogleDrive(gauth)
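To avoid repeating the browser consent step on every run, the credentials can be cached locally. The following is a minimal sketch, not part of the original recipe: the helper name `authenticate` and the cache file name `credentials.json` are assumptions chosen for illustration, and `gauth` is expected to be a `GoogleAuth` instance created as above.

```python
import os


def authenticate(gauth, credentials_path="credentials.json"):
    """Authorize a GoogleAuth object, reusing saved credentials when possible.

    `credentials_path` is a hypothetical local cache file, not a PyDrive default.
    """
    if os.path.exists(credentials_path):
        # Reuse credentials saved by an earlier run.
        gauth.LoadCredentialsFile(credentials_path)

    if gauth.credentials is None:
        # No usable credentials: run the browser-based OAuth 2.0 flow.
        gauth.LocalWebserverAuth()
    elif gauth.access_token_expired:
        # Saved access token has expired: refresh it.
        gauth.Refresh()
    else:
        # Saved credentials are still valid.
        gauth.Authorize()

    # Cache the credentials for the next run.
    gauth.SaveCredentialsFile(credentials_path)
    return gauth
```

With this helper, `drive = GoogleDrive(authenticate(GoogleAuth()))` would replace the two lines above. Treat the cache file as a secret, since it grants access to the Drive account.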
The above steps together, followed by uploading a list of files to a Drive folder, are as follows:
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
gauth = GoogleAuth()
drive = GoogleDrive(gauth)
upload_file_list = ['1.jpg', '2.jpg']
for upload_file in upload_file_list:
    gfile = drive.CreateFile({'parents': [{'id': '1pzschX3uMbxU0lB5WZ6IlEEeAUE8MZ-t'}]})
    # Read file and set it as the content of this instance.
    gfile.SetContentFile(upload_file)
    gfile.Upload() # Upload the file.
Output of the above code: each file in the list is uploaded to the Drive folder identified by the parent ID.
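In an orchestration workflow, the list of files to upload often comes from an earlier step, so some paths may be missing. A hedged sketch of a wrapper around the same `CreateFile`, `SetContentFile`, and `Upload` calls follows; the function name `upload_files` and its return value are assumptions made for illustration, and `drive` is the `GoogleDrive` object created earlier.

```python
import os


def upload_files(drive, paths, folder_id):
    """Upload every existing local file in `paths` to the Drive folder `folder_id`.

    Returns two lists: the paths that were uploaded and the paths that were skipped.
    """
    uploaded, skipped = [], []
    for path in paths:
        if not os.path.isfile(path):
            # Skip missing paths instead of failing the whole batch.
            skipped.append(path)
            continue
        gfile = drive.CreateFile({
            'parents': [{'id': folder_id}],
            # Use the bare file name as the Drive title.
            'title': os.path.basename(path),
        })
        gfile.SetContentFile(path)
        gfile.Upload()
        uploaded.append(path)
    return uploaded, skipped
```

A call such as `upload_files(drive, ['1.jpg', '2.jpg'], folder_id)` would then report which files actually reached the folder.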
We can also list all files in a specific Google Drive folder as follows:
file_list = drive.ListFile({'q': "'{}' in parents and trashed=false".format('1cIMiqUDUNldxO6Nl-KVuS9SV-cWi9WLi')}).GetList()
for file in file_list:
    print('title: %s, id: %s' % (file['title'], file['id']))
Output of the above code: the title and ID of each file in the folder are printed.
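When the listing feeds a later workflow step, returning the results is usually more convenient than printing them. The sketch below wraps the same `ListFile` query in two small functions; the names `folder_query` and `list_folder` are assumptions introduced here, and `drive` is the `GoogleDrive` object created earlier.

```python
def folder_query(folder_id):
    """Build the search query for non-trashed files directly inside a folder."""
    return "'{}' in parents and trashed=false".format(folder_id)


def list_folder(drive, folder_id):
    """Return (title, id) pairs for the files in `folder_id`, sorted by title."""
    file_list = drive.ListFile({'q': folder_query(folder_id)}).GetList()
    return sorted((f['title'], f['id']) for f in file_list)
```

The returned pairs can be passed directly to a download or validation step.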
We can also download files from Google Drive as follows. Note that the files must be listed first; only then can they be downloaded.
for i, file in enumerate(sorted(file_list, key = lambda x: x['title']), start=1):
    print('Downloading {} file from GDrive ({}/{})'.format(file['title'], i, len(file_list)))
    file.GetContentFile(file['title'])
Output of the above code: the files are downloaded from the specified folder. Note that they are saved in the directory where the code is executed.
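Because the files land in the working directory by default, a workflow may prefer an explicit destination. The following sketch passes a full path to `GetContentFile` instead of the bare title; the function name `download_files` and the `dest_dir` parameter are assumptions for illustration, and `file_list` is the list returned by `ListFile(...).GetList()`.

```python
import os


def download_files(file_list, dest_dir):
    """Download each listed Drive file into `dest_dir` and return the local paths."""
    os.makedirs(dest_dir, exist_ok=True)
    saved = []
    for gfile in sorted(file_list, key=lambda f: f['title']):
        # basename keeps a title containing a path separator inside dest_dir.
        target = os.path.join(dest_dir, os.path.basename(gfile['title']))
        gfile.GetContentFile(target)
        saved.append(target)
    return saved
```

Two files with the same title in one Drive folder would overwrite each other locally; this sketch does not handle that case.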
We can also write a file directly to Google Drive using the following code:
# Create a GoogleDriveFile instance with title 'test.txt'.
file1 = drive.CreateFile({'parents': [{'id': '1cIMiqUDUNldxO6Nl-KVuS9SV-cWi9WLi'}],'title': 'test.txt'})
# Set content of the file from the given string.
file1.SetContentString('Hello World!')
file1.Upload()
Output of the above code: a test.txt file is created in Google Drive.
We can also read the file directly from Google Drive using the code below:
file2 = drive.CreateFile({'id': file1['id']})
# The file is identified by its ID, so GetContentString() needs no file name.
content = file2.GetContentString()
print(content)
Output of the above code: the content of the file is read back as 'Hello World!'.
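The write and read steps can be combined into a pair of helpers, for example to pass small text payloads between workflow tasks. This is a sketch under stated assumptions: the names `write_text` and `read_text` are hypothetical, and `drive` is the `GoogleDrive` object created earlier.

```python
def write_text(drive, folder_id, title, text):
    """Create a text file in the Drive folder `folder_id` and return its file ID."""
    gfile = drive.CreateFile({'parents': [{'id': folder_id}], 'title': title})
    gfile.SetContentString(text)
    gfile.Upload()
    # The file ID is available in the metadata once the upload has completed.
    return gfile['id']


def read_text(drive, file_id):
    """Return the content of the Drive file `file_id` as a string."""
    gfile = drive.CreateFile({'id': file_id})
    return gfile.GetContentString()
```

Passing the ID returned by `write_text` to `read_text` gives back the string that was written.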