How to Import Google Sheet Data Into Google Colaboratory
Motivation
In a part of a technical assessment I had to complete, I was asked to import data using Python; this sounds easy enough. But what about for a Google Sheet that has sheets within it? I decided to write a tutorial how to import the data from the Google Sheet into Google Colaboratory for data preparation and analysis.
Prerequisites
Familiarity with the following:
Step-by-Step Tutorial
NOTE: I will use “>>” as the output if there is one.
Step 1
Give Access to all files in your Google Drive. You will need to click on the provided link after running the code and provide a password from the link into the box in Google Colaboratory. If done correctly, you should get the output shown below as indicated by “>>”
#import necessary library to mount personal Google Drive
from google.colab import drive
drive.mount('/content/drive')>> Mounted at /content/drive
Step 2
Extract data from the Google Sheet. Similar to above, you will need to click on a provided link and provide a password. If you don’t see the Google Sheet in your drive, you may need to make a copy of it into your personal Google Drive for the next step.
#import necessary library to authorize access to Google Sheet
from google.colab import auth
auth.authenticate_user()import gspread
from oauth2client.client import GoogleCredentials# Initiate Google Colab object
gc = gspread.authorize(GoogleCredentials.get_application_default())
Step 3
The below code will open the Google Sheet at the file location, “path-to-file”, and extract the sheet names within the Google Sheet.
# Open up workbook
workbook = gc.open('[path-to-file]')# obtain worksheet titles from workbook
sheet_titles = []
for sheet in workbook.worksheets():
sheet_titles.append(sheet.title)
sheet_titlesExample output
>> ['sheet1', 'sheet2']
Step 4
With the sheet names from above, we extract the values from each specific sheet and store them into a dictionary.
# Get_all_values from sheets and store in dictionary
dict_of_sheets = {}
for sheet_title in sheet_titles:
sheet = workbook.worksheet(sheet_title)
values = sheet.get_all_values()
dict_of_sheets[sheet_title] = values# To confirm all sheets made it into the dictionary
dict_of_sheets.keys()>> dict_keys(['sheet1', 'sheet2'])
Final Step
From the dictionary we can format each sheet into a Pandas dataframe. Check that the dataframe has the correct structure as well.
import pandas as pddf_sheet1 = pd.DataFrame.from_records(dict_of_sheets.get('sheet1'))
df_sheet1.info()>>
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 37375 entries, 0 to 37374
Data columns (total 3 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 col1 37375 non-null object
1 col2 37375 non-null object
2 col3 37375 non-null object
dtypes: object(3)
memory usage: 876.1+ KB
Conclusion
I hope this article helps you quickly get your Google Sheet data into Google Colaboratory, so you can get to analysing! If this article helped you, please leave a clap or let me know if you have any issues. Check out my other articles too, and let us grow together!