Photo by Mitchell Luo on Unsplash

How to Import Google Sheet Data Into Google Colaboratory

John Paul Hernandez Alcala
3 min readJun 7, 2021

--

Motivation

In a part of a technical assessment I had to complete, I was asked to import data using Python; this sounds easy enough. But what about for a Google Sheet that has sheets within it? I decided to write a tutorial how to import the data from the Google Sheet into Google Colaboratory for data preparation and analysis.

Prerequisites

Familiarity with the following:

Step-by-Step Tutorial

NOTE: I will use “>>” as the output if there is one.

Step 1

Give Access to all files in your Google Drive. You will need to click on the provided link after running the code and provide a password from the link into the box in Google Colaboratory. If done correctly, you should get the output shown below as indicated by “>>”

#import necessary library to mount personal Google Drive
from google.colab import drive
drive.mount('/content/drive')
>> Mounted at /content/drive

Step 2

Extract data from the Google Sheet. Similar to above, you will need to click on a provided link and provide a password. If you don’t see the Google Sheet in your drive, you may need to make a copy of it into your personal Google Drive for the next step.

#import necessary library to authorize access to Google Sheet
from google.colab import auth
auth.authenticate_user()
import gspread
from oauth2client.client import GoogleCredentials
# Initiate Google Colab object
gc = gspread.authorize(GoogleCredentials.get_application_default())

Step 3

The below code will open the Google Sheet at the file location, “path-to-file”, and extract the sheet names within the Google Sheet.

# Open up workbook
workbook = gc.open('[path-to-file]')
# obtain worksheet titles from workbook
sheet_titles = []
for sheet in workbook.worksheets():
sheet_titles.append(sheet.title)
sheet_titles
Example output
>> ['sheet1', 'sheet2']

Step 4

With the sheet names from above, we extract the values from each specific sheet and store them into a dictionary.

# Get_all_values from sheets and store in dictionary
dict_of_sheets = {}
for sheet_title in sheet_titles:
sheet = workbook.worksheet(sheet_title)
values = sheet.get_all_values()
dict_of_sheets[sheet_title] = values
# To confirm all sheets made it into the dictionary
dict_of_sheets.keys()
>> dict_keys(['sheet1', 'sheet2'])

Final Step

From the dictionary we can format each sheet into a Pandas dataframe. Check that the dataframe has the correct structure as well.

import pandas as pddf_sheet1 = pd.DataFrame.from_records(dict_of_sheets.get('sheet1'))
df_sheet1.info()
>>
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 37375 entries, 0 to 37374
Data columns (total 3 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 col1 37375 non-null object
1 col2 37375 non-null object
2 col3 37375 non-null object
dtypes: object(3)
memory usage: 876.1+ KB

Conclusion

I hope this article helps you quickly get your Google Sheet data into Google Colaboratory, so you can get to analysing! If this article helped you, please leave a clap or let me know if you have any issues. Check out my other articles too, and let us grow together!

Photo by Leon on Unsplash

--

--

John Paul Hernandez Alcala

An intraoperative neuromonitor who tinkers with data to see what interesting nuggets he can find.