Introducing GAReport for Pulling Google Analytics API Reports

Since November of 2020, I have been downloading daily Google Analytics reports to analyze. It wasn't bad when there was just 1 or 2, but as the number grew, it started to become onerous. After a long weekend, I was spending 15-20 minutes downloading files. So, over the last month, I've worked on a simple way to call the Google Analytics (v4) API.

The Google Analytics API provides a programmatic way to gather reports. Automating my work is the fastest way to get my time back, so I dug in.

Looking around the web, I quickly found a bunch of code based on Google's initial 2016 sample for v4 of the Google Analytics API. While it all works, pulling multiple reports or changing things on the fly is a pain. I wanted a high-level abstraction to ease the development process.

Borrowing the excellent code base from RitwikGA/GoogleAnalytics-Pandas-Sheet I created a project that wraps it into a class It's now simple to receive the results in a Pandas dataframe. And from Pandas running calculations, creating visualizations or even exporting to a database is a breeze.

Once I had that working, I got annoyed at the API labels. They're just not human readable. In fact, they can diverge considerably from the human label. I eventually found the entire API specification as a JSON file and turned it into a little class that receives the API label and returns the human readable equivalent, or the other way around.

Which means the user can copy the labels from any report on Analytics.Google.com and get expected results.

Having received so much free help, I've made all of this freely available on Github, with detailed instructions on how to get started using the API.

I usually run the code from a Jupyter notebook, and I've also supplied a little test file that calls the 2 classes to pull a report. The meat of the code is pretty straightforward:

# File: GAPullTest.py
from GAReport import GAReport

VIEW_ID = 'PutViewIDHere' 
DIMENSIONS = ["Page", ]
METRICS = ["Pageviews", "Unique Pageviews", "Avg. Time on Page", "Entrances", "Bounce Rate", "% Exit", "Page Value"]
FILTERS= "ga:pagePath=~news"

report = GAReport(startdate="yesterday", 
    enddate="yesterday", 
    viewID=VIEW_ID, 
    dimensions=DIMENSIONS, 
    metrics=METRICS, 
    filters=FILTERS)
print(report.df.head()) # Print first few lines

The output would look something like this…

Page Pageviews Unique Pageviews Avg. Time on Page Entrances Bounce Rate % Exit Page Value
0 /news 25 23 11.05 0 0 20 33.6957
1 /news/2021/new-food-services-partner 15 15 50.3077 1 100 13.3333 0
2 /news/2021/what-to-expect-for-fall-2021 88 67 76.6923 5 100 26.1364 11.5672

Now pulling an analytics report just takes a few seconds. This approach has freed me to focus on the analysis instead of the administration.