Since November of 2020, I have been downloading daily Google Analytics reports to analyze. It wasn't bad when there was just 1 or 2, but as the number grew, it started to become onerous. After a long weekend, I was spending 15-20 minutes downloading files. So, over the last month, I've worked on a simple way to call the Google Analytics (v4) API.
The Google Analytics API provides a programmatic way to gather reports. Automating my work is the fastest way to get my time back, so I dug in.
Looking around the web, I quickly found a bunch of code based on Google's initial 2016 sample for v4 of the Google Analytics API. While it all works, pulling multiple reports or changing things on the fly is a pain. I wanted a high-level abstraction to ease the development process.
Borrowing the excellent code base from RitwikGA/GoogleAnalytics-Pandas-Sheet I created a project that wraps it into a class It's now simple to receive the results in a Pandas dataframe. And from Pandas running calculations, creating visualizations or even exporting to a database is a breeze.
Once I had that working, I got annoyed at the API labels. They're just not human readable. In fact, they can diverge considerably from the human label. I eventually found the entire API specification as a JSON file and turned it into a little class that receives the API label and returns the human readable equivalent, or the other way around.
Which means the user can copy the labels from any report on Analytics.Google.com and get expected results.
Having received so much free help, I've made all of this freely available on Github, with detailed instructions on how to get started using the API.
I usually run the code from a Jupyter notebook, and I've also supplied a little test file that calls the 2 classes to pull a report. The meat of the code is pretty straightforward:
# File: GAPullTest.py
from GAReport import GAReport
VIEW_ID = 'PutViewIDHere'
DIMENSIONS = ["Page", ]
METRICS = ["Pageviews", "Unique Pageviews", "Avg. Time on Page", "Entrances", "Bounce Rate", "% Exit", "Page Value"]
FILTERS= "ga:pagePath=~news"
report = GAReport(startdate="yesterday",
enddate="yesterday",
viewID=VIEW_ID,
dimensions=DIMENSIONS,
metrics=METRICS,
filters=FILTERS)
print(report.df.head()) # Print first few lines
The output would look something like this…
Page | Pageviews | Unique Pageviews | Avg. Time on Page | Entrances | Bounce Rate | % Exit | Page Value | |
---|---|---|---|---|---|---|---|---|
0 | /news | 25 | 23 | 11.05 | 0 | 0 | 20 | 33.6957 |
1 | /news/2021/new-food-services-partner | 15 | 15 | 50.3077 | 1 | 100 | 13.3333 | 0 |
2 | /news/2021/what-to-expect-for-fall-2021 | 88 | 67 | 76.6923 | 5 | 100 | 26.1364 | 11.5672 |
Now pulling an analytics report just takes a few seconds. This approach has freed me to focus on the analysis instead of the administration.