Export data from Google Analytics to Google Bigquery

Google+Tweet about this on TwitterShare on FacebookShare on LinkedIn

Alright. Now, when the web is nearly overburdened with the abc solutions simply explaining how to use built-in analytics tools brought almost to perfection by Google engineers, I guess it’s time for some really advanced analytics stuff. So, in this post I will guide you step-by-step how to export data from Google Analytics to Google Bigquery.

Hopefully, this is what you get if you implement the solution correctly. This is a table containing pure nonsampled Google Analytics data!

export data from google analytics to google bigquery - result

Post content

Before we begin

I’d like to express my profound gratitude to Alexander Eroshkin, a project manager at Yandex and the author of the solution described in this post, for his comprehensive aid while implementing and testing this solution!

I presume there’s no need in describing the advantages you gain when you use Google Bigquery for big data analytics. However, there are certain use-cases you might find valuable for your analytics tasks/goals. You can find them here.

Required tools

As a matter of fact, there are not many tools required to export data from Google Analytics to Google Bigquery. All you need is the following:

  1. Google Analytics account
  2. Google Cloud platform account
  3. and of course Google Tag Manager account
The solution described in this post involves the use of GTM, so if you have GA snippet hard-coded, leave a request message in the comment section, and I will send you a detailed implementation instruction.

A step-by-step guide for exporting raw data from Google Analytics to Google Bigquery

Now, when we’re done the preliminary part, let us linger no more and delve right into it! Please remember that the solution is not the easiest web-analytics tool in the world to implement/use. So, should any issues/questions arise, please let me know in the comment section.

Step 1: create GCP account

If you don’t have an account at Google Cloud Platform (GCP), go and get one right here.

export data from google analytics to google bigquery - gcp account

Please note that you should use real bank card details. Don’t worry, you’re NOT going to be charged without your permission. Check GCP pricing for more details.

Step 2: download and install Python

If you don’t already have it, download and install Python 2.7

raw-data-export-google-analytics-google-bigquery-python-min

Step 3: download project’s zip

Go to Alexander’s github and download the zip file of the project. With the help of this app we’re going to export data from Google Analytics to Google Bigquery.

raw-data-export-google-analytics-google-bigquery-downloadzip-min

Step 4: download Google App Engine SDK for Python

One last thing to download is the GAE SDK for Python. You can find it by clicking on the “Or, you can download the original App Engine SKD…” link.

raw-data-export-google-analytics-google-bigquery-python-sdk-min

Step 5: create a new project in GCP console

Try to give your project a descriptive yet short and readable name. Perhaps,  something like mysite-ga-datastreaming. We’re going to use that name for our app settings.

raw-data-export-google-analytics-google-bigquery-create-project-min

Step 6: unzip the project you’ve downloaded on step 3

Unzip the content of the project and then paste all the content to a folder named after you project’s id in GCP. So, according to the recommendation given in step 5, the folder should be named like “mysite-ga-datastreaming”.

Step 7: configure the app.yaml file

Open the app.yaml file (you can find in the folder with the app content) and type in your project’s id just like it is shown below.

raw-data-export-google-analytics-google-bigquery-app-min

Step 8: create a custom js variable in order to export data from Google Analytics to Google Bigquery

In my previous post I shared a custom js variable that allows you to export data from Google Analytics to Google Bigquery (or anywhere you like). We’re going to need it right now!

So, copy carefully the JS content below, then create a custom js variable and paste the code right into it.

IMPORTANT! Paste your project id instead of “your-project-id-here” fragment. Also, if you want to have GA clientId to be sent with each hit, create a custom dimension and change the cd’s index right in this place '&cd1=' + model.get("clientId"); to whatever index you’ve received after creating the dimension.

Here’s the code

Next, add a sendHitTask field to each of your UA tags and use the variable we have just created as a value for it. Thus you will have all your Google Analytics raw data sent to Google Bigquery.

raw-data-export-google-analytics-google-bigquery-sendhittask-min

Step 8.1 (optional): modify your GA code snippet

Here’s what you need to do if you still have GA hard-coded on your site.

First off, change the gabq.js file so it contains your project id domain

Next, modify GA code snippet so it has a link to the plugin AND and the plugin itself. This is how it should look like

Please mind that this step is required only in case you have GA hard-coded.

Step 9: GCP auth set-up

Now go hit this page – https://console.cloud.google.com/apis/credentials – and set the auth configuration as shown on screenshots below.

raw-data-export-google-analytics-google-bigquery-cpd-auth-min

raw-data-export-google-analytics-google-bigquery-cpd-auth2-min

raw-data-export-google-analytics-google-bigquery-cpd-auth3-min

Step 10: create a new dataset in the Bigquery console

Create a new dataset in your project. See the screenshot below.

raw-data-export-google-analytics-google-bigquery-cpd-dataset

Step 11: set dataset and project id names

Open the bqloader.py file and paste the names of your project id and dataset just like it shown below. In addition, you can aslo rename the table or leave it as it is in the file.

raw-data-export-google-analytics-google-bigquery-cpd-bqloader-min

Finally, we’re done with the setup. Now it’s time for the action!

Step 12: exporting your app

Open the Google App Engine Launcher and then click File -> Add Existing Application. Set all the required fields like it is shown below.

raw-data-export-google-analytics-google-bigquery-cpd-loader-min

Next, recheck if everything is set correctly. If it is, click deploy. In case you did everything right, this is what you should see in the console.

raw-data-export-google-analytics-google-bigquery-cpd-result-min

Step 13: creating a data table for your Google Analytics data

Now, here’s a final step. Open your browser and go to the following page:

[Project ID].appspot.com/tasks/create_bq_table

By entering this page you trigger the creation of the table for your GA data. It should say “ok”.

raw-data-export-google-analytics-google-bigquery-table-created

Step 14: be patient

You won’t see data immediately appearing in the table. Since it usually takes from a couple of minutes to an hour for data to appear, please be patient and don’t beat false alarm!

Alright. Can you see it now? Yes? YES?! YEAH!

that's a success!

Reasons to export data from Google Analytics to Google Bigquery

As a result, you get a table containing all the raw Google Analytics data. Not only this allows you to solve a couple of common GA issues, but also gain a really helpful insight on what is really going on there on the site. Here’s a short list of advantages you receive.

  1. No more data sampling! This is a huge step-up for those sites having really heavy traffic on a daily basis.
  2. An opportunity to build ANY reports with ANY segments using ANY configuration you like
  3. Implement big data analytics tools for your site or mobile app
  4. Visualize data ANYWHERE you like (not only in Google Data Studio)
  5. Should any issues occur with you GA property, you will always have a back-up data source

Certainly, there are some challenges you should be ready for. Say, composing advanced SQL queries. But I’m sure if you got this far, that won’t be a problem for you!

Working with the table

In conclusion, I’d like to show you a couple of simple queries you can use for mining precious insights.

So, here’s a query showing a number of page views your site received for certain dates.

And here’s another query, more complexed, showing numbers of pageviews, users and sessions in a single table.

These queries are just a very simple example of how you can use you SQL knowledge in order to gain data you need. If you want more complexed (interesting!) examples, let me know in the comment section below.

Summary

WOW! Now this is what I call a detailed step-by-step guide. Seems like this post turned out to be the longest one for the whole blog.

As always, if you’re having any troubles while implementing this solution, let me know in the comment section! This is not an easy-to-do one, but the result you get in the end does worth it!

Google+Tweet about this on TwitterShare on FacebookShare on LinkedIn
  • Rama Pollini

    Hi Dimitri, thanks for this Article!
    I will try to execute your tutorial but I need information about the implementation in the case of GA snippet hard-coded. I use Magento and I inject the GA code using the admin panel (not directly in the page).
    Rama

    • Dmitri Ilin

      Hi Rama,

      thanks for comment. I will add the information you need to this guide during the next week.

    • Dmitri Ilin

      Hey Rama

      I’ve added additional step 8.1 for those having GA hard-coded

  • Marco Porracin Dieguez

    Now the AWS version …

    • Dmitri Ilin

      AWS version?

      • Marco Porracin Dieguez

        Sry i should explained a little more.
        I would like to see a similar implementation but instead of using Google Cloud using Amazon Web Services.

  • Marco Porracin Dieguez

    Thanks a lot for this! I just starting testing it with the smallest of our sites (13K sessions/day). I ll let u know anything i learn working with this.

  • Ivan

    Дмитрий, спрошу ещё раз 🙂

    Получаю следущие ошибки:
    ***.appspot.com/collect?undefined&cd1=664932034.1495129151

    Местоположение ошибки
    _internal_error (/base/data/home/runtimes/python27_experiment/python27_lib/versions/third_party/webapp2-2.5.2/webapp2.py:1552)

    KeyError: ‘dl’

    Подскажите пожалуйста, вчём ошибка и как её исправить?! Данные в BigQuery не передаются из за большого количества ошибки 500

    С уважением
    Иван

    • Ivan

      Всё, теперь пошли данные! ) благодарю за статью, проблема была в названии переменной GTM, нужно было использовать именно название указанное в статье!

      Буду ждать новых статей по стримингу и BigQuery

      С уважением

      • Dmitri Ilin

        Hi Ivan!

        Glad the post was of help. Keep GBQ-ing!