Blog

Google BigQuery Data Modeller

Wondering where to start when modeling your data in Google BigQuery? This post has you covered with a simple repo and example.

1 tzqg2nhqfudmca2bc6vilw
Photo by Steady Hand Co. on Unsplash. Make your way through the rough data on a clear path.

Today, a lot of ugly data glides across the ether. Usually, some very frustrated people wonder where to start making sense of this and, more importantly, how to structure it to be useful for their company.

Here I propose the usage of a simple script and repo that allows you to push your cleaned data as Views to Google BigQuery. In addition, you can use Jinja templating to make your life easier and even pre-define some variables if you like.

How does it work?

In a file called select_101.sql.j2 you can have this query:

SELECT {{aNumber}} as num

The template reference aNumber is taken from a pre-defined set of variables in a configuration file configuration.json :

{
  "aNumber": 101
}

You also have a schema file called select_101.json :

{
  "metric": "First example",
  "metric_description": "Select only a constant number as output",
  "purpose" : "",
  "usage_example" : "",
  "usage_description" : "",
  "fields": [
    {
      "field": "num",
      "description": "A small number",
      "type": "INTEGER",
      "typical_values": [ 4 ]
    }
  ]
}

Now you are ready to deploy this to Google BigQuery:

-m allows you to provide a metric to deploy

-v indicates you want to deploy a view

-d takes your dataset as input

-p takes your projectId as input

./compile.py -m select_101.sql.j2 -v -d <YOURDATASET> -p <YOURPROJECT>

And you get this output:

1 xjb6e ziayoryplnz53ytq

Including templates into templates…

You can also include templates in another.

In a file called select_101_include.sql.j2 you can have this query:

SELECT *
FROM (
{% include 'select_101/select_101.sql.j2' %}
)

The query will render and deploy as:

SELECT *
FROM (
   SELECT 101 as num
)

Happy modeling!

Want more stories? Check our blog, or follow Eben on Twitter.

Resources

https://github.com/ebendutoit/google-bigquery-data-modeller

Subscribe to updates, news and more.

Related blogs

Connect With Us