I have noticed an increasing amount of poor practices floating around regarding connecting #BigQuery with Looker/Data Studio that could cost you.
My top suggestions:
- utilize _table_suffix
- build reporting tables
- BI Engine
- Materialized Views
- Caching
Details in 🧵
1/5 When working with sharded tables take advantage of the _table_suffix. It will potentially save you from querying a column and querying multiple tables.
Good: SELECT _table_suffix as date, userId, location... From table_*
Bad: SELECT date, userId, location... FROM table_*
2/5 Reporting Tables provide a great step between raw data and viz tools. Especially when data does not get updated as often. In the BQ UI you can easily schedule their creation or use a combination of cloud logging, pub/sub and functions for event driven creation.
2*/5 Make sure to take advantage of partitioning and clustering when building viz tables. Even SELECT * LIMIT 10, will run better when tables are clustered.
3/5 BI Engine is an in-memory analysis service on top of BigQuery. It will cache data that is queried most frequently or data that you prefer cached. The size of the cache determines the monthly cost ~ $30 per GB of cache per month.
4/5 Materialized views are created as easily as the standard views, but precompute and cache the results of the query that defines them. Especially useful for streaming data since only the delta (difference from the last call) gets queried.
5/5 Looker Studio has native caching. It is important to be aware of what it's doing and how to control it. #BigQuery as a source can have data freshness set between 1 minute and 12 hours.