announcement
Announcing BigQuery Source Plugin
Michal Brutvan •
Introducing the new CloudQuery BigQuery source plugin!
BigQuery is a popular, fully managed, serverless data warehouse designed for large datasets. We have now added a source plugin for BigQuery that enables syncing tables and views from a dataset to any supported CloudQuery destination.
Data teams often use BigQuery for its scalability, ease of use, and integration with other Google Cloud services. They utilize it for various data analytics tasks such as ad-hoc queries, data exploration, and large-scale data processing. However, sometimes you need to do an advanced analysis, run transformations, or consolidate the data in a different database. The new source plugin enables just that.
On top of syncing the individual tables and views from the source dataset, the new plugin supports queries at the source, thus enabling on-the-fly transformation of the data.
Example Config
Set up CloudQuery locally, or use CloudQuery Cloud.
To sync a whole dataset to a PostgreSQL database, use the default configuration. Specify your Project ID, Dataset ID (name), and the connection string to the database. For other options and details on how to authenticate with BigQuery, see BigQuery Source Plugin Documentation.
kind: source
spec:
name: bigquery
path: cloudquery/bigquery
registry: cloudquery
version: "v1.0.0"
tables: ["*"]
destinations: ["postgresql"]
spec:
project_id: ${PROJECT_ID}
dataset_id: ${DATASET_ID}
---
kind: destination
spec:
name: "postgresql"
path: "cloudquery/postgresql"
registry: "cloudquery"
version: "v7.6.0"
spec:
connection_string: ${POSTGRESQL_CONNECTION_STRING}
Specifying a query
In this example, assume a dataset
usage
with table usage_events
having columns customer_id
, timestamp
, and miles
. The individual entries represent a distance driven by a customer during a single driving session started at the timestamp.We would like to get only the summary of the distance driven last month grouped by the customer and sync it to a PostgreSQL database. Here's what the config looks like with the SQL query included:
kind: source
spec:
name: bigquery
path: cloudquery/bigquery
registry: cloudquery
version: "v1.0.0"
tables: ["*"]
destinations: ["postgresql"]
spec:
project_id: ${PROJECT_ID}
dataset_id: ${DATASET_ID}
queries:
- name: monthly_usage
- query: >
select customer_id, sum(miles) as usage from billing.usage_events
where timestamp > current_timestamp - interval 30 day
group by customer_id
---
kind: destination
spec:
name: "postgresql"
path: "cloudquery/postgresql"
registry: "cloudquery"
version: "v7.6.0"
spec:
connection_string: ${POSTGRESQL_CONNECTION_STRING}
This config will not sync the entire dataset. Instead, it will create only the
monthly_usage
table and populate it with the results of the query.You can include as many queries as you want in one sync config.
Find more resources in Hub
With this plugin, you can sync data from BigQuery to a destination of your choice. However, you can get data from more sources than just BigQuery. Check out our other database sources on Hub.