announcement

Announcing BigQuery Source Plugin

Michal Brutvan

Michal Brutvan

Introducing the new CloudQuery BigQuery source plugin!
BigQuery is a popular, fully managed, serverless data warehouse designed for large datasets. We have now added a source plugin for BigQuery that enables syncing tables and views from a dataset to any supported CloudQuery destination.
Data teams often use BigQuery for its scalability, ease of use, and integration with other Google Cloud services. They utilize it for various data analytics tasks such as ad-hoc queries, data exploration, and large-scale data processing. However, sometimes you need to do an advanced analysis, run transformations, or consolidate the data in a different database. The new source plugin enables just that.
On top of syncing the individual tables and views from the source dataset, the new plugin supports queries at the source, thus enabling on-the-fly transformation of the data.

Example Config

Set up CloudQuery locally, or use CloudQuery Cloud.
To sync a whole dataset to a PostgreSQL database, use the default configuration. Specify your Project ID, Dataset ID (name), and the connection string to the database. For other options and details on how to authenticate with BigQuery, see BigQuery Source Plugin Documentation.
kind: source
spec:
  name: bigquery
  path: cloudquery/bigquery
  registry: cloudquery
  version: "v1.0.0"
  tables: ["*"]
  destinations: ["postgresql"]
  spec:
    project_id: ${PROJECT_ID}
    dataset_id: ${DATASET_ID}
---
kind: destination
spec:
  name: "postgresql"
  path: "cloudquery/postgresql"
  registry: "cloudquery"
  version: "v7.6.0"
  spec:
    connection_string: ${POSTGRESQL_CONNECTION_STRING}

Specifying a query

In this example, assume a dataset usage with table usage_events having columns customer_id, timestamp, and miles. The individual entries represent a distance driven by a customer during a single driving session started at the timestamp.
We would like to get only the summary of the distance driven last month grouped by the customer and sync it to a PostgreSQL database. Here's what the config looks like with the SQL query included:
kind: source
spec:
  name: bigquery
  path: cloudquery/bigquery
  registry: cloudquery
  version: "v1.0.0"
  tables: ["*"]
  destinations: ["postgresql"]
  spec:
    project_id: ${PROJECT_ID}
    dataset_id: ${DATASET_ID}
    queries:
      - name: monthly_usage
      - query: >
          select customer_id, sum(miles) as usage from billing.usage_events
          where timestamp > current_timestamp - interval 30 day
          group by customer_id
---
kind: destination
spec:
  name: "postgresql"
  path: "cloudquery/postgresql"
  registry: "cloudquery"
  version: "v7.6.0"
  spec:
    connection_string: ${POSTGRESQL_CONNECTION_STRING}
This config will not sync the entire dataset. Instead, it will create only the monthly_usage table and populate it with the results of the query.
You can include as many queries as you want in one sync config.

Find more resources in Hub

With this plugin, you can sync data from BigQuery to a destination of your choice. However, you can get data from more sources than just BigQuery. Check out our other database sources on Hub.
Subscribe to product updates

Be the first to know about new features.