Announcing the CloudQuery BigQuery Source Integration

CloudQuery News

Announcing the CloudQuery BigQuery Source Integration

•

Introducing the new CloudQuery BigQuery source plugin!

BigQuery is a popular, fully managed, serverless data warehouse designed for large datasets. We have now added a BigQuery Source Integration that enables syncing tables and views from a dataset to any supported CloudQuery destination.

Data teams often use BigQuery for its scalability, ease of use, and integration with other Google Cloud services. It is commonly used alongside our GCP Source Integration. These teams utilize BigQuery for various data analytics tasks such as ad-hoc queries, data exploration, and large-scale data processing. However, sometimes you need to do an advanced analysis, run transformations, or consolidate the data in a different database. The new source plugin enables just that.

On top of syncing the individual tables and views from the source dataset, the new plugin supports queries at the source, thus enabling on-the-fly transformation of the data.

Example Configuration #

Contact CloudQuery to set up your account.

To sync a whole dataset to a PostgreSQL database, use the default configuration. Specify your Project ID, Dataset ID (name), and the connection string to the database. For other options and details on how to authenticate with BigQuery, see BigQuery Source Integration Documentation.

kind: source
spec:
  name: bigquery
  path: cloudquery/bigquery
  registry: cloudquery
  version: 'v1.0.0'
  tables: ['*']
  destinations: ['postgresql']
  spec:
    project_id: ${PROJECT_ID}
    dataset_id: ${DATASET_ID}
---
kind: destination
spec:
  name: 'postgresql'
  path: 'cloudquery/postgresql'
  registry: 'cloudquery'
  version: 'v7.6.0'
  spec:
    connection_string: ${POSTGRESQL_CONNECTION_STRING}

Specifying a query #

In this example, assume a dataset usage with table usage_events having columns customer_id, timestamp, and miles. The individual entries represent a distance driven by a customer during a single driving session started at the timestamp.

We would like to get only the summary of the distance driven last month grouped by the customer and sync it to a PostgreSQL database. Here's what the configuration looks like with the SQL query included:

kind: source
spec:
  name: bigquery
  path: cloudquery/bigquery
  registry: cloudquery
  version: 'v1.0.0'
  tables: ['*']
  destinations: ['postgresql']
  spec:
    project_id: ${PROJECT_ID}
    dataset_id: ${DATASET_ID}
    queries:
      - name: monthly_usage
      - query: >
          select customer_id, sum(miles) as usage from billing.usage_events
          where timestamp > current_timestamp - interval 30 day
          group by customer_id
---
kind: destination
spec:
  name: 'postgresql'
  path: 'cloudquery/postgresql'
  registry: 'cloudquery'
  version: 'v7.6.0'
  spec:
    connection_string: ${POSTGRESQL_CONNECTION_STRING}

This configuration will not sync the entire dataset. Instead, it will create only the monthly_usage table and populate it with the results of the query.

You can include as many queries as you want in one sync configuration.

Find more resources in Hub #

With this plugin, you can sync data from BigQuery to a destination of your choice. However, you can get data from more sources than just BigQuery. Check out our other database source integrations on the CloudQuery Hub.

If you'd like to start using the BigQuery source integration, download CloudQuery today. You can also discuss use cases and get feedback from other CloudQuery users in the CloudQuery Community.

Ready to dive deeper? Contact CloudQuery here or join the CloudQuery Community to connect with other users and experts.

You can also try out CloudQuery locally with our CLI quick start guide or explore the CloudQuery Platform (currently in beta) for a more scalable solution.

CloudQuery

Test CloudQuery's capabilities with a demo

Announcing the CloudQuery BigQuery Source Integration

Example Configuration #

Specifying a query #

Find more resources in Hub #

Related posts

CloudQuery announces $16M funding round to Solve Enterprise Cloud Visibility