Harnessing the Power of BigQuery and CloudQuery for Google Cloud Cost Optimization
December 15, 2022
Last week we announced our BigQuery destination, and we've already seen some interesting cost-related use cases that we want to share. Thanks to our great community for helping with ideas for this blog!
Exporting GCP billing data to BigQuery is a powerful way to analyze your GCP costs. However, sometimes having the billing data in BigQuery is not enough: you also need to correlate it with your cloud infrastructure data. This tutorial shows how to correlate GCP billing data with CloudQuery data in BigQuery to optimize cost.
To follow along, you will need:
- GCP Billing Export to BigQuery enabled
- The GCP Source Plugin
- The BigQuery Destination Plugin
First, we need to sync our GCP data with CloudQuery to the same BigQuery dataset we exported our billing data to. To do that, we will use the following CloudQuery configuration file (for the full config reference, check out the GCP Source Plugin and BigQuery Destination Plugin docs):
```yaml
kind: source
spec:
  # Source spec section
  name: "gcp"
  path: "cloudquery/gcp"
  version: "v7.4.0"
  destinations: ["bigquery"]
  spec:
    # GCP spec section
    project_ids: ["<project-id>"]
---
kind: destination
spec:
  name: bigquery
  path: cloudquery/bigquery
  version: "v2.1.3"
  write_mode: "append"
  spec:
    project_id: "<project-id>"
    dataset_id: costdata
```
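Save this configuration to a file (the name `config.yml` here is just an example) and run `cloudquery sync config.yml` to start the sync.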
Once the data is synced, we will see something like the following in the BigQuery UI:
You can see all the CloudQuery tables prefixed with gcp_, along with two GCP billing export tables. In our case the interesting one is gcp_billing_export_resource, which contains the billing data for each individual resource, so we can join it with CloudQuery data easily.
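If you want to verify which tables landed in the dataset, one option (a sketch assuming the `cq-playground` project and `costdata` dataset from the config above) is to query BigQuery's INFORMATION_SCHEMA:

```sql
-- List all synced GCP tables in the dataset
SELECT table_name
FROM `cq-playground.costdata.INFORMATION_SCHEMA.TABLES`
WHERE table_name LIKE 'gcp_%'
ORDER BY table_name
```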
Now let's dive into some examples.
Let's say we want to find all unattached disks and figure out how much they cost.
First, let's find all unattached disks by querying all disks where the users field is null (which, per the GCP documentation, is the case for unattached disks):
```sql
SELECT * FROM gcp_compute_disks WHERE users IS NULL
```
Now let's join with the GCP billing data.
```sql
SELECT SUM(cost)
FROM `cq-playground.costdata.gcp_billing_export_resource_v1_0183D4_4E0A4D_60E401` gcp_billing_export_resource
JOIN `cq-playground.costdata.gcp_compute_disks` gcp_compute_disks
  ON gcp_billing_export_resource.resource.name = gcp_compute_disks.name
WHERE DATE(_PARTITIONTIME) = "2022-12-14"
  AND gcp_compute_disks.users IS NULL
```
This query gives us the total cost of all unattached disks for a specific date, so we don't scan the whole historical table. We can also run it over a date range, as shown in the sketch below.
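For example, a minimal sketch of the same query over a date range (assuming the same table names as above; the dates are illustrative):

```sql
-- Total cost of unattached disks over a two-week window
SELECT SUM(cost)
FROM `cq-playground.costdata.gcp_billing_export_resource_v1_0183D4_4E0A4D_60E401` gcp_billing_export_resource
JOIN `cq-playground.costdata.gcp_compute_disks` gcp_compute_disks
  ON gcp_billing_export_resource.resource.name = gcp_compute_disks.name
WHERE DATE(_PARTITIONTIME) BETWEEN "2022-12-01" AND "2022-12-14"
  AND gcp_compute_disks.users IS NULL
```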
Let's say we are running an experiment migrating our workloads to a different CPU architecture and we want to compare costs. This can easily be done with the following queries. First, calculate the cost for the Intel Broadwell CPU platform:
```sql
SELECT SUM(cost)
FROM `cq-playground.costdata.gcp_billing_export_resource_v1_0183D4_4E0A4D_60E401` gcp_billing_export_resource
JOIN `cq-playground.costdata.gcp_compute_instances` gcp_compute_instances
  ON gcp_billing_export_resource.resource.name = gcp_compute_instances.name
WHERE DATE(_PARTITIONTIME) = "2022-12-14"
  AND gcp_compute_instances.cpu_platform = "Intel Broadwell"
```
You can run the same query by replacing Intel Broadwell with AMD Rome or any other CPU platform.
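Alternatively, you could compare all platforms at once by grouping on cpu_platform; here's a sketch assuming the same tables as above:

```sql
-- Cost per CPU platform for a single partition date
SELECT gcp_compute_instances.cpu_platform, SUM(cost) AS total_cost
FROM `cq-playground.costdata.gcp_billing_export_resource_v1_0183D4_4E0A4D_60E401` gcp_billing_export_resource
JOIN `cq-playground.costdata.gcp_compute_instances` gcp_compute_instances
  ON gcp_billing_export_resource.resource.name = gcp_compute_instances.name
WHERE DATE(_PARTITIONTIME) = "2022-12-14"
GROUP BY gcp_compute_instances.cpu_platform
ORDER BY total_cost DESC
```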
In this short blog post we shared just a sample of what you can do by combining cost data with the infrastructure state and metadata synced by CloudQuery to BigQuery. The number of use cases around cost (aka "FinOps" :) ) is practically endless; it all depends on what you are trying to achieve and optimize for. CloudQuery with BigQuery is a powerful analysis combination that gives you cheap storage and fast querying over large amounts of data.