
Data Model

When CloudQuery Platform syncs data from your cloud providers, the data lands in ClickHouse tables organized by integration and resource type. The platform then builds normalized views on top of this raw data, giving you both detailed per-resource tables and a unified cross-cloud inventory.

How Tables Are Organized

Each source integration creates one table per resource type. For example, the AWS integration creates tables like aws_ec2_instances, aws_s3_buckets, and aws_iam_roles. The Azure integration creates azure_compute_virtual_machines, azure_storage_accounts, and so on.

During a sync, data is first written to staging tables prefixed with raw_ (e.g. raw_aws_ec2_instances). Once a table's sync completes, the platform creates a view with the original name (aws_ec2_instances) that points to the latest complete snapshot. The data you query is therefore always consistent: the view switches atomically to the new snapshot when a sync finishes, so you never see partial results from a sync that is still running.
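
Because the view resolves to one complete snapshot, a quick sanity check is to count how many distinct sync runs it exposes. This is a minimal sketch that assumes the view carries the _cq_sync_group_id and _cq_sync_time system columns described under System Columns. If the view points at a single snapshot as described above, snapshots should come back as 1.

```sql
-- Inspect which snapshot the aws_ec2_instances view currently resolves to.
SELECT
    uniqExact(_cq_sync_group_id) AS snapshots,  -- distinct sync runs visible through the view
    max(_cq_sync_time)           AS synced_at   -- latest sync timestamp in that snapshot
FROM aws_ec2_instances
```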

You can query these per-resource tables directly in the SQL Console using standard ClickHouse SQL. For the full technical details on how staging tables, views, and snapshots work, see Understanding Platform Views.
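
To explore an integration's tables before writing a query, you can list them and inspect their columns with standard ClickHouse statements. Column names differ between integrations and versions, so this sketch leans on DESCRIBE TABLE to show what a table actually contains. The LIKE pattern is only an example filter.

```sql
-- List the EC2 tables (views) visible in the current database.
SHOW TABLES LIKE 'aws_ec2%';

-- Show the column names and types of one per-resource table.
DESCRIBE TABLE aws_ec2_instances;

-- Peek at a few rows.
SELECT *
FROM aws_ec2_instances
LIMIT 10;
```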

The Cloud Assets Table

The platform also maintains a unified table called cloud_assets that normalizes resources from all your integrations into a common schema. This is what powers the Asset Inventory, and it’s what you query when you need to search across clouds.

Every resource in cloud_assets has these columns:

| Column | Description |
|---|---|
| cloud | Cloud provider (aws, azure, gcp, k8s, etc.) |
| account | Account or subscription ID |
| account_name | Human-readable account name |
| name | Resource name |
| region | Cloud region or location |
| resource_type | Source table name (e.g. aws_ec2_instances) |
| resource_type_label | Human-readable resource type label |
| resource_category | Category such as Compute, Storage, Database, or Networking |
| tags | Resource tags as a Map(String, String) |
| supports_tags | Whether this resource type supports tagging |
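
Every integration maps into the same columns, so a cross-cloud summary needs only a GROUP BY. This sketch uses only the columns listed above and assumes supports_tags is a boolean. The team tag key in the second query is a hypothetical example, so substitute your own tagging convention.

```sql
-- Resource counts per cloud and category across all integrations.
SELECT
    cloud,
    resource_category,
    count() AS resources
FROM cloud_assets
GROUP BY cloud, resource_category
ORDER BY resources DESC;

-- Taggable resources that lack a 'team' tag (hypothetical tag key).
SELECT cloud, account_name, resource_type, name
FROM cloud_assets
WHERE supports_tags
  AND NOT mapContains(tags, 'team')
ORDER BY cloud, account_name, resource_type;
```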

Cost columns are also available when AWS Cost & Usage data is synced:

| Column | Description |
|---|---|
| cost_7d_unblended | 7-day raw cost (before discounts and credits) |
| cost_30d_unblended | 30-day raw cost (before discounts and credits) |
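
Once cost data is present, you can rank and aggregate the cost columns like any other numeric column. This sketch assumes AWS Cost & Usage data has been synced, so the columns are populated. The > 0 filter also drops resources that have no recorded cost.

```sql
-- Ten most expensive individual resources over the last 30 days.
SELECT cloud, account_name, resource_type, name, cost_30d_unblended
FROM cloud_assets
WHERE cost_30d_unblended > 0
ORDER BY cost_30d_unblended DESC
LIMIT 10;

-- 30-day unblended cost rolled up by account and resource type.
SELECT
    account_name,
    resource_type,
    round(sum(cost_30d_unblended), 2) AS cost_30d
FROM cloud_assets
WHERE cost_30d_unblended > 0
GROUP BY account_name, resource_type
ORDER BY cost_30d DESC;
```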

The cloud_assets table is a good starting point for cross-cloud queries. For example, to find all resources in a specific account across every cloud:

```sql
SELECT cloud, resource_type, name, region
FROM cloud_assets
WHERE account = '123456789012'
ORDER BY cloud, resource_type
```

When you need the full resource details (like an EC2 instance’s security groups or an S3 bucket’s encryption settings), query the integration-specific table directly.
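
Because resource_type holds the source table name, cloud_assets can tell you which integration-specific tables to open next. This sketch reuses the example account ID from above. The follow-up query assumes the AWS table exposes an account_id column, so confirm the actual name with DESCRIBE TABLE before relying on it.

```sql
-- Which per-resource tables hold resources for this account?
SELECT resource_type, count() AS resources
FROM cloud_assets
WHERE account = '123456789012'
GROUP BY resource_type
ORDER BY resources DESC;

-- Then open one of those tables for the full resource details.
-- account_id is an assumed column name; verify it with DESCRIBE TABLE.
SELECT *
FROM aws_ec2_instances
WHERE account_id = '123456789012'
LIMIT 10;
```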

System Columns

Every table in CloudQuery Platform includes system columns prefixed with _cq_. These track sync metadata and enable the platform’s deduplication and consistency features.

| Column | Type | Description |
|---|---|---|
| _cq_id | UUID | Unique identifier for each resource record |
| _cq_parent_id | UUID (nullable) | Parent resource ID, for hierarchical relationships (e.g. a subnet belonging to a VPC) |
| _cq_sync_time | DateTime64 | Timestamp of when this record was synced |
| _cq_source_name | String | Name of the source integration (e.g. aws, gcp) |
| _cq_sync_group_id | UInt64 | Groups all records from a single sync run together |

You don’t normally need to query these columns directly, but they’re useful for debugging sync issues or understanding when data was last refreshed.
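
For example, to see when a table was last refreshed and how many records its current snapshot holds, you can aggregate over the system columns. This is a minimal sketch against one AWS table. Because every table carries these columns, the same query should work against any per-resource table.

```sql
-- Freshness check for a single per-resource table.
SELECT
    _cq_source_name,
    max(_cq_sync_time)                            AS last_synced,
    dateDiff('minute', max(_cq_sync_time), now()) AS minutes_ago,
    count()                                       AS records
FROM aws_ec2_instances
GROUP BY _cq_source_name;
```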

Custom Columns

You can extend the cloud_assets schema with custom columns: computed fields defined by ClickHouse SQL expressions. These are useful for extracting values from tags, normalizing data, or adding organization-specific metadata.

For example, you could create a custom column that extracts a team owner from tags:

```sql
tags['team']
```

Custom columns appear alongside native columns in the Asset Inventory and are indexed for search.
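
In ClickHouse, looking up a missing key in a Map(String, String) returns the value type's default, which is an empty string. For that reason you may want a fallback in the custom column expression. In this sketch, the if(...) expression is the part you would save as the custom column. The 'unassigned' label is an arbitrary placeholder, and the surrounding SELECT only previews the result against cloud_assets.

```sql
-- Preview a team-owner expression before saving it as a custom column.
-- Missing keys and empty tag values both fall back to 'unassigned'.
SELECT
    name,
    resource_type,
    if(tags['team'] != '', tags['team'], 'unassigned') AS team_owner
FROM cloud_assets
LIMIT 20;
```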
