Architecture

CloudQuery works by connecting to your cloud providers and SaaS apps, extracting configuration data through their APIs, and loading it into a database you control. The CLI is the orchestrator - it manages this pipeline by coordinating independent components called integrations that each handle one part of the job.

This page covers how those components fit together. If you’re building a custom integration, understanding this architecture is especially relevant. For day-to-day usage, start with Configuration instead.

CloudQuery uses gRPC for communication between the CLI and integrations.

Data Flow

The CLI orchestrates all communication between integrations. Integrations do not talk to each other directly - every record flows through the CLI.

    Source --> CLI --> Transformer 1 --> Transformer 2 --> ... --> Destination 1
      ^         |                                                  Destination 2
      |         + adds _cq_sync_time, _cq_source_name,             Destination N
      |           _cq_sync_group_id to every record
    External APIs

Each connection between components uses gRPC. The transformer chain is optional and configured per destination - each destination can have its own transformers (or none). A single source can feed multiple destinations simultaneously without extra API calls.
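To make the fan-out concrete, here is a minimal in-memory sketch, assuming records can be modeled as Go maps and destinations as in-memory collectors (the `Record`, `Transformer`, `Destination`, and `fanOut` names are hypothetical, not CloudQuery APIs, and no gRPC is involved). Each record is produced once, and every destination applies only its own transformer list, which may be empty.

```go
package main

import "fmt"

// Record is a simplified stand-in for one row streamed by a source.
type Record map[string]any

// Transformer is a hypothetical per-record transformation step.
type Transformer func(Record) Record

// Destination pairs a name with its own (possibly empty) transformer chain.
type Destination struct {
	Name         string
	Transformers []Transformer
	Written      []Record // collected in memory instead of a real database
}

// fanOut delivers every source record to every destination. The records are
// fetched once, so adding a destination does not add any API calls.
func fanOut(records []Record, dests []*Destination) {
	for _, rec := range records {
		for _, d := range dests {
			// Copy so one destination's transformers cannot change another's input.
			out := Record{}
			for k, v := range rec {
				out[k] = v
			}
			// Apply this destination's transformers in list order.
			for _, t := range d.Transformers {
				out = t(out)
			}
			d.Written = append(d.Written, out)
		}
	}
}

func main() {
	records := []Record{{"id": "i-1", "region": "us-east-1"}}
	upper := func(r Record) Record { r["region"] = "US-EAST-1"; return r }

	raw := &Destination{Name: "raw"}
	shaped := &Destination{Name: "shaped", Transformers: []Transformer{upper}}
	fanOut(records, []*Destination{raw, shaped})

	fmt.Println(raw.Written[0]["region"], shaped.Written[0]["region"])
}
```

The copy per destination is a design choice of this sketch: it keeps the per-destination transformer chains independent of one another.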

During a sync:

  1. The source integration fetches data from external APIs and streams records to the CLI over gRPC.
  2. The CLI applies an internal transformation to each record, adding columns like _cq_sync_time, _cq_source_name, and _cq_sync_group_id.
  3. If transformer integrations are configured for a destination, records pass through them sequentially. Each transformer receives records from the previous one and forwards its output to the next. The order matches the transformers list in the destination spec.
  4. The destination integration receives the final records and writes them to the target database or storage.
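The four steps above can be sketched as a streaming pipeline of goroutines connected by channels. This is only an illustration under stated assumptions: records are plain Go maps, channels stand in for the gRPC streams, and the function names and the `"example-source"` / `"example-group"` values are hypothetical placeholders.

```go
package main

import (
	"fmt"
	"time"
)

// Record is a simplified stand-in for one row; records are modeled as maps here.
type Record map[string]any

// Transformer is a hypothetical per-record transformation step.
type Transformer func(Record) Record

// source simulates step 1: a source integration streaming the rows it fetched.
func source(rows []Record) <-chan Record {
	out := make(chan Record)
	go func() {
		defer close(out)
		for _, r := range rows {
			out <- r
		}
	}()
	return out
}

// addCQColumns simulates step 2: the CLI adds the _cq_* columns to every record.
func addCQColumns(in <-chan Record, sourceName, syncGroupID string, syncTime time.Time) <-chan Record {
	out := make(chan Record)
	go func() {
		defer close(out)
		for r := range in {
			r["_cq_sync_time"] = syncTime
			r["_cq_source_name"] = sourceName
			r["_cq_sync_group_id"] = syncGroupID
			out <- r
		}
	}()
	return out
}

// transform simulates step 3: one transformer reads from the previous stage
// and forwards its output to the next.
func transform(in <-chan Record, t Transformer) <-chan Record {
	out := make(chan Record)
	go func() {
		defer close(out)
		for r := range in {
			out <- t(r)
		}
	}()
	return out
}

// destination simulates step 4: it drains the stream and "writes" each
// record by collecting it in memory.
func destination(in <-chan Record) []Record {
	var written []Record
	for r := range in {
		written = append(written, r)
	}
	return written
}

// runSync wires the stages together; transformers are chained in list order.
func runSync(rows []Record, transformers []Transformer) []Record {
	// "example-source" and "example-group" are placeholder values.
	ch := addCQColumns(source(rows), "example-source", "example-group", time.Now().UTC())
	for _, t := range transformers {
		ch = transform(ch, t)
	}
	return destination(ch)
}

func main() {
	addTag := func(r Record) Record { r["tag"] = "prod"; return r }
	dropSecret := func(r Record) Record { delete(r, "secret"); return r }
	out := runSync([]Record{{"id": "i-1", "secret": "x"}}, []Transformer{addTag, dropSecret})
	fmt.Println(out[0]["id"], out[0]["tag"], out[0]["_cq_source_name"])
}
```

Because each stage runs concurrently and forwards records as soon as it has them, the destination can start writing before the source has finished fetching.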

CloudQuery CLI Responsibilities

  • Main entry point and CLI for the user.
  • Reading CloudQuery configuration files.
  • Downloading, verifying, and running integrations.
  • Orchestrating the sync pipeline between source, transformer, and destination integrations.
  • Applying internal transformations (adding _cq_* columns to every record).
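This page does not describe how the CLI verifies a downloaded integration. One common technique, shown here purely as an assumption, is comparing the SHA-256 digest of the downloaded binary against a published checksum. The `verifyChecksum` helper is hypothetical and only illustrates that idea.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// verifyChecksum reports whether data hashes to the expected hex-encoded SHA-256.
// Assumption: a checksum is published alongside each integration binary.
func verifyChecksum(data []byte, expectedHex string) error {
	sum := sha256.Sum256(data)
	got := hex.EncodeToString(sum[:])
	// Hex digests may be published in upper or lower case.
	if !strings.EqualFold(got, expectedHex) {
		return fmt.Errorf("checksum mismatch: got %s, want %s", got, expectedHex)
	}
	return nil
}

func main() {
	binary := []byte("integration binary bytes")
	sum := sha256.Sum256(binary)
	expected := hex.EncodeToString(sum[:])

	fmt.Println(verifyChecksum(binary, expected))                    // <nil>
	fmt.Println(verifyChecksum([]byte("tampered"), expected) != nil) // true
}
```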

CloudQuery Integration Responsibilities

  • Intended to be run only by the CloudQuery CLI.
  • Communicates with the CLI over gRPC to receive commands and stream data.
  • Source integrations: initialization, authentication, and fetching data via third-party cloud/SaaS APIs.
  • Destination integrations: authentication, schema migration, and data insertion.
  • Transformer integrations: receiving records, applying transformations, and forwarding results.

SDK

CloudQuery integrations use the Integration SDK, which abstracts away most of the "T" and "L" in ETL (extract, transform, load). As a developer, you only need to implement the “E” (extract): initialization, authentication, and fetching data via the third-party APIs. The SDK takes care of transforming the data and loading it into the destination.
