Architecture
CloudQuery works by connecting to your cloud providers and SaaS apps, extracting configuration data through their APIs, and loading it into a database you control. The CLI is the orchestrator - it manages this pipeline by coordinating independent components called integrations that each handle one part of the job.
This page covers how those components fit together. If you’re building a custom integration, understanding this architecture is especially relevant. For day-to-day usage, start with Configuration instead.
CloudQuery uses gRPC for communication between the CLI and integrations.
Data Flow
The CLI orchestrates all communication between integrations. Integrations do not talk to each other directly - every record flows through the CLI.
Source --> CLI --> Transformer 1 --> Transformer 2 --> ... --> Destination 1
  ^         |                                                  Destination 2
  |         + adds _cq_sync_time, _cq_source_name,             Destination N
  |           _cq_sync_group_id to every record
External
  APIs
Each connection between components uses gRPC. The transformer chain is optional and configured per destination - each destination can have its own transformers (or none). A single source can feed multiple destinations simultaneously without extra API calls.
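The fan-out described above can be illustrated with a small, self-contained simulation. This is only a sketch in plain Python, not CloudQuery code: `fetch_from_api`, `fan_out`, the transformer functions, and the destination names are all hypothetical, and real integrations communicate over gRPC rather than through in-process function calls.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]
Transformer = Callable[[Record], Record]

api_calls = 0  # counts simulated requests to the external API


def fetch_from_api() -> List[Record]:
    """Hypothetical source: each call simulates one request to an external API."""
    global api_calls
    api_calls += 1
    return [{"id": 1, "name": "bucket-a"}, {"id": 2, "name": "bucket-b"}]


def uppercase_name(record: Record) -> Record:
    """Hypothetical transformer: upper-cases the name column."""
    return {**record, "name": str(record["name"]).upper()}


def drop_name(record: Record) -> Record:
    """Hypothetical transformer: removes the name column."""
    return {key: value for key, value in record.items() if key != "name"}


def fan_out(
    records: List[Record], chains: Dict[str, List[Transformer]]
) -> Dict[str, List[Record]]:
    """Send every record to every destination through that destination's own chain."""
    written: Dict[str, List[Record]] = {dest: [] for dest in chains}
    for record in records:
        for dest, chain in chains.items():
            out = record
            for transform in chain:  # an empty chain passes the record through
                out = transform(out)
            written[dest].append(out)
    return written


# Three hypothetical destinations, each with its own (possibly empty) chain.
chains: Dict[str, List[Transformer]] = {
    "warehouse": [uppercase_name],
    "audit_log": [drop_name],
    "raw_copy": [],
}
result = fan_out(fetch_from_api(), chains)
```

The source is queried once, and the same records reach all three destinations, each shaped by that destination's own chain.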
During a sync:
- The source integration fetches data from external APIs and streams records to the CLI over gRPC.
- The CLI applies an internal transformation to each record, adding columns like _cq_sync_time, _cq_source_name, and _cq_sync_group_id.
- If transformer integrations are configured for a destination, records pass through them sequentially. Each transformer receives records from the previous one and forwards its output to the next. The order matches the transformers list in the destination spec.
- The destination integration receives the final records and writes them to the target database or storage.
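The four sync steps can be sketched as a chain of Python generators, which mirrors the streaming nature of the pipeline. Everything here is a stand-in: the function names, the sample records, the source name "aws", and the sync group value are assumptions made for illustration, and the sketch assumes one timestamp is shared by all records in a sync. It is not how the CLI is implemented.

```python
from datetime import datetime, timezone
from typing import Callable, Dict, Iterable, Iterator, List

Record = Dict[str, object]
Stage = Callable[[Iterable[Record]], Iterator[Record]]


def source() -> Iterator[Record]:
    """Step 1: a source streams records fetched from an external API (simulated)."""
    yield {"instance_id": "i-1", "state": "running"}
    yield {"instance_id": "i-2", "state": "stopped"}


def add_cq_columns(
    records: Iterable[Record], source_name: str, sync_group_id: str
) -> Iterator[Record]:
    """Step 2: stamp the _cq_* columns on every record."""
    sync_time = datetime.now(timezone.utc)  # assumption: one timestamp per sync
    for record in records:
        yield {
            **record,
            "_cq_sync_time": sync_time,
            "_cq_source_name": source_name,
            "_cq_sync_group_id": sync_group_id,
        }


def only_running(records: Iterable[Record]) -> Iterator[Record]:
    """Hypothetical transformer: keep records whose state is 'running'."""
    for record in records:
        if record["state"] == "running":
            yield record


def rename_state(records: Iterable[Record]) -> Iterator[Record]:
    """Hypothetical transformer: rename the 'state' column to 'status'."""
    for record in records:
        out = dict(record)
        out["status"] = out.pop("state")
        yield out


def run_sync(transformers: List[Stage]) -> List[Record]:
    """Wire the stages together in list order and collect what gets written."""
    stream: Iterable[Record] = add_cq_columns(source(), "aws", "group-1")
    for transformer in transformers:  # step 3: applied in the configured order
        stream = transformer(stream)
    return list(stream)  # step 4: the destination consumes the final records


written = run_sync([only_running, rename_state])
```

Order matters in this sketch: swapping the two transformers would make `only_running` look for a `state` column that `rename_state` has already renamed, which is why the chain follows the order of the configured list.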
CloudQuery CLI Responsibilities
- Main entry point and CLI for the user.
- Reading CloudQuery configuration files.
- Downloading, verifying, and running integrations.
- Orchestrating the sync pipeline between source, transformer, and destination integrations.
- Applying internal transformations (adding _cq_* columns to every record).
CloudQuery Integration Responsibilities
- Intended to be run only by the CloudQuery CLI.
- Communicates with the CLI over gRPC to receive commands and stream data.
- Source integrations: initialization, authentication, and fetching data via third-party cloud/SaaS APIs.
- Destination integrations: authentication, schema migration, and data insertion.
- Transformer integrations: receiving records, applying transformations, and forwarding results.
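To make the destination responsibilities more concrete, here is a minimal sketch of schema migration followed by data insertion, using Python's built-in sqlite3 module against an in-memory database. The `migrate` and `write` functions, the table, and the column types are hypothetical and do not reflect any real CloudQuery destination; authentication is omitted because an in-memory SQLite database needs none.

```python
import sqlite3
from typing import Dict, List


def migrate(conn: sqlite3.Connection, table: str, columns: Dict[str, str]) -> None:
    """Create the table if it is missing, then add any columns it lacks."""
    col_defs = ", ".join(f'"{name}" {sql_type}' for name, sql_type in columns.items())
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({col_defs})')
    # PRAGMA table_info returns one row per column; index 1 is the column name.
    existing = {row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')}
    for name, sql_type in columns.items():
        if name not in existing:
            conn.execute(f'ALTER TABLE "{table}" ADD COLUMN "{name}" {sql_type}')


def write(conn: sqlite3.Connection, table: str, records: List[Dict[str, object]]) -> None:
    """Insert each record, binding values as parameters."""
    for record in records:
        names = ", ".join(f'"{key}"' for key in record)
        marks = ", ".join("?" for _ in record)
        conn.execute(
            f'INSERT INTO "{table}" ({names}) VALUES ({marks})',
            list(record.values()),
        )
    conn.commit()


conn = sqlite3.connect(":memory:")
# First sync: the table does not exist yet and is created.
migrate(conn, "users", {"id": "INTEGER", "email": "TEXT"})
# Later sync: the incoming schema gained a column, so the table is altered.
migrate(conn, "users", {"id": "INTEGER", "email": "TEXT", "_cq_source_name": "TEXT"})
write(conn, "users", [{"id": 7, "email": "a@example.com", "_cq_source_name": "demo"}])
rows = conn.execute("SELECT id, email, _cq_source_name FROM users").fetchall()
```

Running the migration before every write keeps the sketch idempotent: repeating it with an unchanged schema makes no changes to the table.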
SDK
CloudQuery integrations are built with the Integration SDK, which abstracts most of the “T” and “L” in ETL (extract, transform, load). As a developer, you only need to implement the “E” (extract): initialization, authentication, and fetching data via the third-party APIs. The SDK takes care of transforming the data and loading it into the destination.
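The division of labor can be pictured with a toy example. Nothing below is the Integration SDK's actual API: `FakeApiClient`, `resolve_users`, and `sdk_sync` are hypothetical names, and `sdk_sync` merely stands in for the plumbing an SDK would provide. The only part a developer would write in this picture is the client setup and the resolver.

```python
from typing import Callable, Dict, Iterator, List

Record = Dict[str, object]


class FakeApiClient:
    """Hypothetical authenticated client for a third-party API."""

    def __init__(self, token: str) -> None:
        if not token:
            raise ValueError("missing API token")
        self.token = token

    def list_users(self) -> List[Record]:
        # Canned response standing in for a real HTTP call.
        return [
            {"Id": 7, "Email": "a@example.com"},
            {"Id": 8, "Email": "b@example.com"},
        ]


def resolve_users(client: FakeApiClient) -> Iterator[Record]:
    """Developer-written extract step: fetch raw items and yield them."""
    yield from client.list_users()


def sdk_sync(
    resolver: Callable[[FakeApiClient], Iterator[Record]],
    client: FakeApiClient,
    write: Callable[[Record], None],
) -> int:
    """Stand-in for SDK plumbing: normalize column names and hand records on."""
    count = 0
    for raw in resolver(client):
        write({key.lower(): value for key, value in raw.items()})
        count += 1
    return count


written: List[Record] = []
count = sdk_sync(resolve_users, FakeApiClient(token="example-token"), written.append)
```

The resolver knows nothing about column naming or where records end up, which is the point of the sketch: the extract logic stays small while the surrounding machinery handles the rest.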
Next Steps
- Integrations - learn about the three integration types and their responsibilities
- Syncs - understand how syncs work, including write modes and incremental tables
- Configuration - set up your first source and destination configuration
- Creating a new integration - build your own source, destination, or transformer integration
- Publishing integrations to the hub - share your integration with the community
- Integration SDK documentation - SDK API reference