
Source Integrations

Source integrations are responsible for extracting and transforming data from various APIs, cloud providers, SaaS applications, and databases. They define the schema (tables) and handle authentication with the supported services.

Browse All Source Integrations

For a complete list of available source integrations with detailed documentation, visit the CloudQuery Hub.

Cloud Providers

  • AWS: EC2, S3, RDS, Lambda, and 200+ other services
  • Google Cloud: Compute Engine, Cloud Storage, BigQuery, and more
  • Azure: Virtual Machines, Storage Accounts, SQL Database, and more
  • DigitalOcean: Droplets, Spaces, Load Balancers, and more

SaaS Applications

  • GitHub: Repositories, issues, pull requests, and user data
  • GitLab: Projects, merge requests, pipelines, and more
  • Slack: Channels, messages, users, and workspace data
  • Salesforce: Accounts, contacts, opportunities, and custom objects

Databases & APIs

  • PostgreSQL: Database schemas, tables, and metadata
  • MongoDB: Collections, documents, and database information
  • REST APIs: Generic REST API integration for any HTTP service

Source Configuration Reference

Basic Configuration

Source integrations are configured in your CloudQuery configuration file. Each source requires:

  • Name: Unique identifier for the source
  • Path: Plugin path (e.g., cloudquery/aws)
  • Version: Plugin version to use
  • Tables: Which tables to sync (supports wildcards)
  • Destinations: Where to send the data

Example configuration:

kind: source
spec:
  name: aws
  path: cloudquery/aws
  registry: cloudquery
  version: "v33.18.0"
  tables: ["aws_s3_buckets", "aws_ec2_instances"]
  destinations: ["postgresql"]
  spec:
    # AWS-specific configuration
    regions: ["us-east-1", "us-west-2"]

Complete Source Spec Reference

The following are all available options for the top-level source integration spec object.

For individual integration configuration, see the relevant integration page on CloudQuery Hub (e.g. AWS integration configuration).

name

(string, required)

Name of the integration. If you have multiple source integrations, this must be unique.

The name field may be used to uniquely identify a particular source configuration. For example, if you have two configs for the AWS plugin for syncing different accounts, one may be named aws-account-1 and the other aws-account-2. In this case, the path option below must be used to specify the download path for the plugin.
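For example, two sources for the AWS plugin syncing different accounts could be configured like this (a hypothetical sketch; versions, tables, and destinations are placeholders):

```yaml
# Hypothetical: two AWS sources distinguished by unique names.
kind: source
spec:
  name: aws-account-1
  path: cloudquery/aws
  registry: cloudquery
  version: "v33.18.0"
  tables: ["aws_s3_buckets"]
  destinations: ["postgresql"]
---
kind: source
spec:
  name: aws-account-2
  path: cloudquery/aws
  registry: cloudquery
  version: "v33.18.0"
  tables: ["aws_s3_buckets"]
  destinations: ["postgresql"]
```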

registry

(string, optional, default: cloudquery, available: github, cloudquery, local, grpc, docker)

  • cloudquery: CloudQuery will look for and download the plugin from the official CloudQuery registry, and then execute it.
  • github: Deprecated. CloudQuery will look for and download the plugin from GitHub, and then execute it.
  • local: CloudQuery will execute the plugin from a local path.
  • grpc: Mostly useful for debugging, when the plugin is already running in a different terminal. CloudQuery will connect to the gRPC plugin server directly without spawning the process.
  • docker: CloudQuery will run the plugin in a Docker container. This is most useful for plugins written in Python, as they do not support the local, github and cloudquery registries.

docker_registry_auth_token

(string, optional, default: "", introduced in CLI v5.7.0)

Authentication token for private Docker container registries. This is required if the plugin is hosted in a private Docker container registry. The token must be a valid Docker registry token that can be used to pull the plugin image, and it is only relevant when registry is set to docker. The token is a base64-encoded JSON string containing the username and password. Here is an example of how to generate the token:

echo -n "{\"username\":\"<REPLACE_WITH_USERNAME>\",\"password\":\"<REPLACE_WITH_PASSWORD>\"}" | base64
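To sanity-check a generated token, you can decode it and confirm the JSON payload round-trips (the credentials below are hypothetical):

```shell
# Hypothetical credentials; the token is simply the JSON pair, base64-encoded.
TOKEN=$(echo -n "{\"username\":\"myuser\",\"password\":\"s3cret\"}" | base64)

# Decode to verify the payload (on macOS, use `base64 -D`):
echo "$TOKEN" | base64 -d
```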

Details about specific private container registries:

AWS ECR: The username is AWS and you can get the password by running aws ecr get-login-password --region <region>. Replace <region> with the region where the ECR is located.

Generating the token for AWS ECR would look like this:

echo -n "{\"username\":\"AWS\",\"password\":\"$(aws ecr get-login-password --region <REGION>)\"}" | base64

GitHub Container Registry: The username is your GitHub username and the password is a personal access token. More information can be found in GitHub's container registry documentation.

Generating the token for GitHub Container Registry would look like this:

export CR_PAT=YOUR_TOKEN
echo -n "{\"username\":\"USERNAME\",\"password\":\"$CR_PAT\"}" | base64

path

(string, required)

Configures how to retrieve the plugin. The contents depend on the value of registry (cloudquery by default).

  • For plugins hosted on the CloudQuery registry, path should be of the form "<team>/<plugin-name>". For official plugins, this should be cloudquery/<plugin-name>.
  • For plugins hosted on GitHub, path should be of the form "<org>/<repository>".
  • For plugins that are located in the local filesystem, path should be a filesystem path to the plugin binary.
  • To connect to a running plugin via grpc (mostly useful for debugging), path should be the host-port of the plugin (e.g. localhost:7777).
  • For plugins distributed via Docker, path should be the name of the Docker image (optionally including a tag, the same as you would use for docker run, e.g. ghcr.io/cloudquery/cq-source-typeform:v1.0.0).
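As an illustration, a plugin binary on the local filesystem could be configured like this (hypothetical names and paths):

```yaml
# Hypothetical: run a locally built plugin binary instead of downloading one.
kind: source
spec:
  name: myplugin
  registry: local
  path: ./cq-source-myplugin
  tables: ["*"]
  destinations: ["postgresql"]
```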

version

(string, required)

version must be a valid SemVer, e.g. vMajor.Minor.Patch. You can find all official plugin versions under cloudquery/cloudquery/releases, and for community integrations you can find it in the relevant community repository. Required for integrations using the cloudquery or github registries.

tables

([]string, required)

This option was changed to required in versions >= v3.0.0 of the CloudQuery CLI. In previous versions it was optional and defaulted to ["*"] (sync all tables).

Tables to sync from the source plugin. It accepts wildcards. For example, to match all tables use ["*"], and to match all EC2-related tables use ["aws_ec2_*"]. Syncing all tables can be slow on some integrations (e.g. AWS, GCP, Azure).

Prior to CLI version v6.0.0, matched tables would also sync all their descendant tables, unless these were skipped in skip_tables. You must now explicitly list descendant tables in tables to sync them or set skip_dependent_tables to false.

skip_tables

([]string, optional, default: [])

Specify which tables to skip when syncing the source plugin. It accepts wildcards. This config is useful when using wildcards in tables, or when you wish to skip dependent tables. If a table with dependencies is skipped, all its dependent tables will also be skipped.
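Conceptually, tables and skip_tables behave like glob filtering over table names: include everything matching a pattern, then drop anything matching a skip pattern. A minimal Python sketch of that behavior (not CloudQuery's actual implementation; the table names are illustrative):

```python
from fnmatch import fnmatch

# Illustrative table names, not a real plugin's full schema.
all_tables = ["aws_ec2_instances", "aws_ec2_vpcs", "aws_s3_buckets"]

def select(tables, patterns, skip_patterns=()):
    """Keep tables matching any pattern, then drop those matching any skip pattern."""
    matched = [t for t in tables if any(fnmatch(t, p) for p in patterns)]
    return [t for t in matched if not any(fnmatch(t, p) for p in skip_patterns)]

print(select(all_tables, ["aws_ec2_*"]))        # ['aws_ec2_instances', 'aws_ec2_vpcs']
print(select(all_tables, ["*"], ["aws_s3_*"]))  # ['aws_ec2_instances', 'aws_ec2_vpcs']
```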

skip_dependent_tables

(bool, optional, default: true, introduced in CLI v2.3.7)

If set to false, dependent tables will be included in the sync when their parents are matched, even if not explicitly included by the tables configuration. Prior to CLI version v6.0.0, this option defaulted to false. We’ve changed the default to true to avoid new tables implicitly being synced when added to plugins.

destinations

([]string, required)

Specify the names of the destinations to sync the data of the source plugin to.

deterministic_cq_id

(bool, optional, default: false, introduced in CLI v2.4.1)

A flag that indicates whether the value of _cq_id should be a UUID that is a hash of the primary keys, or a random UUID. If a resource has no primary keys defined, the value will always be a random UUID.

Supported by source plugins released on 2023-03-08 and later.
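The idea can be sketched as deriving a name-based UUID from the primary-key values, so the same row always maps to the same _cq_id across syncs. A hypothetical illustration in Python (not CloudQuery's actual hashing scheme; the namespace UUID is a placeholder):

```python
import uuid

# Hypothetical namespace UUID; CloudQuery's real scheme may differ.
NAMESPACE = uuid.UUID("00000000-0000-0000-0000-000000000000")

def deterministic_cq_id(primary_keys: dict) -> uuid.UUID:
    """Derive a stable UUID (uuid5, SHA-1 based) from sorted primary-key values."""
    material = "|".join(f"{k}={primary_keys[k]}" for k in sorted(primary_keys))
    return uuid.uuid5(NAMESPACE, material)

def random_cq_id() -> uuid.UUID:
    """Fallback when a resource defines no primary keys."""
    return uuid.uuid4()

# The same primary keys always hash to the same ID:
a = deterministic_cq_id({"arn": "arn:aws:s3:::my-bucket"})
b = deterministic_cq_id({"arn": "arn:aws:s3:::my-bucket"})
assert a == b
```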

backend_options

(object, optional)

Configures the state backend for incremental table syncs. Contains two required sub-fields:

  • table_name (string, required): The name of the table used to store key-value pairs for incremental sync progress.
  • connection (string, required): Connection reference for the destination integration that stores the state. Typically uses the @@plugins.<plugin-name>.connection variable syntax to reference another integration’s connection.

Example:

backend_options:
  table_name: cq_state_aws
  connection: "@@plugins.postgresql.connection"

See Managing Incremental Tables for more details.

otel_endpoint (preview)

(string, optional, introduced in CLI v3.10.0)

Endpoint for an OpenTelemetry OTLP/HTTP exporter (Jaeger endpoints are also supported). Traces of syncs will be sent to this endpoint.

otel_endpoint_insecure (preview)

(bool, optional, default: false, introduced in CLI v3.10.0)

If set to true, the exporter will connect via HTTP instead of HTTPS, without TLS verification.

spec

(object, optional)

Integration-specific configurations. Visit source integrations documentation for more information.

Deprecated Options

concurrency

This option was deprecated in CLI v3.6.0 in favor of plugin level concurrency, as each integration has its own concurrency requirements. See more in each integration’s documentation.

scheduler

This option was deprecated in CLI v3.6.0 in favor of plugin level scheduler, as each integration has its own scheduler requirements. See more in each integration’s documentation.

backend

This option was deprecated in CLI v3.6.0 in favor of backend_options. See Managing Incremental Tables for more information.

backend_spec

This option was deprecated in CLI v3.6.0 in favor of backend_options. See Managing Incremental Tables for more information.

Authentication

Each source integration handles authentication differently:

  • Cloud Providers: Use IAM roles, service accounts, or API keys
  • SaaS Applications: OAuth tokens, API keys, or personal access tokens
  • Databases: Connection strings with credentials

Refer to each integration’s documentation for specific authentication requirements.
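For example, a database source typically authenticates via a connection string in its integration-specific spec (a hypothetical sketch; the exact field names and version vary by integration):

```yaml
# Hypothetical PostgreSQL source; check the integration's docs for exact fields.
kind: source
spec:
  name: postgresql
  path: cloudquery/postgresql
  registry: cloudquery
  version: "v1.0.0"  # placeholder version
  tables: ["*"]
  destinations: ["bigquery"]
  spec:
    connection_string: "postgres://user:password@localhost:5432/mydb"
```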

Performance Considerations

  • Table Selection: Use specific table names instead of wildcards when possible
  • Rate Limiting: Some APIs have rate limits that affect sync performance
  • Incremental Syncs: Many sources support incremental syncing for better performance
  • Concurrency: Adjust concurrency settings based on API limits

Creating Custom Sources

Need a source that doesn’t exist? Learn how to create your own source integration.
