Announcing Open Beta of CloudQuery AWS Plugin with Event-based Sync

Event-based sync for AWS is now in open beta

Michal Brutvan Jan 15, 2024

The Event-based sync for AWS Plugin is now in open beta!

What is it?

CloudQuery is designed to sync all data on demand, giving the user full control over what to sync, when, and how often. This approach makes CloudQuery easy to set up and use. However, syncing all data has its tradeoffs. The most notable is sync duration: in some use cases, detecting a single change in one resource requires a full sync that can take a few hours or more.
Additionally, a regular sync, even one that runs every hour, is sometimes not enough to get an accurate picture of what is happening in your environment. Cloud resources are ephemeral: they can come and go within a few minutes, which makes them hard to track and makes it hard to get your costs right. An IP address gets spammed by bots the moment you make it public. User accounts get created with broad permissions and can be misused within moments.
This is where our new event-based sync comes to the rescue.

How it works

All events are aggregated by AWS CloudTrail. You can configure a Trail to send management events to a Kinesis Data Stream via CloudWatch Logs. By subscribing to the stream of AWS CloudTrail events in the Kinesis Data Stream, CloudQuery can then trigger selective syncs that update only the single resource whose configuration changed.
Figure: Configuring CloudQuery AWS Plugin for event-based sync
With this setup, you get fresh data within a few seconds of it becoming available in CloudTrail.
At the moment, the event-based sync supports the following services and their selected events:
  • EC2
  • IAM
  • RDS
Find the full list of supported events in our AWS Event-based Sync documentation.
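To make the data flow more concrete, here is a minimal Python sketch of what a consumer of such a stream has to do. It is not CloudQuery's implementation. It relies on the documented CloudWatch Logs subscription format (base64-encoded, gzip-compressed JSON with a `logEvents` array) and on the standard CloudTrail fields `eventSource` and `eventName`. The function names and the filter set are assumptions made for illustration, and the real list of supported events is narrower than "everything from EC2, IAM, and RDS".

```python
import base64
import gzip
import json

# Event sources for the services listed above (EC2, IAM, RDS).
# Assumption for illustration: the plugin supports only selected events.
SUPPORTED_SOURCES = {"ec2.amazonaws.com", "iam.amazonaws.com", "rds.amazonaws.com"}


def decode_record(data_b64):
    """Decode one Kinesis record written by a CloudWatch Logs subscription.

    The payload is base64-encoded, gzip-compressed JSON. Each entry in
    "logEvents" carries one CloudTrail event as a JSON string in "message".
    (SDKs such as boto3 return raw bytes, so the base64 step is only needed
    when working with the raw API response.)
    """
    payload = json.loads(gzip.decompress(base64.b64decode(data_b64)))
    if payload.get("messageType") != "DATA_MESSAGE":
        return []  # control messages carry no CloudTrail events
    return [json.loads(entry["message"]) for entry in payload.get("logEvents", [])]


def events_to_refresh(events):
    """Return (eventSource, eventName) pairs that would warrant a selective sync."""
    return [
        (event["eventSource"], event["eventName"])
        for event in events
        if event.get("eventSource") in SUPPORTED_SOURCES
    ]
```

Each pair returned by `events_to_refresh` identifies a change that a selective sync could act on, instead of re-reading every table.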

Getting Started

  1. Configure an AWS CloudTrail Trail to send management events to a Kinesis Data Stream via CloudWatch Logs. The most straightforward way to do this is to use the CloudQuery-provided CloudFormation template:
aws cloudformation deploy --template-file ./streaming-deployment.yml --stack-name <STACK-NAME> --capabilities CAPABILITY_IAM --disable-rollback --region <DESIRED-REGION>
  2. Copy the ARN of the Kinesis stream. If you used the CloudFormation template, you can run the following command:
aws cloudformation describe-stacks --stack-name <STACK-NAME> --query "Stacks[].Outputs" --region <DESIRED-REGION>
  3. Define a config.yml file like the one below:
kind: source
spec:
  name: "aws-event-based"
  registry: cloudquery
  path: cloudquery/aws
  tables:
    - aws_ec2_instances
    - aws_ec2_internet_gateways
    - aws_ec2_security_groups
    - aws_ec2_subnets
    - aws_ec2_vpcs
    - aws_ecs_cluster_tasks
    - aws_iam_groups
    - aws_iam_roles
    - aws_iam_users
    - aws_rds_instances
  destinations: ["postgresql"]
  skip_tables:
    - aws_iam_group_last_accessed_details
    - aws_iam_role_last_accessed_details
    - aws_iam_user_last_accessed_details
  spec:
    event_based_sync:
      account:
        local_profile: "<ROLE-NAME>"
      kinesis_stream_arn: "<OUTPUT-FROM-CLOUDFORMATION-STACK>"
  4. Log in with the CloudQuery CLI. You may need to sign up first.
cloudquery login
  5. Sync the data!
cloudquery sync config.yml
This will start a long-lived process that stops only when an error occurs or you stop it yourself.
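If you would rather not pick the stream ARN out of the `describe-stacks` output by eye, a small helper can do it. The sketch below assumes only the standard output shape (objects with `OutputKey` and `OutputValue`) and the general Kinesis stream ARN format. It deliberately does not assume a particular output key name, since the template's key names are not stated here. The function name is hypothetical.

```python
import json
import re

# Kinesis stream ARNs look like:
#   arn:<partition>:kinesis:<region>:<account-id>:stream/<name>
KINESIS_ARN = re.compile(r"^arn:aws[a-z-]*:kinesis:[a-z0-9-]+:\d{12}:stream/.+$")


def find_kinesis_arns(describe_stacks_json):
    """Return every stack output value that is a Kinesis stream ARN.

    Accepts the JSON printed by `aws cloudformation describe-stacks`,
    with or without a --query filter, and walks it recursively.
    """
    found = []

    def walk(node):
        if isinstance(node, list):
            for item in node:
                walk(item)
        elif isinstance(node, dict):
            value = node.get("OutputValue")
            if isinstance(value, str) and KINESIS_ARN.match(value):
                found.append(value)
            for child in node.values():
                walk(child)

    walk(json.loads(describe_stacks_json))
    return found
```

One way to use it is to pipe the command output into a script that calls `find_kinesis_arns(sys.stdin.read())` and prints the result, then paste that value into `kinesis_stream_arn`.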

Deploying in production

To make sure the CloudQuery CLI runs authenticated, use an API key.
CloudQuery needs to run in listening mode as a long-running service. In this mode, it does not support the overwrite-delete-stale write mode. To delete stale data, you need to set up a recurring task that runs full table syncs. You may also need a separate task that runs regular CloudQuery syncs for tables that event-based sync does not yet support. See the AWS Plugin documentation for the list of supported tables.
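As one possible shape for that recurring task, the sketch below runs `cloudquery sync` against a second config file on a fixed interval. The file name `full-sync.yml` and the function name are hypothetical. The sketch assumes that config describes a regular (non-event-based) sync with whatever write mode you need. In practice, a cron job or your orchestrator's scheduler would do the same job.

```python
import subprocess
import time


def run_full_syncs(config_path, interval_seconds, runs=None,
                   runner=subprocess.run, sleep=time.sleep):
    """Run `cloudquery sync <config_path>` every interval_seconds.

    runs=None loops until interrupted; a number limits the loop.
    runner and sleep are injectable so the loop can be tried out
    without the CLI installed. Returns the exit codes observed.
    """
    exit_codes = []
    count = 0
    while runs is None or count < runs:
        result = runner(["cloudquery", "sync", config_path])
        exit_codes.append(result.returncode)
        count += 1
        if runs is None or count < runs:
            sleep(interval_seconds)
    return exit_codes


# Example (hypothetical file name): one full sync per day.
# run_full_syncs("full-sync.yml", interval_seconds=24 * 60 * 60)
```

Keeping the full sync in its own process and config leaves the long-running event-based process untouched while stale rows are cleaned up on a schedule.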
Note that these are the limitations of the current beta version of the event-based sync for our AWS plugin. We plan to make configuration and management easier in the future based on user feedback.

Availability

This feature is currently available to everyone using the CloudQuery AWS plugin. To use it, you need to log in with the CLI.

Future work

At the moment, only one Kinesis stream is supported by a running instance of CloudQuery. We will consider adding support for multiple streams based on the feedback we receive.
The current coverage of tables has been designed to provide a selection of different services. We will add more resources based on your feedback.