Announcing Open Beta of CloudQuery AWS Plugin with Event-based Sync


Event-based sync for the AWS plugin is now in open beta!

What is it?

CloudQuery is designed to sync all data on demand, giving the user full control over what to sync, when, and how often. This approach makes CloudQuery easy to set up and use. However, syncing all data has its tradeoffs. Most notable is the sync duration: in some use cases, in order to detect a single change in a resource one must run a full sync that can take a few hours or more.

Additionally, a regular sync, even one that runs every hour, is sometimes not enough to get an accurate picture of what is happening in your environment. Cloud resources are ephemeral: they come and go within a few minutes, which makes it hard to track them and get a complete understanding of your costs. An IP address gets spammed by bots the moment you make it public. User accounts get created with broad permissions and can be misused within moments.

This is where our new event-based sync comes to the rescue.

How it works

All events are aggregated by AWS CloudTrail. You can configure a Trail to send management events to a Kinesis Data Stream via CloudWatch Logs. By subscribing to the stream of AWS CloudTrail events in the Kinesis Data Stream, CloudQuery can then trigger selective syncs that update only the single resource whose configuration changed.
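To make the data flow concrete, here is a minimal sketch of what a consumer of that stream has to do. It is not CloudQuery's implementation. It relies on two things AWS documents: CloudWatch Logs subscriptions deliver gzip-compressed JSON envelopes to Kinesis, and each log event's `message` holds one CloudTrail event with `eventSource` and `eventName` fields. The `EVENT_TO_TABLE` mapping and both function names are hypothetical and only illustrate how an event could be routed to a table.

```python
import gzip
import json

# Hypothetical mapping from (eventSource, eventName) to a CloudQuery table.
# The plugin maintains its own list of supported events; these are examples.
EVENT_TO_TABLE = {
    ("ec2.amazonaws.com", "RunInstances"): "aws_ec2_instances",
    ("ec2.amazonaws.com", "AuthorizeSecurityGroupIngress"): "aws_ec2_security_groups",
    ("iam.amazonaws.com", "CreateUser"): "aws_iam_users",
}


def decode_record(data):
    """Decode one Kinesis record written by a CloudWatch Logs subscription.

    `data` is the raw record payload (bytes, already base64-decoded).
    Returns the CloudTrail events it carries as a list of dicts.
    """
    envelope = json.loads(gzip.decompress(data))
    if envelope.get("messageType") != "DATA_MESSAGE":
        return []  # CONTROL_MESSAGE records carry no CloudTrail events
    return [json.loads(e["message"]) for e in envelope["logEvents"]]


def tables_to_sync(events):
    """Return the set of tables affected by the given CloudTrail events."""
    tables = set()
    for event in events:
        key = (event.get("eventSource"), event.get("eventName"))
        table = EVENT_TO_TABLE.get(key)
        if table is not None:
            tables.add(table)  # unsupported events are simply ignored
    return tables
```

Events that are not in the mapping are dropped, which mirrors the idea that only selected events trigger a selective sync.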

Figure: Configuring CloudQuery AWS Plugin for event-based sync

With this setup, you get fresh data within a few seconds of it becoming available in CloudTrail.

At the moment, event-based sync supports selected events for a selection of AWS services. Find the full list of supported events in our AWS Event-based Sync documentation.

Getting Started

Configure an AWS CloudTrail Trail to send management events to a Kinesis Data Stream via CloudWatch Logs. The most straightforward way to do this is to use the CloudQuery-provided CloudFormation template:

aws cloudformation deploy --template-file ./streaming-deployment.yml --stack-name <STACK-NAME> --capabilities CAPABILITY_IAM --disable-rollback --region <DESIRED-REGION>

Copy the ARN of the Kinesis stream. If you used the CloudFormation template, you can list the stack outputs with the following command:

aws cloudformation describe-stacks --stack-name <STACK-NAME> --query "Stacks[].Outputs" --region <DESIRED-REGION>
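If you would rather extract the ARN programmatically than copy it by hand, the sketch below parses the JSON printed by that command. Since the name of the output key in the template is not stated here, it does not assume one: it matches output values against the general shape of a Kinesis stream ARN (`arn:aws:kinesis:<region>:<account-id>:stream/<name>`). The function name `find_kinesis_arns` is hypothetical.

```python
import json
import re

# General shape of a Kinesis Data Stream ARN, including other AWS partitions
# such as aws-cn and aws-us-gov.
KINESIS_ARN = re.compile(
    r"^arn:aws[a-z-]*:kinesis:[a-z0-9-]+:\d{12}:stream/[A-Za-z0-9_.-]+$"
)


def find_kinesis_arns(describe_output):
    """Return every Kinesis stream ARN found in the stack outputs.

    `describe_output` is the JSON text printed by
    `aws cloudformation describe-stacks ... --query "Stacks[].Outputs"`,
    i.e. a list with one list of output objects per stack.
    """
    arns = []
    for outputs in json.loads(describe_output):
        for output in outputs or []:
            value = output.get("OutputValue", "")
            if KINESIS_ARN.match(value):
                arns.append(value)
    return arns
```

One way to use it is to pipe the command's output into a small script that calls `find_kinesis_arns(sys.stdin.read())` and prints the result.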

Define a config.yml file like the one below:

kind: source
spec:
  name: "aws-event-based"
  registry: cloudquery
  path: cloudquery/aws
  tables:
    - aws_ec2_instances
    - aws_ec2_internet_gateways
    - aws_ec2_security_groups
    - aws_ec2_subnets
    - aws_ec2_vpcs
    - aws_ecs_cluster_tasks
    - aws_iam_groups
    - aws_iam_roles
    - aws_iam_users
    - aws_rds_instances
  destinations: ["postgresql"]
  skip_tables:
    - aws_iam_group_last_accessed_details
    - aws_iam_role_last_accessed_details
    - aws_iam_user_last_accessed_details
  spec:
    event_based_sync:
      account:
        local_profile: "<ROLE-NAME>"
      kinesis_stream_arn: "<OUTPUT-FROM-CLOUDFORMATION-STACK>"

You may need to sign up for CloudQuery and log in first:

cloudquery login

Sync the data!

cloudquery sync config.yml

This starts a long-lived process that stops only when it hits an error or when you stop it yourself.

Deploying in production

To make sure the CloudQuery CLI runs authenticated, use an API key.

CloudQuery needs to run in listening mode as a long-running service. In this mode, it does not support the overwrite-delete-stale write mode. To delete stale data, you need to set up a recurring task that runs full table syncs. Additionally, you may need to set up another task in which CloudQuery runs a regular sync on tables that event-based sync does not currently support. See the CloudQuery AWS integration documentation for the list of supported tables.

Note that these are the limitations of the current beta version of the event-based sync for our AWS source integration. We plan to make configuration and management easier in the future based on user feedback.
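Because the listening process exits when it hits an error, a production deployment usually wraps it in some kind of supervisor. A process manager such as systemd or a container orchestrator is the usual choice. The sketch below only illustrates the idea in Python: restart the command with exponential backoff after a non-zero exit. The function name, the retry limits, and the assumption that a zero exit code means a deliberate stop are all illustrative and not documented CloudQuery behavior.

```python
import subprocess
import time


def run_with_restart(cmd, max_restarts=5, base_delay=1.0, sleep=time.sleep):
    """Run `cmd`, restarting it with exponential backoff on failure.

    Stops on the first zero exit code (assumed to mean a deliberate stop)
    or after `max_restarts` restarts. Returns the exit codes observed.
    """
    exit_codes = []
    for attempt in range(max_restarts + 1):
        code = subprocess.run(cmd).returncode
        exit_codes.append(code)
        if code == 0:
            break
        if attempt < max_restarts:
            # Wait 1x, 2x, 4x, ... base_delay before the next restart.
            sleep(base_delay * 2 ** attempt)
    return exit_codes


# Example (not executed here):
# run_with_restart(["cloudquery", "sync", "config.yml"])
```

The `sleep` parameter exists only so the backoff can be replaced in tests. The recurring full syncs described above would run as a separate scheduled job, not inside this loop.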

Availability

This feature is currently available to everyone using the CloudQuery AWS source integration. To use it, you need to log in with the CLI.

Future work

At the moment, a running instance of CloudQuery supports only one Kinesis stream. We will consider adding support for multiple streams based on the feedback we receive. Check out the CloudQuery Community for updates and discussion as we continue to work on these additions.

The current coverage of tables has been designed to provide a selection of different services. We will add more resources based on your feedback.

Read the docs.

Ready to dive deeper? Contact CloudQuery here or join the CloudQuery Community to connect with other users and experts. You can also try out CloudQuery locally with our quick start guide or explore the CloudQuery Platform (currently in beta) for a more scalable solution.
