engineering
integration
tutorial
Building an AWS Cloud Asset Inventory
Joe Karlsson •
Are you experiencing challenges in managing your AWS infrastructure? Do you need help keeping track of all your cloud assets, ensuring compliance, and maintaining security? If these issues sound familiar, then it's time to consider implementing a cloud asset inventory.
Managing assets in cloud environments like AWS is becoming increasingly complex. As organizations scale their cloud infrastructure, keeping track of all resources, ensuring compliance, and maintaining robust security measures become more challenging. The dynamic nature of cloud environments, with frequent changes and additions, further complicates asset management. This is where a comprehensive cloud asset inventory becomes indispensable, providing a clear and organized view of all cloud resources, helping to streamline operations and mitigate potential risks.
In this tutorial, you will build a cloud asset manager for AWS using CloudQuery. You’ll connect to your AWS account, collect data on all your cloud assets, and store it in a PostgreSQL database for analysis and reporting. However, with CloudQuery, you can extract data from ANY data source (Azure, GCP, etc.) and load it into ANY data destination (Snowflake, BigQuery, DataBricks, DuckDB, Clickhouse, etc.).
What is a Cloud Asset Inventory? #
The core of any Cloud Asset Management solution is a Cloud Asset Inventory. Fundamentally, a Cloud Asset Inventory is a centralized database of all the cloud assets you’re paying for.
A Cloud Asset Inventory is constructed by collecting information from the various Cloud Platform APIs (e.g. AWS, Google Cloud, Azure, etc.) and storing it in an accessible format, such as a SQL database. You can learn more about Cloud Asset Inventories here: https://www.cloudquery.io/blog/what-is-a-cloud-asset-inventory
Why Do You Need a Cloud Asset Inventory? #
A Cloud Asset Inventory enables engineers to identify, preempt, and mitigate a wide array of issues while simultaneously allowing engineers to perform risk and impact assessments quickly. They also enable otherwise complicated and expensive requests, such as identifying the source of a cost spike (in both simple single-cloud/single-account and complicated multi-cloud/multi-account platforms). Cloud Asset Inventories can help with:
- Improved Security: By maintaining a comprehensive inventory of cloud assets, organizations can quickly identify and address security vulnerabilities. This proactive approach helps in mitigating potential threats before they escalate.
- Compliance: Ensuring that all cloud resources comply with regulatory and internal policies is crucial. A cloud asset inventory helps in tracking compliance status and generating audit-ready reports.
- Cost Management: Understanding where and how resources are being utilized allows for better cost management. Organizations can identify underutilized assets, optimize resource allocation, and reduce unnecessary expenditures.
- Operational Efficiency: Streamlining the management of cloud resources enhances operational efficiency. Automated tracking and reporting reduce the manual effort required for asset management, allowing engineers to focus on more strategic tasks.
Why Build a Cloud Asset Inventory with CloudQuery? #
In the past decade, managing cloud assets has become increasingly complex. Traditional vertical solutions often fall short because they offer a limited schema that doesn’t accommodate the unique questions and needs of each organization. We have found that this inflexibility forces companies to build their own asset inventory systems on top of their existing data warehouses.
CloudQuery addresses these challenges by providing a highly customizable and powerful solution for building cloud asset inventories. Unlike rigid, vertical solutions, CloudQuery allows organizations to leverage their existing data warehouses and BI tools, enabling them to derive deeper insights and integrate infrastructure data with other business data. This flexibility not only enhances data analysis but also reduces costs and leverages the expertise of the existing tech team.
A similar trend can be seen in the marketing space with the rise of composable CDPs (Customer Data Platforms), which have advanced beyond traditional CDPs by offering more customization and integration capabilities. Just as marketing technologies have evolved to meet complex data needs, CloudQuery offers a sophisticated approach to cloud asset management, providing organizations with the tools to build a tailored asset inventory that fits their specific requirements.
Building an AWS Cloud Asset Inventory #
Let’s break down the tech stack and architecture of this project, and I’ll explain why we’re using each component.
- CloudQuery
- This tool helps us sync cloud asset data from AWS into any database, in this case, we are using PostgreSQL.
- Why: CloudQuery is a super powerful and flexible data movement platform that allows you to sync from any data source into any data source.
- PostgreSQL (DB)
- This is our database where we’ll store all the collected cloud asset data.
- Why: PostgreSQL is a strong, open-source relational database. It’s great for handling complex queries and ensuring data integrity, which is exactly what we need for structured data storage and querying.
- dbt (Data Build Tool)
- dbt is used to transform and model the data stored in PostgreSQL.
- Why: With DBT, we can apply transformations directly within the data warehouse. This makes our data pipelines more efficient and maintainable, which is a huge win for keeping everything organized and running smoothly.
While CloudQuery can run locally as a binary or in Docker, for convenience, we’ll be running CloudQuery locally and the other services in Docker. This setup helps streamline the process and keeps the environment consistent. By these tools together, you’re creating a powerful setup that can collect, store, transform, and visualize our cloud asset data efficiently and effectively.
It’s important to note that you can re-use your current data and infrastructure stack of data warehouse and business intelligence (BI) tools when you use CloudQuery. You can sync data from any data source and persist them into any database. We find that this flexibility helps you derive even more insights from your infrastructure data. As well as helping you save costs, by allowing you to reuse the data tech stack that your data team is already familiar with.
Prerequisites #
Before we get started, make sure you have a basic understanding of AWS, and how to use the command-line interface. You'll also need an AWS account with the necessary permissions. You will also need Docker installed on your machine. Docker is needed for Postgres and Grafana as CloudQuery doesn't require Docker and can run on any machine and architecture as a single compiled binary.
Setting Up Your Environment #
To keep things simple, you’re going to build a local development environment using Docker Compose. Using Docker to run PostgreSQL, DBT, and other services, while running CloudQuery locally as a self-constrained binary executable. It allows you to take advantage of Docker’s ease of use for container management while leveraging the full capabilities of CloudQuery directly on your local machine.
Not interested in deploying and maintaining your CloudQuery infrastructure? You might be interested in trying out CloudQuery Cloud, so be sure to join our waitlist.
The complete set of all the code used in this tutorial can be found at the bottom of this post.
Getting Started With CloudQuery #
To get started with CloudQuery, you will need to download the CloudQuery CLI.
Note: If you have any questions or encounter an issue when following along with this post, the best place to get help is to join the CloudQuery Discord.
Setting Up a Docker Container for PostgreSQL as the Data Store for AWS Cloud Asset Data #
To set up a local PostgreSQL database using Docker, you can run the following command:
docker run --name postgres_container \
--restart unless-stopped \
--env POSTGRES_USER=postgres \
--env POSTGRES_PASSWORD=postgres \
--env POSTGRES_HOST=db \
--env POSTGRES_DB=asset_inventory \
--publish 5432:5432 \
--volume pgdata:/var/lib/postgresql/data \
postgres
This command pulls the PostgreSQL image (version 15) from Docker Hub, sets the password for the Postgres user to postgres, and runs the container in detached mode (-d). The -p 5432:5432 option maps the container’s port 5432 (PostgreSQL’s default port) to port 5432 on the host machine, allowing you to connect to the database locally.
Note: While PostgreSQL is used in this example, any compatible database can be used as the data store for your AWS Cloud Asset data. PostgreSQL is chosen for its robustness and widespread adoption, but you can configure your setup to use another database system if preferred.
How to Sync AWS Cloud Assets with PostgreSQL with CloudQuery #
Now that you have the Docker Compose file ready, it’s time to write the CloudQuery config file. You can pick up the basic configuration for our chosen cloud platforms from the CloudQuery Hub.
How to pull Cloud Asset data from AWS #
For this tutorial, you’ll be using the CloudQuery AWS plugin. In the contents menu on the left-hand side, you’ll see Configuration. If you click that, it’ll bring you down to the basic example. Copy that into a new file called
config.yml
.kind: source
spec:
# Source spec section
name: aws
path: cloudquery/aws
registry: cloudquery
version: 'v26.9.0'
tables:
[
'aws_apigateway_rest_api_stages',
'aws_apigatewayv2_api_stages',
'aws_apigatewayv2_api_routes',
'aws_autoscaling_groups',
'aws_codebuild_projects',
'aws_config_configuration_recorders',
'aws_cloudwatch_alarms',
'aws_cloudtrail_trail_event_selectors',
'aws_cloudwatchlogs_metric_filters',
'aws_cloudfront_distributions',
'aws_iam_accounts',
'aws_iam_credential_reports',
'aws_iam_password_policies',
'aws_iam_users',
'aws_ec2_network_acls',
'aws_ec2_security_groups',
'aws_efs_access_points',
'aws_elasticbeanstalk_environments',
'aws_elbv1_load_balancers',
'aws_elbv2_load_balancers',
'aws_rds_clusters',
'aws_sns_subscriptions',
'aws_s3_accounts',
]
destinations: ['postgresql']
spec:
This CloudQuery configuration file sets up a data source from AWS to extract information from various AWS services, such as API Gateway, IAM, and RDS, using specific tables. The extracted data is then directed to a PostgreSQL database for storage. This setup allows for efficient data extraction, transformation, and storage, enabling easier analysis and visualization of AWS data.
For the
spec>tables
- be sure to include all the assets you want to sync with your Cloud Asset Inventory.Note: If you are interested in building a multi-cloud asset inventory, you can pull assets from any cloud provider, including Azure and GCP, using CloudQuery.
Setting AWS Environmental Variables #
AWS plugin can use the credentials from the
AWS_ACCESS_KEY_ID
, AWS_SECRET_ACCESS_KEY
, and AWS_SESSION_TOKEN
environment variables (AWS_SESSION_TOKEN
can be optional for some accounts). For information on obtaining credentials, see the AWS guide. You can set your environmental variables for CloudQuery with the following commands:export AWS_ACCESS_KEY_ID={Your AWS Access Key ID}
export AWS_SECRET_ACCESS_KEY={Your AWS secret access key}
export AWS_SESSION_TOKEN={Your AWS session token}
These can all be retrieved by following the AWS documentation.
How to save AWS Cloud Assets in PostgreSQL #
Next, you’ll need a destination plugin, so head back to the CloudQuery Hub, click Explore, and then Destinations. For this example, you’ll be using PostgreSQL, so find that using the search or by scrolling down the list. However, you can sync your AWS data to any other destination, and if your database isn’t there, you can build your own custom plugin! At the bottom of the config file, place a new line that contains --- and paste in the example config for the Postgres plugin.
Which should look something like this:
kind: destination
spec:
name: 'postgresql'
path: 'cloudquery/postgresql'
registry: 'cloudquery'
version: 'v8.0.8'
spec:
connection_string: 'postgresql://${POSTGRES_USER}:${POSTGRES_PASSWORD}@${POSTGRES_HOST}:5432/${POSTGRES_DB}?sslmode=disable'
And with that, the CloudQuery Config is ready. Now is a good time to test it out. Make sure your Postgres Docker container is running, then we need to run the CloudQuery job so that it syncs your AWS assets into Postgres with:
cloudquery sync ./config.yml
Now you can connect to Postgres and explore the data. This is an example query you can use to ensure your data has been correctly synced.
docker exec -it xxx /bin/bash
psql -U postgres
select * from aws_ec2_instances;
How to Use dbt to Transform AWS Data into a Cloud Asset Inventory #
dbt (Data Build Tool) is used here to transform your raw AWS data into structured tables. These tables are then ready to be consumed by visualization tools for easier data interpretation and analysis. This process is fully customizable, allowing you to tailor the transformations to fit your specific AWS configuration and requirements.
To simplify data transformations, CloudQuery provides several pre-built DBT projects, including security and compliance frameworks like PCI_DSS, CIS, and Foundational Security Best Practices. But for this tutorial, you will be using our prebuilt AWS Asset Inventory transformation. Here’s how you set up your DBT Transformations:
Go to the AWS Asset Inventory pack, and download and extract the contents into your project folder.
Finally, you need to define the
dbt-profiles.yml
file itself in your project directory:config:
send_anonymous_usage_stats: False
use_colors: True
aws_asset_inventory:
target: postgres
outputs:
postgres:
type: postgres
host: "{{ env_var('POSTGRES_HOST') }}"
user: "{{ env_var('POSTGRES_USER') }}"
pass: "{{ env_var('POSTGRES_PASSWORD') }}"
port: 5432
dbname: "{{ env_var('POSTGRES_DB') }}"
schema: public
threads: 1
To run dbt with Docker, you can use this Docker CLI command to set up the environment and execute DBT commands.
docker run --platform linux/amd64 --name dbt_container \
--env POSTGRES_USER=postgres \
--env POSTGRES_PASSWORD=postgres \
--env POSTGRES_HOST=db \
--env POSTGRES_DB=asset_inventory \
--volume $(pwd)/cloudquery_transformation_aws-asset-inventory_vX.X.X:/usr/app \
--volume $(pwd)/dbt-profiles.yml:/root/.dbt/profiles.yml \
ghcr.io/dbt-labs/dbt-postgres:1.8.1 run
Note: If you’re copying this sample directly into your Docker Compose file, make sure you set the version number to match the one you’ve downloaded.
What Happens When You Run This Command?
• Docker pulls the specified DBT image from GitHub Container Registry.
• A new container starts, named dbt_container, with the specified environment variables.
• Local directories and files are mapped to directories and files inside the container, making your DBT project and configuration available to DBT.
• DBT runs the dbt run command inside the container, which processes your data models and executes them against the connected PostgreSQL database.
Running Grafana to visualize your AWS Cloud Asset Data #
Grafana is used to visualize and monitor the transformed AWS data stored in the PostgreSQL database. It allows you to create interactive and customizable dashboards to make sense of your data, providing insights through charts, graphs, and alerts.
Note: While Grafana is used here, you can use any visualization platform that suits your needs, such as Tableau, Power BI, or Kibana, depending on your specific requirements and preferences.
You can set up your Grafana instance by running the following command:
docker run --name grafana_container \
--restart unless-stopped \
--platform linux/amd64 \
--volume grafana:/var/lib/grafana \
--publish 3000:3000 \
grafana/grafana
Feeding your AWS Cloud Asset Inventory Data into Grafana for Visualization #
Using Grafana to visualize your cloud asset inventory data not only helps in monitoring compliance and security but also enhances your ability to present critical information in a clear and actionable manner. This setup ensures that you can maintain a high level of oversight and continually improve your cloud asset inventory.
Now, let’s set up a dynamic dashboard in Grafana that pulls information from our PostgreSQL database.
- Select and Download Pre-Built Dashboards
- Visit the CloudQuery Hub and browse through the range of pre-built dashboards.
- Since you are working with AWS in this tutorial, select the AWS Asset Inventory visualization pack.
- Click “Download Now” and extract the zip file to your project folder.
- Access Grafana
- The Grafana instance launched by our Docker Compose setup should be accessible at http://localhost:3000.
- If you haven’t done so already, you may need to set a password for the admin account before proceeding.
- Note: The default Grafana username is admin and the default password is admin.
- Import the Dashboard
- In Grafana, open the hamburger menu (three horizontal lines) in the top left corner and select “Dashboards”.
- Click the blue “New” button on the top right, then select “Import”.
- Navigate to the extracted zip file directory and locate aws_asset_inventory/grafana/asset_inventory.json.
- Drag this file into the “Upload dashboard JSON file” area and click “Import”.
- This will load your dashboard.
- Note: You might need to add the data source in Grafana to connect to your PostgreSQL database.
Using a pre-built dashboard is a great start, but one of the key benefits of Grafana is its flexibility to create custom visualizations that meet your specific needs.
Conclusion #
In this tutorial, you walked through the process of building a cloud asset inventory for AWS using CloudQuery. Here’s a quick recap of what you achieved:
- Setting up CloudQuery: You configured CloudQuery to connect to your AWS account and gather detailed asset data.
- Storing Data in PostgreSQL: You set up a PostgreSQL database to store the collected asset data, enabling efficient querying and analysis.
- Transforming Data with DBT: You utilized DBT to apply data transformations, enhancing the quality and usability of your cloud asset inventory.
- Visualizing Data in Grafana: You imported pre-built dashboards into Grafana, allowing you to visualize and monitor your AWS assets dynamically.
By using CloudQuery, you can ensure that your asset inventory is comprehensive, adaptable, and integrated with your broader data strategy. This empowers your team to gain better insights and make informed decisions, ultimately driving more value from your cloud infrastructure.
Ready to dive deeper? Join the CloudQuery Discord community to connect with other users and experts. You can also try out CloudQuery locally with our quick start guide or explore CloudQuery Cloud (currently in beta) for a more scalable solution.
Thank you for following along, and we hope this guide helps you effectively manage your AWS cloud assets!
Additional Resources #
FAQs #
Q: What is a Cloud Asset Inventory?
A: A Cloud Asset Inventory is a centralized database that tracks all cloud resources and assets within an organization’s cloud environment, such as AWS. It helps in monitoring, managing, and securing these assets effectively.
Q: Why do I need a Cloud Asset Inventory for my AWS environment?
A: An asset inventory helps improve security, ensure compliance, manage costs, and enhance operational efficiency by providing a comprehensive view of all cloud resources.
Q: What is CloudQuery and how does it help with Cloud Asset Inventory?
A: CloudQuery is a tool that collects and normalizes cloud infrastructure data from AWS and other cloud providers. It helps in building a cloud asset inventory by gathering detailed information about cloud resources.
Q: How do I set up CloudQuery to gather AWS asset data?
A: You need to configure CloudQuery with your AWS credentials and specify the resources you want to inventory. This setup involves creating a configuration file and running CloudQuery to sync data from AWS.
Q: What are the prerequisites for setting up a Cloud Asset Inventory with CloudQuery?
A: Basic knowledge of AWS, SQL, and command-line interface is required. You also need an AWS account with the necessary permissions and PostgreSQL installed locally.
Q: How do I store collected asset data in PostgreSQL?
A: CloudQuery can be configured to export collected data directly into a PostgreSQL database. This involves setting up PostgreSQL credentials and specifying the target database schema.
Q: How do I use DBT to transform AWS asset data into a Cloud Asset Inventory?
A: You can use pre-built DBT projects provided by CloudQuery to apply data transformations. This involves downloading the transformation pack, updating the Docker Compose file, and running DBT commands.
Q: How does CloudQuery address the limitations of traditional asset inventory solutions?
A: CloudQuery provides a highly customizable solution that allows organizations to build their own asset inventories using their existing data warehouses and BI tools. This flexibility enables deeper insights, better integration with existing data, and reduces costs.
Q: Can CloudQuery work with existing data warehouse and BI tools?
A: Yes, CloudQuery can integrate seamlessly with your current stack of data warehouse and BI tools. This allows you to drive more insights by combining infrastructure data with other business data and leveraging the expertise of your existing tech team.
Q: What is an infrastructure data lake and how does it relate to CloudQuery?
A: An infrastructure data lake is a concept where all infrastructure-related data is collected and stored in a centralized repository for analysis. CloudQuery supports this idea by enabling organizations to gather and analyze comprehensive infrastructure data within their data warehouses.
Code #
CloudQuery #
config.yml
kind: source
spec:
# Source spec section
name: aws
path: cloudquery/aws
registry: cloudquery
version: 'v26.9.0'
tables:
[
'aws_apigateway_rest_api_stages',
'aws_apigatewayv2_api_stages',
'aws_apigatewayv2_api_routes',
'aws_autoscaling_groups',
'aws_codebuild_projects',
'aws_config_configuration_recorders',
'aws_cloudwatch_alarms',
'aws_cloudtrail_trail_event_selectors',
'aws_cloudwatchlogs_metric_filters',
'aws_cloudfront_distributions',
'aws_iam_accounts',
'aws_iam_credential_reports',
'aws_iam_password_policies',
'aws_iam_users',
'aws_ec2_network_acls',
'aws_ec2_security_groups',
'aws_efs_access_points',
'aws_elasticbeanstalk_environments',
'aws_elbv1_load_balancers',
'aws_elbv2_load_balancers',
'aws_rds_clusters',
'aws_sns_subscriptions',
'aws_s3_accounts',
]
destinations: ['postgresql']
spec:
---
kind: destination
spec:
name: 'postgresql'
path: 'cloudquery/postgresql'
registry: 'cloudquery'
version: 'v8.0.8'
spec:
connection_string: 'postgresql://${POSTGRES_USER}:${POSTGRES_PASSWORD}@${POSTGRES_HOST}:5432/${POSTGRES_DB}?sslmode=disable'
DBT #
dbt-profiles.yml
config:
send_anonymous_usage_stats: False
use_colors: True
aws_asset_inventory:
target: postgres
outputs:
postgres:
type: postgres
host: "{{ env_var('POSTGRES_HOST') }}"
user: "{{ env_var('POSTGRES_USER') }}"
pass: "{{ env_var('POSTGRES_PASSWORD') }}"
port: 5432
dbname: "{{ env_var('POSTGRES_DB') }}"
schema: public
threads: 1