engineering
integration
tutorial

How to build a Cloud Asset Inventory for Azure

Joe Karlsson

Joe Karlsson

Are you experiencing challenges in managing your Azure infrastructure? Do you need help keeping track of all your cloud assets, ensuring compliance, and maintaining security? If these issues sound familiar, then it's time to consider implementing a cloud asset inventory.
Managing assets in cloud environments such as Azure is growing increasingly complex. As organizations expand their cloud infrastructure, tracking resources, ensuring compliance, and maintaining strong security measures become more challenging. The dynamic nature of cloud environments, with frequent changes and additions, complicates asset management further. Therefore, a comprehensive cloud asset inventory is essential. It offers a clear and organized view of all cloud resources, streamlining operations and mitigating potential risks.
In this tutorial, you will build a cloud asset manager for Azure using CloudQuery. You’ll connect to your Azure account, collect data on all your cloud assets, and store it in a PostgreSQL database for analysis and reporting. However, with CloudQuery, you can extract data from ANY data source (AWS, GCP, etc.) and load it into ANY data destination (Snowflake, BigQuery, Databricks, DuckDB, Clickhouse, etc.).
Looking to build a Cloud Asset Inventory for your AWS data? Check our tutorial, Building an AWS Cloud Asset Inventory.

Building a Cloud Asset Inventory for your Azure Resources #

Let’s break down the tech stack and architecture of this project, and I’ll explain why we’re using each component.
  • CloudQuery
    • This tool helps us sync cloud asset data from Azure into any database, in this case, we are using PostgreSQL.
    • Why: CloudQuery is a super powerful and flexible data movement platform that allows you to sync from any data source into any data source.
  • PostgreSQL (DB)
    • This is our database where we’ll store all the collected cloud asset data.
    • Why: PostgreSQL is a strong, open-source relational database. It’s great for handling complex queries and ensuring data integrity, which is exactly what we need for structured data storage and querying.
  • dbt (Data Build Tool)
    • dbt is used to transform and model the data stored in PostgreSQL.
    • Why: With dbt, we can apply transformations directly within the data warehouse. This makes our data pipelines more efficient and maintainable, which is a huge win for keeping everything organized and running smoothly.
While CloudQuery can run locally as a binary or in Docker, for convenience, we’ll be running CloudQuery locally and the other services in Docker. This setup helps streamline the process and keeps the environment consistent. By these tools together, you’re creating a powerful setup that can collect, store, transform, and visualize our cloud asset data efficiently and effectively.
It’s important to note that you can re-use your current data and infrastructure stack of data warehouse and business intelligence (BI) tools when you use CloudQuery. You can sync data from any data source and persist them into any database. We find that this flexibility helps you derive even more insights from your infrastructure data. As well as helping you save costs, by allowing you to reuse the data tech stack that your data team is already familiar with.

Prerequisites #

Before we get started, make sure you have a basic understanding of Azure, and how to use the command-line interface. You'll also need an Azure account with the necessary permissions. You will also need Docker installed on your machine. Docker is needed for Postgres and dbt as CloudQuery doesn't require Docker and can run on any machine and architecture as a single compiled binary.

Setting Up Your Environment #

To keep things simple, you’re going to build a local development environment using Docker Compose. Using Docker to run PostgreSQL, dbt, and other services, while running CloudQuery locally as a self-constrained binary executable. It allows you to take advantage of Docker’s ease of use for container management while leveraging the full capabilities of CloudQuery directly on your local machine.
Not interested in deploying and maintaining your CloudQuery infrastructure? You might be interested in trying out CloudQuery Cloud, so be sure to join our wait list.
The complete set of all the code used in this tutorial can be found at the bottom of this post.

Getting Started With CloudQuery #

To get started with CloudQuery, you will need to download the CloudQuery CLI.
*Note: If you have any questions or encounter an issue when following along with this post, the best place to get help is to join the CloudQuery Discord.*

Setting Up a Docker Container for PostgreSQL as the Data Store for Azure Cloud Asset Data #

To set up a local PostgreSQL database using Docker, you can run the following command:
docker run --name postgres_container \
--restart unless-stopped \
--env POSTGRES_USER=postgres \
--env POSTGRES_PASSWORD=postgres \
--env POSTGRES_HOST=db \
--env POSTGRES_DB=asset_inventory \
--publish 5432:5432 \
--volume pgdata:/var/lib/postgresql/data \
postgres
This command pulls the PostgreSQL image (version 15) from Docker Hub, sets the password for the postgres user to postgres, and runs the container in detached mode (-d). The -p 5432:5432 option maps the container’s port 5432 (PostgreSQL’s default port) to port 5432 on the host machine, allowing you to connect to the database locally.
Note: While PostgreSQL is used in this example, any compatible database can be used as the data store for your Azure Cloud Asset data. PostgreSQL is chosen for its robustness and widespread adoption, but you can configure your setup to use another database system if preferred.

How to Sync Azure Cloud Assets to PostgreSQL with CloudQuery #

The CloudQuery Azure source plugin extracts information from many of the supported services by Microsoft Azure and loads it into any supported CloudQuery destination (e.g. PostgreSQL, BigQuery, Snowflake, and more).
It’s time to write the CloudQuery configuration file to connect to your Azure data. You can pick up the basic configuration for our chosen cloud platforms from the CloudQuery Hub.

How to pull Cloud Asset data from Azure #

For this tutorial, you’ll be using the CloudQuery Azure plugin. Copy that into a new file called config.yml.
kind: source
spec:
  # Source spec section
  name: 'azure'
  path: 'cloudquery/azure'
  registry: 'cloudquery'
  version: 'v13.3.2'
  destinations: ['postgresql']
  tables: ['*']
This CloudQuery configuration file sets up a data source from Azure to extract information from various Azure services, such as API Gateway, IAM, and RDS, using specific tables. The extracted data is then directed to a PostgreSQL database for storage. This setup allows for efficient data extraction, transformation, and storage, enabling easier analysis and visualization of Azure data.
For the spec>tables - be sure to include all the assets you want to sync with your Cloud Asset Inventory.
Note: If you are interested in building a multi-cloud asset inventory, you can pull assets from any cloud provider, including AWS and GCP, using CloudQuery.

How to Authenticate and Connect to your Azure Data #

First, install the Azure CLI. Then, login with the Azure CLI:
az login
WARNING: Using only Azure CLI login is not recommended for production use, as it requires spawning a new Azure CLI process each time an authentication token is needed. Refer to our documentation on how to authenticate to Azure using Environmental Variables.

How to save Azure Cloud Assets in PostgreSQL #

Next, you’ll need a destination plugin, so head back to the CloudQuery Hub, click Explore, and then Destinations. For this example, you’ll be using PostgreSQL, so find that using the search or by scrolling down the list. However, you can sync your Azure data to any other destination, and if your database isn’t there, you can build your own custom plugin! At the bottom of the configuration file, place a new line that contains --- and paste in the example configuration for the Postgres plugin.
Which should look something like this:
kind: destination
spec:
  name: "postgresql"
  path: "cloudquery/postgresql"
  registry: "cloudquery"
  version: "v8.0.8"
  spec:
    connection_string: "postgresql://postgres:postgres}@db:5432/${POSTGRES_DB}?sslmode=disable"
And with that, the CloudQuery Config is ready. Now is a good time to test it out. Make sure your Postgres Docker container is running, then we need to run the CloudQuery job so that it syncs your Azure assets into Postgres with:
cloudquery sync ./config.yml
Now you can connect to Postgres and explore the data. For example, you can use this is an example query you can use to ensure your data has been correctly synced. This query finds all the storage accounts that are allowing non-HTTPS traffic:
docker exec -it postgres_container /bin/bash

psql -U postgres

SELECT * from azure_storage_accounts where enable_https_traffic_only = false;

How to Use dbt to Transform Azure Data into a Cloud Asset Inventory #

dbt (Data Build Tool) is used here to transform your raw Azure data into structured tables. These tables are then ready to be consumed by visualization tools for easier data interpretation and analysis. This process is fully customizable, allowing you to tailor the transformations to fit your specific Azure configuration and requirements.
To simplify data transformations, CloudQuery provides several pre-built dbt projects, including security and compliance frameworks like PCI_DSS, CIS, and Foundational Security Best Practices. But for this tutorial, you will be using our prebuilt Azure Asset Inventory transformation. Here’s how you set up your dbt Transformations:
Go to the Azure Asset Inventory pack, and download and extract the contents into your project folder.
Finally, you need to define the dbt-profiles.yml file itself in your project directory:
config:
  send_anonymous_usage_stats: False
  use_colors: True

azure_asset_inventory:
  target: postgres
  outputs:
    postgres:
      type: postgres
      host: "{{ env_var('POSTGRES_HOST') }}"
      user: "{{ env_var('POSTGRES_USER') }}"
      pass: "{{ env_var('POSTGRES_PASSWORD') }}"
      port: 5432
      dbname: "{{ env_var('POSTGRES_DB') }}"
      schema: public
      threads: 1
To run dbt with Docker, you can use this Docker CLI command to set up the environment and execute dbt commands.
docker run --platform linux/amd64 --name dbt_container \
  --env POSTGRES_USER=postgres \
  --env POSTGRES_PASSWORD=postgres \
  --env POSTGRES_HOST=db \
  --env POSTGRES_DB=asset_inventory \
  --volume $(pwd)/cloudquery_transformation_azure-asset-inventory_vX.X.X:/usr/app \
  --volume $(pwd)/dbt-profiles.yml:/root/.dbt/profiles.yml \
  ghcr.io/dbt-labs/dbt-postgres:1.8.1 run
Note: If you’re copying this sample directly into your Docker Compose file, make sure you set the version number to match the one you’ve downloaded.
What Happens When You Run This Command?
  • Docker pulls the specified dbt image from GitHub Container Registry.
  • A new container starts, named dbt_container, with the specified environment variables.
  • Local directories and files are mapped to directories and files inside the container, making your dbt project and configuration available to dbt.
  • dbt runs the dbt run command inside the container, which processes your data models and executes them against the connected PostgreSQL database.
You can now query your new tables to find additional data about your cloud, like, how many resources are there per subscription?
select subscription_id, count(*)
from azure_resources
group by subscription_id
order by count(*) desc

Summary #

In this tutorial, you walked through the process of building a cloud asset inventory for Azure using CloudQuery. Here’s a quick recap of what you achieved:
  • Setting up CloudQuery: You configured CloudQuery to connect to your Azure account and gather detailed asset data.
  • Storing Data in PostgreSQL: You set up a PostgreSQL database to store the collected asset data, enabling efficient querying and analysis.
  • Transforming Data with dbt: You utilized dbt to apply data transformations, enhancing the quality and usability of your cloud asset inventory.
By using CloudQuery, you can ensure that your asset inventory is comprehensive, adaptable, and integrated with your broader data strategy. This empowers your team to gain better insights and make informed decisions, ultimately driving more value from your cloud infrastructure.
Ready to dive deeper? Join the CloudQuery Discord community to connect with other users and experts. You can also try out CloudQuery locally with our quick start guide or explore CloudQuery Cloud (currently in beta) for a more scalable solution.
Thank you for following along, and we hope this guide helps you effectively manage your Azure cloud assets!

Additional Resources #

Code #

CloudQuery #

config.yml
kind: source
spec:
  # Source spec section
  name: "azure"
  path: "cloudquery/azure"
  registry: "cloudquery"
  version: "v13.3.2"
  destinations: ["postgresql"]
  tables: ["*"]
---
kind: destination
spec:
  name: "postgresql"
  path: "cloudquery/postgresql"
  registry: "cloudquery"
  version: "v8.0.8"
  spec:
    connection_string: "postgresql://${POSTGRES_USER}:${POSTGRES_PASSWORD}@${POSTGRES_HOST}:5432/${POSTGRES_DB}?sslmode=disable"

dbt #

dbt-profiles.yml
config:
  send_anonymous_usage_stats: False
  use_colors: True

azure_asset_inventory:
  target: postgres
  outputs:
    postgres:
      type: postgres
      host: "{{ env_var('POSTGRES_HOST') }}"
      user: "{{ env_var('POSTGRES_USER') }}"
      pass: "{{ env_var('POSTGRES_PASSWORD') }}"
      port: 5432
      dbname: "{{ env_var('POSTGRES_DB') }}"
      schema: public
      threads: 1
Subscribe to product updates

Be the first to know about new features.