tutorial

Search Through Your Cloud Infrastructure with CloudQuery and Elasticsearch

Herman Schaaf

Herman Schaaf Feb 10, 2023

Search your AWS Infrastructure with Elasticsearch

Introduction

Have you ever wished you could enter the ID of an EC2 instance into the search bar and get a list of all the related resources? Or search by tag across all your accounts and regions? Or search for resources by keyword? With the new CloudQuery Elasticsearch destination, you can do all this, and more.
Using the Elasticsearch destination, you can search through your cloud infrastructure, create visualizations and dashboards, and generally explore your infrastructure data in a way that wasn't possible before. In this article we will show you how to get started and a few examples of the things you can do.

Why Elasticsearch?

Elasticsearch is a popular open source search engine built on top of Lucene. It is a distributed, RESTful search and analytics engine that is capable of storing and searching terabytes of data. As datasets grow over time, managing a SQL database can become more and more difficult, but Elasticsearch offers a great alternative here. It is schemaless and has built-in support for scaling, sharding and replication. Its full text search capabilities also make it a great fit for searching through unstructured data, especially JSON columns.

Getting Started

First, you will need to have CloudQuery installed. If you haven't already, you can do so by following the instructions in the Quickstart guide.
You will also need a running Elasticsearch instance. If you don't have one, you can either create one locally using Docker, or use the Elastic Cloud service, which offers a free trial. This docker-compose file will get you started with local Elasticsearch and Kibana services (download the file, then run docker compose up). Once it's up, you should be able to visit localhost:5601 and see the Kibana interface.

Configuring the Elasticsearch Destination

Now that you have CloudQuery and Elasticsearch running, you can configure the Elasticsearch destination. To do this, follow the instructions in the Elasticsearch destination plugin docs to create a configuration file called elasticsearch.yml.
Assuming you want to browse AWS infrastructure data, you also need to create a configuration file for the AWS provider. Again, follow the instructions in the AWS Source guide to create a configuration file called aws.yml.
Finally, you can run the CloudQuery command to sync your AWS infrastructure data to Elasticsearch:
cloudquery sync aws.yml elasticsearch.yml

Setting up the Data Views in Kibana

When the sync completes, head over to the "Stack Management" section of Kibana and click on "Data Views" under "Kibana". Click "Create data view" and enter aws* as the index pattern. Select _cq_sync_time as the time field (you can change this later, if you want). Click "Save data view to Kibana" to finish.
Create a data view in Kibana

Searching Through Your Cloud Infrastructure

Now that you have your data in Elasticsearch, you can start searching through it. To do this, head over to the "Discover" section of Kibana and enter a search query. For example, we can search by IP address:
Search by IP address
Or by tags:
Search by tag
Or we can use the native JSON object support to search through nested JSON objects. This query will find EC2 instances that have their monitoring state set to "disabled":
Search by JSON object
We can also build visualizations and dashboards using the data in Elasticsearch. For example, we can create a pie chart that shows the distribution of EC2 instance types:
EC2 instance type pie chart
(Note: To create this particular visualization with the current version of the AWS and Elasticsearch plugins, we had to change the type of the instance_type field from text to keyword in the "Index Patterns" section of Kibana. This can be done by first doing a migration, which creates the index templates, editing the template, and then running a new sync.)

Building a Historical View of Your Infrastructure

One of the great things about Elasticsearch is that it is designed to be used with time series data, and therefore a final thing we wanted to note in this post is the ability to view historical snapshots of your infrastructure from Elasticsearch. The CloudQuery Elasticsearch destination currently supports three write modes: overwrite, overwrite-delete-stale (the default) and append. You can read more about these in the plugin documentation, but in short, when using append mode, CloudQuery will never delete data, so you can build a historical view of your infrastructure over time. For example, by running a sync in append mode every day, you can build a visualization that shows the number of EC2 instances over time, or more generally track how (or whether!) your organization's security posture is improving over time.

Conclusion

In this article we showed you how to get started with the new CloudQuery Elasticsearch destination, now available in Preview. We also showed you a few examples of the things you can do with the data in Elasticsearch in combination with the AWS source plugin, including searching through your cloud infrastructure, building visualizations and dashboards, and building a historical view of your infrastructure. But don't think that it's limited to AWS! The Elasticsearch destination works with all CloudQuery source plugins, so you can search through your Azure, GCP, Kubernetes, or even Marketing and FinOps data as well. We hope you find this new destination useful, and we look forward to seeing what you build with it! If you have any feedback on the Elasticsearch destination, or any other part of CloudQuery, please let us know on Discord or by opening an issue on GitHub. 🚀
Subscribe to product updates

Be the first to know about new features.