
A Field Guide to Finding Zombie Infrastructure in AWS

Joe Karlsson


There's a difference between a resource that costs too much and a resource that costs anything at all. Right-sizing an over-provisioned database is optimization. Paying $2,200 a month for a database with zero connections is waste - and no cost tool will tell you it's safe to turn off.
That's the zombie problem. Cost tools tell you what you're spending. They don't tell you whether anyone is using it, who created it, whether it's managed by Terraform, or whether something else depends on it. The answer to "should we shut this down?" requires connecting signals that live in different tools - and that connection is where most organizations get stuck.
| Playbook | Zombie Pattern | Signals to Correlate | Monthly Cost | Action |
| --- | --- | --- | --- | --- |
| Abandoned database | Abandoned | Zero connections + no encryption + public access | ~$2,200 | Verify no dependencies, then decommission |
| Forgotten prototype | Shadow | Stale Lambda + public API Gateway + no DynamoDB backups | Varies | Assess usage, assign owner, add monitoring |
| Unowned compute | Orphaned | No Terraform + no owner tag + no sessions | ~$1,400 | Trace creator, assign owner or shut down |

Key Takeaways #

  • Organizations waste an estimated 27% of cloud spend according to the Flexera 2025 State of the Cloud Report, and a significant portion is resources nobody remembers exist
  • Cost tools show what you spend but not whether anyone uses a resource, who owns it, or whether it's safe to shut down - that requires correlating cost with activity, ownership, and IaC coverage
  • CloudQuery syncs AWS data by calling APIs like DescribeDBInstances, ListFunctions, and DescribeInstances across all accounts, while the CUR integration adds resource-level cost data
  • Insights surfaces cost signals alongside security findings and IaC coverage gaps on each resource, so the decision to investigate or decommission is backed by the full context

What Makes a Resource a Zombie? #

Zombies aren't the same as unoptimized resources. An over-provisioned EC2 instance that's actively serving traffic is a right-sizing opportunity. A zombie is a resource that serves no current business purpose but continues to run and accumulate charges.
I've seen three common zombie patterns in the wild:
Abandoned: The resource was created for a legitimate purpose that ended - a load test environment, a proof-of-concept, a migration staging area. The project finished. The resource didn't.
Orphaned: The person who created it left the company, switched teams, or moved on. No handoff happened. The resource has no owner, no documentation, and no one who knows why it exists.
Shadow: The resource started as a prototype or experiment but became a production dependency without anyone noticing. It's actively used, but nobody maintains it, monitors it, or knows it's critical.
Each pattern requires a different response. Abandoned resources can be shut down (after verification). Orphaned resources need an owner assigned. Shadow resources need to be promoted to proper production status with monitoring, backups, and ownership. The first step for all three is finding them.
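The triage above can be sketched as a small decision function. The signal names here are illustrative inputs you'd derive from usage metrics, tags, and project status, not CloudQuery fields:

```python
def classify(in_use: bool, has_owner: bool, purpose_active: bool) -> str:
    """Map activity and ownership signals to a zombie pattern.

    Signal names are illustrative; derive them from connection metrics,
    ownership tags, and project status in your own environment.
    """
    if not in_use and not purpose_active:
        return "abandoned"  # verify dependencies, then decommission
    if in_use and not has_owner:
        return "shadow"     # promote: ownership, monitoring, backups
    if not has_owner:
        return "orphaned"   # trace creator, assign owner or shut down
    return "not a zombie"   # at most a right-sizing candidate
```

The order of the checks matters: a resource can be both unowned and unused, and the abandoned response (decommission) is the one you want to reach first in that case.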

Playbook 1: How Do You Investigate an Abandoned Database? #

The scenario: An RDS instance with encryption disabled, publicly accessible endpoints, no database connections in 90 days, and $2,200/month in charges. That cost is plausible for a db.r6g.2xlarge in Multi-AZ deployment - roughly $2.08/hour for compute alone, plus storage.
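The monthly figures here are just the hourly rate times AWS's 730-hour billing month. A quick sanity check, using the rates quoted in this post:

```python
HOURS_PER_MONTH = 730  # AWS's standard month for on-demand pricing

def monthly(hourly_rate: float) -> float:
    """On-demand monthly cost from an hourly rate."""
    return round(hourly_rate * HOURS_PER_MONTH, 2)

compute = monthly(2.08)  # db.r6g.2xlarge Multi-AZ compute: 1518.40
# Storage, backups, and I/O make up the rest of the ~$2,200.
```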
Nobody's connecting to it. Nobody encrypted it. And it's publicly accessible on the internet.

The Manual Investigation #

Start with the instance configuration to understand what you're dealing with:
aws rds describe-db-instances --db-instance-identifier suspect-db \
  --query "DBInstances[].{Engine:Engine,Class:DBInstanceClass,MultiAZ:MultiAZ,Encrypted:StorageEncrypted,Public:PubliclyAccessible}"
You're looking for StorageEncrypted: false and PubliclyAccessible: true - those are the security findings. But the real question is whether anyone uses this database at all.
Check the DatabaseConnections CloudWatch metric over the past 90 days:
aws cloudwatch get-metric-statistics \
  --namespace AWS/RDS \
  --metric-name DatabaseConnections \
  --dimensions Name=DBInstanceIdentifier,Value=suspect-db \
  --start-time 2026-01-02T00:00:00Z \
  --end-time 2026-04-02T00:00:00Z \
  --period 86400 \
  --statistics Maximum
If the Maximum is 0 across 90 days, no application or user has connected to this database in three months.
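In a script, the same check reduces to scanning the Datapoints list that get-metric-statistics returns. A minimal sketch (the dict shape follows the CloudWatch GetMetricStatistics response; an empty list means no data was published, which is itself a sign of no activity):

```python
def is_idle(datapoints: list[dict]) -> bool:
    """True when the DatabaseConnections Maximum never rose above zero.

    `datapoints` is the Datapoints list from get-metric-statistics
    called with --statistics Maximum.
    """
    return all(dp.get("Maximum", 0) == 0 for dp in datapoints)

# Illustrative datapoint from a truly idle instance
sample = [{"Timestamp": "2026-01-02T00:00:00Z", "Maximum": 0.0, "Unit": "Count"}]
```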
Then verify the cost. Note that resource-level Cost Explorer data requires opt-in through the Cost Management console. If you have it enabled:
aws ce get-cost-and-usage-with-resources \
  --time-period Start=2026-03-01,End=2026-04-01 \
  --granularity DAILY \
  --filter '{"Dimensions":{"Key":"RESOURCE_ID","Values":["arn:aws:rds:us-east-1:ACCOUNT_ID:db:suspect-db"]}}' \
  --metrics "UnblendedCost"
If you don't have resource-level data enabled, AWS Cost and Usage Reports (CUR) provide the same granularity and are what the CloudQuery AWS CUR integration syncs.

The Decision Tree #

Zero connections for 90 days is a strong signal, but it doesn't always mean "safe to delete."
Before shutting it down, check a few things:
  • Is it a disaster recovery standby that's designed to sit idle? Some teams keep warm standby databases that only activate during failover. Check whether it's part of an RDS read replica chain or cross-region replication.
  • Is it a compliance archive? Some industries require retaining database instances for audit purposes even when no applications are actively querying them.
  • Who created it, and why? Check CloudTrail for the CreateDBInstance event to find who created it and when - that context often reveals the original purpose.
The hardest check: is it referenced in any application configuration? Connection strings might exist in environment variables, SSM Parameter Store, or Secrets Manager that nobody has cleaned up. If an application restart picks up that connection string, you'll learn the database was more important than the metrics suggested.
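Put together, the decision tree looks something like this - a sketch, with boolean inputs standing in for the manual checks above:

```python
def verdict(zero_connections_90d: bool, dr_standby: bool,
            compliance_archive: bool, referenced_in_config: bool) -> str:
    """The pre-decommission checklist as code (illustrative input names)."""
    if not zero_connections_90d:
        return "keep: still in use"
    if dr_standby or compliance_archive:
        return "keep: idle by design"
    if referenced_in_config:
        return "investigate: a live connection string still points here"
    return "decommission: take a final snapshot, then delete"
```

The "investigate" branch is the one the metrics alone can't resolve - it requires the configuration sweep described above.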

The SQL Approach #

With CloudQuery syncing both your AWS data and cost data through the AWS CUR integration, you can join them:
SELECT
    r.db_instance_identifier,
    r.db_instance_class,
    r.engine,
    r.storage_encrypted,
    r.publicly_accessible,
    r.account_id,
    r.region
FROM aws_rds_instances r
WHERE r.publicly_accessible = true
  AND r.storage_encrypted = false;
This query finds every RDS instance that's both publicly accessible and unencrypted. Cross-reference the results with your CUR cost data and CloudWatch connection metrics (synced through the AWS integration) to build the full picture: unencrypted, public, unused, and expensive.

What Insights Adds #

CloudQuery Insights surfaces the cost signal from AWS CUR alongside the security findings (no encryption, public access) from AWS Security Hub on a single resource view. The Evidence panel explains what was detected. The Mitigation panel provides remediation steps. And the Related Resources tab shows what else is connected to this database - security groups, subnets, parameter groups - so you can understand the full context before making a decision.
The compound picture - paying $2,200/month for a database that nobody connects to, that isn't encrypted, and that's exposed to the internet - makes the decision obvious in a way that any individual finding doesn't.

Playbook 2: How Do You Find the Prototype That Became Production? #

The scenario: A Lambda function triggered by a public API Gateway, writing to a DynamoDB table with no backup policy, last deployed 14 months ago by someone who left the company. One team's forgotten prototype is now an unmonitored production dependency.
This is the hardest zombie to handle because you can't assume it's safe to shut down.

The Manual Investigation #

Start with the Lambda function's deployment history:
aws lambda get-function --function-name suspect-function \
  --query "{Runtime:Configuration.Runtime,LastModified:Configuration.LastModified,Handler:Configuration.Handler}"
If LastModified is 14 months ago and the Runtime is approaching end-of-support (or already past it), you have a function that nobody has touched in over a year.
Check if the API Gateway endpoint is public:
aws apigateway get-rest-apis \
  --query "items[?name=='suspect-api'].{ID:id,Endpoint:endpointConfiguration.types}"
A REGIONAL or EDGE endpoint type without a custom authorization layer means it's publicly callable. Anyone with the URL can invoke this function.
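The exposure check reduces to endpoint type plus authorizer presence. A sketch (PRIVATE endpoints require a VPC endpoint to reach, so they're excluded; the inputs are illustrative, taken from endpointConfiguration.types and an authorizer lookup):

```python
def publicly_callable(endpoint_types: list[str], has_authorizer: bool) -> bool:
    """True when the API is reachable from the internet without auth."""
    internet_facing = any(t in ("REGIONAL", "EDGE") for t in endpoint_types)
    return internet_facing and not has_authorizer
```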
Now check the DynamoDB table's backup status. Point-in-Time Recovery (PITR) is the minimum viable backup policy for production tables:
aws dynamodb describe-continuous-backups \
  --table-name suspect-table
If PointInTimeRecoveryStatus is DISABLED, there's no backup. If this table gets corrupted or deleted, the data is gone.

Why This Is the Hardest Zombie #

This is the scenario that keeps me up at night. Unlike the abandoned database, you can't look at a single metric to know if this is safe to decommission. A Lambda function might have irregular traffic patterns - triggered by a webhook, a monthly batch job, or another team's integration. Zero invocations today doesn't mean zero invocations next Tuesday.
Check the API Gateway access logs and Lambda invocation metrics first:
aws cloudwatch get-metric-statistics \
  --namespace AWS/Lambda \
  --metric-name Invocations \
  --dimensions Name=FunctionName,Value=suspect-function \
  --start-time 2026-01-02T00:00:00Z \
  --end-time 2026-04-02T00:00:00Z \
  --period 86400 \
  --statistics Sum
If it IS being invoked, you now have an unmonitored production dependency with no owner, no backups, a public attack surface, and a runtime that might be approaching deprecation. That's worse than a zombie - it's shadow infrastructure that's actively in use.
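Counting active days over the window separates the three outcomes. A sketch with illustrative thresholds - tune them to your own traffic patterns:

```python
def invocation_profile(daily_sums: list[float]) -> str:
    """Classify 90 days of daily Invocations Sum datapoints."""
    active_days = sum(1 for s in daily_sums if s > 0)
    if active_days == 0:
        return "idle: run the decommission checks"
    if active_days <= 3:
        return "sparse: likely a batch job or webhook - trace the caller"
    return "active: an unmonitored production dependency"
```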
Trace the creator via CloudTrail. The CreateFunction event will show which IAM identity deployed it and when. Cross-reference that identity with your HR system or your Jira and GitHub integrations to determine if the person still works at your company.

The SQL Approach #

With CloudQuery, you can query across Lambda, API Gateway, and DynamoDB in a single place:
SELECT
    f.function_name,
    f.runtime,
    f.last_modified,
    f.role,
    f.account_id,
    f.region
FROM aws_lambda_functions f
WHERE f.last_modified < now() - INTERVAL 12 MONTH;
This finds every Lambda function that hasn't been updated in over a year. Join with API Gateway data to flag which of those stale functions have public triggers, and with DynamoDB backup status to identify which ones write to unprotected tables.

What Insights Adds #

Insights correlates the DynamoDB backup gap, the API Gateway public exposure, and the stale deployment date on a single resource view. The Related Resources tab shows the Lambda-API Gateway-DynamoDB chain, so you can see the full dependency graph without manually tracing each service.
When you create custom Policies, you can flag this pattern specifically: Lambda functions not modified in 12+ months with public API Gateway triggers and DynamoDB tables lacking PITR. That becomes a persistent finding that surfaces after every sync.

Playbook 3: How Do You Find the Unowned Compute Instance? #

The scenario: An EC2 instance costing $1,400/month (an r6i.8xlarge at $2.016/hour runs about $1,471/month on demand), no Terraform state, no owner tag, no recent SSH or SSM sessions.

The Manual Investigation #

Check Terraform state for the instance:
terraform state list | grep i-0abc123def456
No result means nobody provisioned it through your standard IaC workflow. Check for ownership tags:
aws ec2 describe-tags --filters \
  "Name=resource-id,Values=i-0abc123def456" \
  "Name=key,Values=Owner,Team,owner,team"
No ownership tags means nobody claimed responsibility for this resource through your tagging standard. Trace the creation event in CloudTrail:
aws cloudtrail lookup-events \
  --lookup-attributes AttributeKey=ResourceName,AttributeValue=i-0abc123def456
The RunInstances event tells you who launched it, when, and from what IAM identity. If that identity belongs to someone who left six months ago and the instance has been running ever since, you've found your zombie.
Now multiply this across every AWS account in your organization. If you have 15 accounts, you're running describe-tags, lookup-events, and terraform state list in each one, switching credentials or profiles each time, and hoping you don't miss anything. Most teams I've talked to either skip this entirely or sample a handful of accounts and hope for the best. That's how $1,400/month zombies survive for years - nobody checks every account.

The SQL Approach #

When you set up the AWS integration in multi-account mode, CloudQuery discovers all accounts in your AWS Organization and syncs them into the same tables. Every resource gets an account_id column, so one query covers your entire estate. Combined with the Terraform integration, you can query IaC coverage gaps across your entire fleet:
SELECT
    i.instance_id,
    i.instance_type,
    i.account_id,
    i.region,
    i.tags
FROM aws_ec2_instances i
LEFT JOIN tf_resources tf
    ON i.instance_id = tf.id
WHERE tf.id IS NULL
  AND JSONExtractString(i.state, 'Name') = 'running';
This returns every running EC2 instance not tracked in Terraform. Sort by instance type to find the expensive ones first. An untracked t3.micro is less urgent than an untracked r6i.8xlarge.

What Insights Adds #

Insights treats the absence of IaC coverage as a signal worth surfacing - not missing data to ignore. Combined with cost data from AWS CUR and the lack of activity signals, the resource gets flagged with appropriate severity. The compound view - $1,400/month, no Terraform, no owner, no sessions - makes the risk clear.
You can also use CloudQuery's Resource Ownership feature to filter Insights by ownership tags. If your organization assigns resources to teams via tags, Insights lets you group findings by owner - making it obvious which teams have the most zombies.

How Do You Set Up Automated Zombie Detection? #

Three integrations give you the signals you need:
  1. AWS source - syncs instance configuration, security groups, Lambda functions, RDS instances, tags, and CloudTrail events
  2. AWS CUR source - syncs resource-level cost data with 30-day and 7-day trends and spike detection
  3. Terraform source - syncs Terraform state so you can identify IaC coverage gaps
Once these are connected, you can create a custom Policy that flags resources matching zombie patterns:
SELECT
    i.instance_id,
    i.instance_type,
    i.account_id
FROM aws_ec2_instances i
LEFT JOIN tf_resources tf ON i.instance_id = tf.id
WHERE tf.id IS NULL
  AND JSONExtractString(i.state, 'Name') = 'running'
  AND (i.tags IS NULL OR JSONExtractString(i.tags, 'Owner') = '');
This flags running EC2 instances with no Terraform state and no Owner tag. Save it as a Policy, and it generates findings in the Insights dashboard after every sync. Combine it with CUR cost data to prioritize the expensive zombies first.
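To sketch what that cost-based prioritization looks like in code - the window sizes and 2x threshold here are assumptions for illustration, not CloudQuery's actual spike-detection algorithm:

```python
def cost_spike(daily_costs: list[float], threshold: float = 2.0) -> bool:
    """Flag a resource whose trailing 7-day average cost exceeds
    its 30-day average by `threshold`."""
    if len(daily_costs) < 30:
        return False  # not enough history to judge
    avg_30 = sum(daily_costs[-30:]) / 30
    avg_7 = sum(daily_costs[-7:]) / 7
    return avg_30 > 0 and avg_7 / avg_30 >= threshold
```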

Frequently Asked Questions #

What Is Zombie Cloud Infrastructure? #

Zombie infrastructure refers to cloud resources that are still running and accumulating charges but no longer serve a business purpose. Unlike underutilized resources (which could benefit from right-sizing), zombies provide zero value. Common examples include databases with no active connections, compute instances launched for projects that ended months ago, and Lambda functions deployed by engineers who have since left the company.

How Much Do Unused Cloud Resources Typically Cost Organizations? #

The Flexera 2025 State of the Cloud Report found that organizations estimate 27% of their cloud spend is wasted. While that figure includes both underutilized and truly idle resources, zombie infrastructure - resources with zero usage - represents the most straightforward savings opportunity because there's no workload to migrate or resize.

How Can I Check If an AWS RDS Instance Has Active Connections? #

Query the DatabaseConnections CloudWatch metric for the instance over a 90-day window using aws cloudwatch get-metric-statistics. If the Maximum value is 0 across the entire period, no application or user has connected. Before decommissioning, verify the instance isn't a disaster recovery standby, compliance archive, or referenced in application configuration via SSM Parameter Store or Secrets Manager.

What Is the Difference Between Zombie and Orphaned Cloud Resources? #

A zombie resource provides no current business value but continues to run. An orphaned resource has lost its owner - the person or team responsible for it. Not all orphaned resources are zombies (they might still be actively used), and not all zombies are orphaned (a team might knowingly own a resource without realizing it's idle). The overlap - orphaned and idle - is where the highest-confidence shutdown opportunities are.

How Does CloudQuery Identify Resources Not Managed by Terraform? #

The Terraform integration syncs your Terraform state files into queryable tables. By running a LEFT JOIN between your cloud resource tables (like aws_ec2_instances) and tf_resources, you can find instances where the join returns NULL - meaning the resource exists in your cloud account but doesn't appear in any Terraform state file. These unmanaged resources often lack standard governance controls.

Can CloudQuery Show Who Originally Created a Cloud Resource? #

Yes, through CloudTrail integration. CloudQuery syncs CloudTrail events, including RunInstances, CreateDBInstance, CreateFunction, and other resource creation events. Each event includes the userIdentity field showing which IAM principal performed the action. Cross-reference this with your Jira or GitHub integrations to connect IAM identities to team members and determine if the creator still works at your organization.

How Does CloudQuery Insights Connect Cost Data with Resource Activity? #

Insights pulls from multiple sources simultaneously. The AWS CUR integration provides resource-level cost data with trend analysis (30-day and 7-day trends, spike detection). The AWS integration provides configuration data including connection metrics, deployment timestamps, and tag metadata. Insights maps both to the same resource using ARN matching, so you see cost and activity signals together on a single resource view.