Investigating Toxic IAM and Access Combinations in AWS

Joe Karlsson

TL;DR

Security tools flag individual findings: a stale access key here, an open security group there. The real risk is when multiple signals converge on the same resource - stale credentials with broad permissions and suspicious activity, or wide-open SSH across instances with no session management. This post walks through three investigation playbooks for finding these toxic combinations using AWS CLI, SQL queries, and CloudQuery Insights.

What is a toxic combination? A toxic combination occurs when multiple individually moderate-risk signals converge on the same cloud resource or identity, creating compound risk that no single tool surfaces. A stale key is a finding. A stale key with admin access and anomalous activity is an incident waiting to happen.

Your CSPM told you about the stale IAM access key. What it didn't tell you is that the same identity has s3:* permissions, and CloudTrail is logging API calls from three different countries this week.

That's the gap. Not a lack of findings - a lack of connection between them. AWS Security Hub flags the key age. Your CSPM scores the permission breadth. CloudTrail records the geographic anomaly. Three tools, three findings, three consoles. Nobody connects them to the same identity.

The CIS AWS Foundations Benchmark defines the individual controls, which AWS Security Hub tracks under its own control IDs: IAM.3 says rotate access keys every 90 days or less (AWS IAM best practices recommends the same). EC2.13 says no security group should allow ingress from 0.0.0.0/0 or ::/0 to port 22. These are good controls. But a stale key with read-only permissions on a test account is a different animal than a stale key with admin access showing anomalous activity patterns. Individual controls don't capture that distinction.

The 2025 Verizon DBIR found that stolen credentials were the initial access vector in 22% of breaches, and 88% of basic web application attacks involved stolen credentials. The credentials themselves aren't hard to find in your environment - the hard part is knowing which ones are actually dangerous right now.

| Playbook | Signals Combined | CIS Control | Individual Risk | Combined Risk |
| --- | --- | --- | --- | --- |
| Stale credentials | Key age (380d) + `s3:*` permissions + multi-country API calls | IAM.3 | Medium | Critical |
| Open SSH | 0.0.0.0/0 on port 22 + 14 instances + no SSM Agent | EC2.13 | Medium | High |
| Unowned compute | Vulnerable AMI + admin IAM role + no Terraform state | Multiple | Medium each | Critical |
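
To make the idea of signals converging on one resource concrete, here is a minimal sketch that groups findings by resource and returns the resources carrying several distinct signals. The `Finding` record, the signal names, and the `min_signals` threshold are assumptions for illustration; they are not how Security Hub, a CSPM, or Insights represent or score findings.

```python
from collections import defaultdict
from typing import NamedTuple


class Finding(NamedTuple):
    resource: str  # ARN or resource ID the finding points at
    signal: str    # hypothetical signal name, e.g. "stale_key"
    source: str    # tool that reported it


def toxic_combinations(findings: list[Finding], min_signals: int = 3) -> dict[str, list[str]]:
    """Return resources that carry at least `min_signals` distinct signals."""
    signals_by_resource: dict[str, set[str]] = defaultdict(set)
    for finding in findings:
        signals_by_resource[finding.resource].add(finding.signal)
    return {
        resource: sorted(signals)
        for resource, signals in signals_by_resource.items()
        if len(signals) >= min_signals
    }


# Made-up findings from three separate tools.
SUSPECT = "arn:aws:iam::111111111111:user/suspect-user"
TEST = "arn:aws:iam::111111111111:user/test-user"
findings = [
    Finding(SUSPECT, "stale_key", "Security Hub"),
    Finding(SUSPECT, "broad_permissions", "CSPM"),
    Finding(SUSPECT, "geo_anomaly", "CloudTrail"),
    Finding(TEST, "stale_key", "Security Hub"),
]
combos = toxic_combinations(findings)
```

Only `suspect-user` is returned: the test user has a single finding and stays an ordinary ticket. Counting distinct signals is deliberately crude; which combinations count as toxic is a judgment call for each environment.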

Key Takeaways #

Stolen credentials were the initial access vector in 22% of breaches in the 2025 Verizon DBIR, and the compound risk often goes unnoticed because the signals are spread across multiple tools
CIS AWS Foundations Benchmark controls IAM.3 (key rotation) and EC2.13 (SSH access) define the individual rules, but the real risk is when violations compound on the same resource
CloudQuery syncs AWS data by calling APIs like ListAccessKeys, DescribeSecurityGroups, and LookupEvents across all configured accounts, normalizing the data into SQL tables you can join and query
Insights automates the correlation - mapping Security Hub findings, Wiz alerts, and custom Policy violations to individual resources so you see the compound picture without running manual queries

Playbook 1: How Do You Investigate Stale Credentials with Broad Permissions? #

The scenario: An IAM user whose access keys haven't been rotated in 380 days, with an attached policy allowing s3:*, and CloudTrail showing API calls from the US, Germany, and Singapore in the same week.

Each signal alone is a medium-severity finding at best. Together, they point at a potentially compromised identity with the keys to your data.

The Manual Investigation #

Start with the key age. List the access keys for the user to check when each was created:

aws iam list-access-keys --user-name suspect-user

The CreateDate field tells you when the key was issued. If it's older than 90 days, it violates CIS control IAM.3. If it's older than a year, the probability of exposure goes up significantly - that's a key that has survived multiple laptop rotations, offboarding cycles, and credential audits without being cycled.
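
If you want to script the age check rather than eyeball CreateDate, a small sketch like the following works on the JSON that `aws iam list-access-keys` prints. The function name, the sample keys, and the fixed `now` value are assumptions for illustration.

```python
import json
from datetime import datetime, timezone

MAX_KEY_AGE_DAYS = 90  # rotation threshold cited above


def stale_keys(cli_output: str, now: datetime) -> list[tuple[str, int]]:
    """Return (access_key_id, age_in_days) for active keys older than the threshold."""
    stale = []
    for key in json.loads(cli_output)["AccessKeyMetadata"]:
        if key["Status"] != "Active":
            continue
        # The CLI prints ISO 8601 timestamps; normalize a trailing "Z" for older Pythons.
        created = datetime.fromisoformat(key["CreateDate"].replace("Z", "+00:00"))
        age_days = (now - created).days
        if age_days > MAX_KEY_AGE_DAYS:
            stale.append((key["AccessKeyId"], age_days))
    return stale


# Sample data shaped like the list-access-keys response (values are made up).
sample = json.dumps({
    "AccessKeyMetadata": [
        {"UserName": "suspect-user", "AccessKeyId": "AKIAEXAMPLEOLD",
         "Status": "Active", "CreateDate": "2025-03-18T00:00:00+00:00"},
        {"UserName": "suspect-user", "AccessKeyId": "AKIAEXAMPLENEW",
         "Status": "Active", "CreateDate": "2026-03-20T00:00:00+00:00"},
    ]
})
now = datetime(2026, 4, 2, tzinfo=timezone.utc)
result = stale_keys(sample, now)
```

In real use you would pass `datetime.now(timezone.utc)` and the captured CLI output instead of the sample.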

Next, check the permission scope:

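aws iam list-attached-user-policies --user-name suspect-user
# Fetch the policy's default version rather than assuming it is v1
POLICY_ARN=arn:aws:iam::aws:policy/AmazonS3FullAccess
VERSION=$(aws iam get-policy --policy-arn "$POLICY_ARN" \
  --query "Policy.DefaultVersionId" --output text)
aws iam get-policy-version --policy-arn "$POLICY_ARN" --version-id "$VERSION"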
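
Once you have the policy document (the `PolicyVersion.Document` field of the get-policy-version response), checking for service-wide wildcards like s3:* can be automated. The sketch below is a simplified reading of the policy grammar: it looks only at `Action` in `Allow` statements and ignores `NotAction`, conditions, and resource scoping. The function name and sample document are illustrative.

```python
def wildcard_services(policy_document: dict) -> set[str]:
    """Return services for which an Allow statement grants every action.

    "*" in the result stands for all services.
    """
    statements = policy_document.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be a bare object
        statements = [statements]

    services = set()
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt.get("Action", [])
        if isinstance(actions, str):  # Action may be a string or a list
            actions = [actions]
        for action in actions:
            if action == "*":
                services.add("*")
            elif action.endswith(":*"):
                services.add(action.split(":", 1)[0])
    return services


# Sample document for illustration only.
document = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": ["s3:*", "s3-object-lambda:*"], "Resource": "*"},
        {"Effect": "Allow", "Action": "ec2:DescribeInstances", "Resource": "*"},
    ],
}
broad = wildcard_services(document)
```

Partial wildcards such as `s3:Get*` are not flagged here; whether those count as broad is a policy decision for your team.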

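Then check CloudTrail for geographic distribution. Every CloudTrail event record includes a sourceIPAddress field; in lookup-events output it sits inside the CloudTrailEvent JSON string of each event. Note that lookup-events only searches the last 90 days of management events: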

aws cloudtrail lookup-events \
  --lookup-attributes AttributeKey=Username,AttributeValue=suspect-user \
  --start-time 2026-03-26 --end-time 2026-04-02

You're looking for sourceIPAddress values that don't match your known office, VPN, or CI/CD IP ranges. Three countries in a week is worth investigating.
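
Filtering those addresses by hand gets tedious, so here is a sketch that pulls sourceIPAddress out of lookup-events output and keeps only the addresses outside an allowlist. `KNOWN_RANGES` is a hypothetical allowlist built from documentation-reserved ranges; mapping an IP to a country needs a GeoIP database and is not covered here.

```python
import ipaddress
import json

# Hypothetical allowlist: replace with your office, VPN, and CI/CD ranges.
KNOWN_RANGES = [ipaddress.ip_network("203.0.113.0/24")]


def unexpected_source_ips(cli_output: str) -> set[str]:
    """Return source IPs from lookup-events output that fall outside KNOWN_RANGES."""
    unexpected = set()
    for event in json.loads(cli_output)["Events"]:
        # The full record, including sourceIPAddress, is a JSON string in CloudTrailEvent.
        record = json.loads(event["CloudTrailEvent"])
        source = record.get("sourceIPAddress", "")
        try:
            ip = ipaddress.ip_address(source)
        except ValueError:
            continue  # calls made by AWS services show a DNS name instead of an IP
        if not any(ip in network for network in KNOWN_RANGES):
            unexpected.add(source)
    return unexpected


# Sample data shaped like the lookup-events response (values are made up).
sample = json.dumps({"Events": [
    {"EventName": "ListBuckets",
     "CloudTrailEvent": json.dumps({"sourceIPAddress": "203.0.113.10"})},
    {"EventName": "PutBucketPolicy",
     "CloudTrailEvent": json.dumps({"sourceIPAddress": "198.51.100.7"})},
    {"EventName": "AssumeRole",
     "CloudTrailEvent": json.dumps({"sourceIPAddress": "cloudtrail.amazonaws.com"})},
]})
flagged = unexpected_source_ips(sample)
```

Anything left in `flagged` is a candidate for a closer look, not proof of compromise.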

The SQL Approach #

With CloudQuery syncing your AWS data, you can join these signals in a single query against your aws_iam_users and aws_iam_user_access_keys tables:

SELECT
    u.user_name,
    u.arn,
    ak.access_key_id,
    ak.last_rotated,
    dateDiff('day', ak.last_rotated, now()) AS key_age_days
FROM aws_iam_users u
JOIN aws_iam_user_access_keys ak ON u.arn = ak.user_arn
WHERE ak.status = 'Active'
  AND dateDiff('day', ak.last_rotated, now()) > 90
ORDER BY key_age_days DESC;

This gives you every active key older than 90 days, sorted by age. From here you can cross-reference with attached policies to find which of those stale keys also have broad permissions.

What Insights Adds #

CloudQuery Insights runs this correlation automatically. When you connect the AWS integration, Insights surfaces findings from AWS Security Hub and maps them to individual resources in your asset inventory. The stale key finding, the permission scope, and any Security Hub alerts for the same identity appear together on a single resource view.

The Evidence panel shows what was detected. The Mitigation panel shows remediation steps. And because Insights evaluates after every sync, the findings stay current without anyone having to run these queries manually.

A note on geographic anomalies: API calls from multiple countries aren't always malicious. Remote engineering teams, CI/CD pipelines running in multiple regions, and VPN exit nodes all produce legitimate multi-country patterns. I've seen teams waste hours investigating geographic "anomalies" that turned out to be a Terraform Cloud runner in eu-west-1. The value of surfacing this signal isn't auto-remediation - it's making sure someone actually looks at it in context. A 380-day-old key with admin S3 access and multi-country API calls deserves a conversation. A 30-day-old key used by a CI pipeline in us-east-1 and eu-west-1 probably doesn't.

Playbook 2: How Do You Find Wide-Open SSH with No Session Management? #

The scenario: A security group allowing inbound traffic on port 22 from 0.0.0.0/0, attached to 14 instances across two accounts, none of which have SSM Agent installed or are covered by your organization's SSH bastion policy.

This is CIS control EC2.13 in its most common form. The control is straightforward - no unrestricted ingress to port 22 - but the real question is how many instances are affected and whether any alternative session management is in place.

The Manual Investigation #

Find security groups with open SSH:

aws ec2 describe-security-groups \
  --filters "Name=ip-permission.from-port,Values=22" \
             "Name=ip-permission.to-port,Values=22" \
             "Name=ip-permission.cidr,Values=0.0.0.0/0" \
  --query "SecurityGroups[].{ID:GroupId,Name:GroupName}" \
  --output table
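
The filters above match exact values, so a rule that opens a wider port range (say 0-65535) or uses the IPv6 source ::/0 would not show up. A sketch of a stricter per-rule check on unfiltered `describe-security-groups` output might look like this; the function name and sample groups are illustrative.

```python
OPEN_CIDRS = {"0.0.0.0/0", "::/0"}


def allows_open_ssh(security_group: dict) -> bool:
    """True if a single inbound rule both covers port 22 and is open to the world."""
    for rule in security_group.get("IpPermissions", []):
        protocol = rule.get("IpProtocol")
        if protocol == "-1":
            covers_ssh = True  # all protocols, all ports
        elif protocol == "tcp":
            covers_ssh = rule.get("FromPort", 0) <= 22 <= rule.get("ToPort", 65535)
        else:
            covers_ssh = False
        cidrs = {r["CidrIp"] for r in rule.get("IpRanges", [])}
        cidrs |= {r["CidrIpv6"] for r in rule.get("Ipv6Ranges", [])}
        if covers_ssh and cidrs & OPEN_CIDRS:
            return True
    return False


# Made-up groups shaped like entries of the SecurityGroups list.
sg_exact = {"GroupId": "sg-exact", "IpPermissions": [
    {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
     "IpRanges": [{"CidrIp": "0.0.0.0/0"}], "Ipv6Ranges": []}]}
sg_range_v6 = {"GroupId": "sg-range", "IpPermissions": [
    {"IpProtocol": "tcp", "FromPort": 0, "ToPort": 1024,
     "IpRanges": [], "Ipv6Ranges": [{"CidrIpv6": "::/0"}]}]}
sg_split = {"GroupId": "sg-split", "IpPermissions": [
    {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
     "IpRanges": [{"CidrIp": "10.0.0.0/8"}], "Ipv6Ranges": []},
    {"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
     "IpRanges": [{"CidrIp": "0.0.0.0/0"}], "Ipv6Ranges": []}]}
open_groups = [g["GroupId"] for g in (sg_exact, sg_range_v6, sg_split) if allows_open_ssh(g)]
```

The third group restricts SSH to an internal range and only exposes 443, so it is not flagged.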

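Then list the instances attached to each open group and, separately, the instances registered with Systems Manager (replace the placeholder group ID with one returned above):

aws ec2 describe-instances \
  --filters "Name=instance.group-id,Values=sg-0123456789abcdef0" \
  --query "Reservations[].Instances[].InstanceId" --output text
aws ssm describe-instance-information \
  --query "InstanceInformationList[].{ID:InstanceId,PingStatus:PingStatus}" \
  --output table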

Cross-reference the two lists. Instances attached to the open security group that don't appear in the SSM inventory have wide-open SSH with no centralized session management and no audit trail for who connects.
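
The cross-reference itself is a set difference. A minimal sketch, assuming you have the instance IDs behind the open group and the raw JSON from `aws ssm describe-instance-information` (without the `--query` reshaping); treating anything other than an `Online` ping status as "no coverage" is an assumption you may want to relax.

```python
def unmanaged_open_ssh(open_sg_instances: list[str], ssm_output: dict) -> list[str]:
    """Instances behind an open-SSH group that are not reporting to Systems Manager."""
    online = {
        info["InstanceId"]
        for info in ssm_output.get("InstanceInformationList", [])
        if info.get("PingStatus") == "Online"
    }
    return sorted(set(open_sg_instances) - online)


# Made-up sample data.
open_sg_instances = ["i-0aaa", "i-0bbb", "i-0ccc"]
ssm_output = {"InstanceInformationList": [
    {"InstanceId": "i-0aaa", "PingStatus": "Online"},
    {"InstanceId": "i-0bbb", "PingStatus": "ConnectionLost"},
]}
exposed = unmanaged_open_ssh(open_sg_instances, ssm_output)
```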

Here's the part that makes this painful at scale: if those 14 instances are spread across two AWS accounts, you're running these commands twice with different credentials, exporting the results, and merging them in a spreadsheet. If you have 20 accounts, it's 20 times. I've done this manually and it's the kind of work that takes an entire afternoon and produces a snapshot that's stale by the time you're done writing the report.

The SQL Approach #

When you connect the AWS integration with multi-account mode, CloudQuery uses AWS Organizations role assumption to automatically discover and sync all member accounts. Every resource lands in the same set of tables with an account_id column, so a single query covers your entire organization.

With CloudQuery, you can join aws_ec2_security_groups with instance data to find affected resources across all accounts in one query:

SELECT
    i.instance_id,
    i.region,
    i.account_id,
    sg.group_id,
    sg.group_name
FROM aws_ec2_instances i,
    arrayJoin(JSONExtractArrayRaw(assumeNotNull(i.security_groups))) AS security_group
JOIN aws_ec2_security_groups sg
    ON JSONExtractString(security_group, 'GroupId') = sg.group_id
WHERE sg.ip_permissions LIKE '%0.0.0.0/0%'
  AND sg.ip_permissions LIKE '%"FromPort":22%';

Add a LEFT JOIN against SSM instance data to flag which of those instances lack session management. Now you have a prioritized list: open SSH, no SSM, no audit trail.

What Insights Adds #

Insights correlates the security group finding from AWS Security Hub with the SSM coverage gap on each affected instance. Instead of running two CLI commands and cross-referencing in a spreadsheet, you see both signals on the same resource in the Insights detail view. The Related Resources tab shows which other instances share the same security group, so you can scope the blast radius.

Edge case worth noting: Some legacy workloads genuinely need direct SSH access - older AMIs without SSM Agent support, air-gapped environments, or instances running custom kernels. I've worked with teams that had legitimate reasons for every one of those 14 open-SSH instances, but they couldn't prove it until someone asked. The point isn't that every instance must use SSM. It's knowing which instances have open SSH and making a deliberate decision about each one, rather than discovering them during an incident.

Playbook 3: How Do You Identify Unpatched, Overprivileged, and Unowned Resources? #

The scenario: An EC2 instance running a known-vulnerable AMI, with an IAM role that has admin access, no associated Terraform state, and $1,400/month in compute costs (roughly an r6i.8xlarge running 24/7). Nobody owns it, nobody patched it, and it has the keys to the kingdom.

This is the scenario where three medium-severity findings become one critical problem. A vulnerable AMI alone is a patching ticket. An admin IAM role alone is a permissions review. No Terraform state alone is a governance gap. But all three on the same instance? That's a resource that can be compromised, has the permissions to do real damage, and has no owner who would notice.

The Manual Investigation #

Check the AMI and its age:

aws ec2 describe-instances --instance-ids i-0abc123def456 \
  --query "Reservations[].Instances[].{AMI:ImageId,LaunchTime:LaunchTime,Role:IamInstanceProfile.Arn}"

Then check what the attached IAM role can do:

aws iam get-role --role-name the-role-name
aws iam list-attached-role-policies --role-name the-role-name

If any attached policy grants "Action": "*" on "Resource": "*", that instance has admin access to your entire AWS account.
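
That check can be scripted against each policy document you retrieve. The sketch below looks only for an `Allow` statement pairing `"Action": "*"` with `"Resource": "*"`; it ignores Deny statements, conditions, permission boundaries, and SCPs, all of which can narrow the effective access. Names and sample documents are illustrative.

```python
def _as_list(value):
    """IAM accepts a single string or object where a list is also allowed."""
    return value if isinstance(value, list) else [value]


def grants_admin(policy_document: dict) -> bool:
    """True if any Allow statement pairs Action "*" with Resource "*"."""
    for stmt in _as_list(policy_document.get("Statement", [])):
        if stmt.get("Effect") != "Allow":
            continue
        actions = _as_list(stmt.get("Action", []))
        resources = _as_list(stmt.get("Resource", []))
        if "*" in actions and "*" in resources:
            return True
    return False


# Made-up sample documents.
admin_doc = {"Version": "2012-10-17",
             "Statement": [{"Effect": "Allow", "Action": "*", "Resource": "*"}]}
s3_doc = {"Version": "2012-10-17",
          "Statement": [{"Effect": "Allow", "Action": "s3:*", "Resource": "*"}]}
is_admin = grants_admin(admin_doc)
is_s3_admin = grants_admin(s3_doc)
```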

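For IaC coverage, check whether the instance ID appears in your Terraform state. terraform state list prints resource addresses rather than cloud IDs, so filter with the -id flag instead of grepping for the instance ID:

terraform state list -id=i-0abc123def456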

If it's not there, nobody provisioned it through your standard workflow. Check CloudTrail for the RunInstances event to find who launched it and when.
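
To run the same coverage check for many instances at once, you can parse the output of `terraform show -json` and compare it with the instance IDs you see in AWS. This is a sketch under the assumption that all relevant state lives in one workspace; the sample state and the `running` set are made up.

```python
def managed_instance_ids(state: dict) -> set[str]:
    """Collect IDs of managed aws_instance resources from `terraform show -json` output."""
    ids = set()

    def walk(module: dict) -> None:
        for resource in module.get("resources", []):
            # Skip data sources; only resources Terraform manages count as coverage.
            if resource.get("mode") == "managed" and resource.get("type") == "aws_instance":
                instance_id = resource.get("values", {}).get("id")
                if instance_id:
                    ids.add(instance_id)
        for child in module.get("child_modules", []):
            walk(child)

    walk(state.get("values", {}).get("root_module", {}))
    return ids


# Sample data shaped like the JSON state representation.
state = {"format_version": "1.0", "values": {"root_module": {
    "resources": [
        {"address": "aws_instance.web", "mode": "managed", "type": "aws_instance",
         "name": "web", "values": {"id": "i-0aaa111"}},
    ],
    "child_modules": [{
        "address": "module.batch",
        "resources": [
            {"address": "module.batch.aws_instance.worker", "mode": "managed",
             "type": "aws_instance", "name": "worker", "values": {"id": "i-0bbb222"}},
        ],
    }],
}}}
running = {"i-0aaa111", "i-0bbb222", "i-0abc123def456"}
unmanaged = running - managed_instance_ids(state)
```

With several workspaces or state backends you would union the managed IDs from each before taking the difference.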

The SQL Approach #

With the CloudQuery Terraform integration, you can query IaC coverage alongside your AWS data:

SELECT
    i.instance_id,
    i.image_id,
    i.iam_instance_profile,
    i.account_id,
    i.region
FROM aws_ec2_instances i
LEFT JOIN tf_resources tf
    ON i.instance_id = tf.id
WHERE tf.id IS NULL
  AND i.iam_instance_profile IS NOT NULL;

This returns every EC2 instance that has an IAM role but doesn't appear in Terraform state - unowned instances with permissions. Add cost data from the AWS CUR integration to prioritize by spend.

What Insights Adds #

Insights correlates the vulnerable AMI finding (from AWS Security Hub), the overprivileged role, and the IaC coverage gap on a single resource view. The absence of Terraform state is itself a signal - Insights treats it as a finding worth surfacing, not missing data to ignore.

The key insight here (no pun intended) is that "not in Terraform" is often a proxy for "nobody owns this." And an unowned resource with admin permissions doesn't get patched, doesn't get reviewed in access audits, and doesn't show up in anyone's quarterly security review. It accumulates risk silently until something breaks.

How Do You Write Custom Correlation Rules? #

The built-in Insight sources cover common patterns, but your environment has its own definition of "toxic." CloudQuery Policies let you define custom rules in SQL that generate Insights findings.

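A basic example: flag any IAM user with active access keys older than 90 days and an attached managed policy whose name contains FullAccess. This is a name-based heuristic for full access to a service, so custom policies with wildcard actions need a separate check: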

SELECT
    u.arn,
    u.user_name,
    ak.access_key_id,
    dateDiff('day', ak.last_rotated, now()) AS key_age_days
FROM aws_iam_users u
JOIN aws_iam_user_access_keys ak ON u.arn = ak.user_arn
WHERE ak.status = 'Active'
  AND dateDiff('day', ak.last_rotated, now()) > 90
  AND EXISTS (
    SELECT 1 FROM aws_iam_user_attached_policies p
    WHERE p.user_arn = u.arn
      AND p.policy_name LIKE '%FullAccess%'
  );

Save this as a Policy, and it generates findings that appear in the Insights dashboard alongside your Security Hub and Wiz findings. You can also connect additional security signals from CrowdStrike for endpoint-level visibility.

The advantage of SQL-based rules is that platform engineers already know the language. No Rego, no proprietary DSL - the same SQL you use for ad hoc queries becomes a persistent detective control that runs after every sync.

Frequently Asked Questions #

What Are Toxic IAM Combinations in AWS? #

A toxic IAM combination is when multiple individually moderate-risk signals converge on the same identity or resource to create a high-risk situation. For example, an IAM user with stale access keys (violating CIS control IAM.3), overly broad permissions like s3:*, and anomalous API activity patterns. No single signal triggers an urgent response, but the combination demands immediate investigation.

How Often Should IAM Access Keys Be Rotated? #

AWS IAM best practices and the CIS AWS Foundations Benchmark both recommend rotating access keys every 90 days or less. In practice, many organizations find keys that are 6, 12, or even 18+ months old - particularly for service accounts and automated processes that were set up and forgotten.

How Does CloudQuery Detect Security Group Misconfigurations? #

CloudQuery syncs security group configuration through the AWS integration, including all inbound and outbound rules. You can query the aws_ec2_security_groups table directly with SQL to find groups that allow ingress from 0.0.0.0/0 on specific ports. Insights also surfaces these findings automatically when AWS Security Hub flags them.

What Is the CIS Benchmark Recommendation for SSH Access? #

CIS control EC2.13 states that security groups should not allow ingress from 0.0.0.0/0 or ::/0 to port 22. The rationale is that unrestricted SSH access removes a layer of defense and exposes instances to brute-force attacks, credential stuffing, and lateral movement from compromised networks.

Can CloudQuery Correlate Findings from Third-Party Security Tools? #

Yes. CloudQuery supports integrations with security tools like Wiz and CrowdStrike. Findings from these tools are mapped to resources in your asset inventory using ARN or resource ID matching. Insights surfaces these third-party findings alongside native AWS findings, giving you a combined view per resource.

How Does Terraform State Integration Help Identify Unowned Resources? #

The CloudQuery Terraform integration syncs your Terraform state into queryable tables. By joining Terraform state data with AWS resource data, you can identify resources that exist in your cloud accounts but aren't managed by any Terraform configuration. These unmanaged resources often lack ownership, don't receive regular patching or review, and represent governance blind spots.

What Data Sources Does CloudQuery Insights Use for Security Findings? #

Insights pulls from multiple built-in sources: AWS Security Hub, AWS Health, GCP Security Center, and Azure Advisor activate automatically when you connect the corresponding cloud integration. Third-party sources like Wiz provide additional security and data findings. You can also create custom Insight sources using Policies with SQL rules tailored to your organization's specific risk criteria. Insights correlates them automatically alongside cost data, ownership metadata, and findings from your other security tools.
