New Event! Join us January 28th in Durham, NC for Cocktails and Cloud Governance Register Now ❯
Product News

Find EOL Docker Images, Missing Licenses, and Stale CODEOWNERS Across Thousands of repos with SQL

You need to find which Dockerfiles still use Node 18 (now EOL). Or which repos are missing a LICENSE file. Or maybe you need to audit CODEOWNERS against your actual employee roster. The typical approach: clone hundreds of repos, write bash scripts to parse files, export to spreadsheets, maintain the script as repos change. It's 2025, and we're still doing this.
The Git Source Plugin syncs file content and metadata directly into your data warehouse. Query any file type across all repos with standard SQL. No local clones, no custom parsing scripts, no maintenance overhead.

Why Querying Repository Files Is Still Painful #

Getting answers to straightforward questions about your repositories is surprisingly painful:
  • Which repos don't have a CODEOWNERS file?
  • Which Dockerfiles still reference node:18 (now EOL)?
  • What Jira team owns this service according to our about.yaml?
  • Which repos have React as a dependency?
Today, answering any of these means cloning hundreds (or thousands) of repos, writing parsing logic, and maintaining that infrastructure. Most teams just… don't bother.

SQL Queries vs Custom Scripts #

TaskManual ApproachGit Source Plugin
Find repos missing LICENSEClone all repos, write bash script, grepSingle SQL query
Track dependency versionsClone, parse package.json, maintain scriptJOIN with version data
Audit Dockerfile base imagesClone, parse FROM lines, handle edge casesSQL with string functions
Keep data currentRe-run script, manage cron jobsIncremental sync
Cross-reference with other dataExport to CSV, manual joinsSQL JOINs across tables

Configuration and Tables #

The plugin syncs three tables:
  • git_repositories - Repository metadata (URL, owner, default branch)
  • git_files - File content and metadata matching your glob patterns
  • git_commits - Commit history (author, committer, message, SHA)
Full configuration example:
kind: source
spec:
  name: git
  path: cloudquery/git
  version: 'v1.0.0'
  tables:
    - git_repositories
    - git_files:
        glob_patterns:
          - '**/Dockerfile*'
          - '**/pom.xml'
          - '.buildkite/**/pipeline.yml'
          - '**/CODEOWNERS'
          - '**/package.json'
          - '**/go.mod'
          - '**/.github/workflows/*.yml'
    - git_commits
  destinations:
    - postgresql
  spec:
    # Repository filtering
    repositories:
      - 'https://github.com/your-org/*'
    # Authentication
    token: '${GIT_TOKEN}'
Configure glob_patterns to sync only the files you need. Common patterns for platform engineers:
PatternUse Case
**/Dockerfile*Container image auditing
**/CODEOWNERSCode ownership tracking
**/package.jsonDependency management
**/.github/workflows/*.ymlCI/CD pipeline analysis
**/go.mod, **/pom.xmlBackend dependency tracking
**/*.tfInfrastructure as Code review

Example Queries #

These examples use PostgreSQL as the destination, but the Git Source Plugin works with any CloudQuery destination.
The queries below demonstrate common use cases for the Git Source Plugin. For detailed, step-by-step tutorials on each scenario, check out our focused posts on querying Dockerfiles for EOL images, finding repos missing LICENSE files, tracking dependencies across your organization, auditing CODEOWNERS against your identity provider, and finding commits by departed employees.
Find repos missing a LICENSE file:
WITH repos_with_license AS (
  SELECT gf.repository_url FROM git_files gf WHERE gf.name = 'LICENSE'
)
SELECT gr.url, gr.owner, gr.full_name FROM git_repositories gr
LEFT OUTER JOIN repos_with_license rl ON rl.repository_url = gr.url
WHERE rl.repository_url IS NULL
Find all Dockerfiles running EOL Node 18:
WITH dockerfiles_parsed AS (
  SELECT repository_url, path, name,
         regexp_split_to_table(convert_from(content, 'UTF8'), E'\n') AS line
  FROM git_files
  WHERE name = 'Dockerfile'
),
images AS (
  SELECT repository_url, path, regexp_replace(line, '^FROM\s+', '') AS image
  FROM dockerfiles_parsed
  WHERE line ~ '^FROM\s+'
)
SELECT * FROM images WHERE image LIKE 'node:18%'
List all repos with React as a dependency:
WITH package_data AS (
  SELECT
    repository_url,
    path,
    convert_from(content, 'UTF8')::jsonb AS package_json
  FROM git_files
  WHERE name = 'package.json' AND content IS NOT NULL
),
all_dependencies AS (
  SELECT
    repository_url,
    package_json->>'name' AS package_name,
    'dependencies' AS dependency_type,
    dep_key AS dependency_name,
    package_json->'dependencies'->>dep_key AS dependency_version
  FROM package_data,
  LATERAL jsonb_object_keys(package_json->'dependencies') AS dep_key
  WHERE package_json->'dependencies' IS NOT NULL

  UNION ALL

  SELECT
    repository_url,
    package_json->>'name' AS package_name,
    'devDependencies' AS dependency_type,
    dep_key AS dependency_name,
    package_json->'devDependencies'->>dep_key AS dependency_version
  FROM package_data,
  LATERAL jsonb_object_keys(package_json->'devDependencies') AS dep_key
  WHERE package_json->'devDependencies' IS NOT NULL
)
SELECT * FROM all_dependencies WHERE dependency_name = 'react'

Git Source vs GitHub Source #

The Git Source Plugin and GitHub Source Plugin serve different purposes:
Use CaseGit SourceGitHub Source
File contents (Dockerfiles, package.json, CODEOWNERS)YesNo
Repository metadataBasicComprehensive
Pull requests, issues, reviewsNoYes
Branch protection rulesNoYes
Commit historyYesYes
Works with non-GitHub reposYesNo
Use Git Source when you need to query file contents across repos - configuration files, dependency manifests, or any files matching glob patterns.
Use GitHub Source when you need GitHub-specific data like pull requests merged without review, unprotected branches, issues, or organization settings.
Use both together for complete visibility. Join git_files with github_repositories to correlate file contents with GitHub metadata.

Cross-Plugin Joins #

The Git Source Plugin becomes more powerful when combined with other CloudQuery integrations. Here are three scenarios we keep hearing from teams.

Example: Verify Code Ownership Against Okta #

Every CODEOWNERS file makes a promise: these people are responsible for this code. But when engineers leave, CODEOWNERS files don't update themselves. This query finds ownership gaps by joining Git file content with identity data from your IdP.
CODEOWNERS files use GitHub handles (@username). To verify ownership against Okta, you need to translate those handles to emails first using the GitHub Source Plugin.
WITH codeowners_parsed AS (
  SELECT
    gf.repository_url,
    regexp_matches(
      convert_from(gf.content, 'UTF8'),
      '(?:^|\s)@([a-zA-Z0-9_-]+(?:/[a-zA-Z0-9_-]+)?)',
      'gm'
    ) AS owner_handle_match
  FROM git_files gf
  WHERE gf.name = 'CODEOWNERS'
),
owners_flat AS (
  SELECT DISTINCT
    repository_url,
    owner_handle_match[1] AS github_handle
  FROM codeowners_parsed
  WHERE owner_handle_match[1] NOT LIKE '%/%'  -- Exclude team handles like org/team
),
owners_with_email AS (
  SELECT
    o.repository_url,
    o.github_handle,
    gm.email AS github_email
  FROM owners_flat o
  LEFT JOIN github_organization_members gm ON lower(o.github_handle) = lower(gm.login)
)
SELECT
  owe.repository_url,
  owe.github_handle,
  owe.github_email,
  COALESCE(u.status, 'NOT_FOUND') AS okta_status
FROM owners_with_email owe
LEFT JOIN okta_users u ON lower(owe.github_email) = lower(u.email)
WHERE owe.github_email IS NULL  -- GitHub user not found or has no public email
   OR u.status IS NULL          -- No matching Okta account
   OR u.status != 'ACTIVE'      -- Okta account inactive
ORDER BY owe.repository_url

Platform Engineering Workflows #

The Git Source Plugin fits into broader platform engineering workflows. Here's how teams are using it:
Compliance and Auditing
Schedule daily syncs and run compliance queries automatically. Export results to your GRC platform or generate reports for SOC 2 audits. The git_commits table helps track code changes by employment status for access reviews.
Incident Response
During security incidents, quickly identify which repos contain affected dependencies or configurations. Join with vulnerability databases to find repos running CVE-affected versions.
Developer Experience
Build internal dashboards showing dependency freshness, missing documentation, or configuration drift across repos. Connect to Grafana or Metabase for visualization.
Automation
Trigger alerts when new repos are created without required files (LICENSE, CODEOWNERS, security policies). Integrate with Slack or PagerDuty for real-time notifications.
# Example: Scheduled sync with GitHub Actions
name: Sync Git Data
on:
  schedule:
    - cron: '0 6 * * *' # Daily at 6am UTC
jobs:
  sync:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Sync git data
        run: cloudquery sync config.yml
        env:
          GIT_TOKEN: ${{ secrets.GIT_TOKEN }}

Get Started #

The Git Source Plugin is available now on CloudQuery Hub. See the tables documentation for the full schema reference. Download the CloudQuery CLI and start syncing your repository files in minutes.
Use Case Tutorials:
Related reading:

FAQ #

What file types can the Git Source Plugin sync?
Any file type. The plugin syncs raw file content as binary data, so you can query text files directly or process binary files as needed. Common use cases include configuration files (YAML, JSON, TOML), dependency manifests (package.json, pom.xml, go.mod), Dockerfiles, and documentation (README, CODEOWNERS).
Does the Git Source Plugin clone repositories locally?
No. The plugin fetches file content directly via the Git provider's API. There's no local clone, which means faster syncs and no disk space requirements for repository storage.
How does incremental syncing work?
The plugin tracks file SHA hashes. On subsequent syncs, it only fetches files that have changed since the last sync. This reduces API calls and sync time significantly for large organizations.
Can I use the Git Source Plugin with GitLab or Bitbucket?
The plugin currently supports GitHub. Support for additional Git providers is on the roadmap.
What's the difference between Git Source and GitHub Source plugins?
Git Source syncs file contents from repositories. GitHub Source syncs GitHub-specific metadata like pull requests, issues, branch protection rules, and organization settings. Use Git Source for querying files, GitHub Source for platform data, or both together for complete visibility.
How do I filter which files to sync?
Use glob patterns in the configuration. You can specify patterns like **/Dockerfile*, **/*.json, or .github/**/*.yml to sync only the files you need.
Turn cloud chaos into clarity

Find out how CloudQuery can help you get clarity from a chaotic cloud environment with a personalized conversation and demo.


© 2026 CloudQuery, Inc. All rights reserved.