"CloudQuery has freed up time, reduced friction and made teams autonomous". Discover how Reddit is using CloudQuery Register now ❯

Introducing the Git Source Plugin: Your Repository Metadata, Finally Queryable

We've all written the script. Clone every repo in the org, grep for Dockerfiles, parse the FROM lines, dump to a spreadsheet, realize you forgot to handle monorepos, start over. It's 2025, and we're still doing this.
The new Git Source Plugin syncs file content and metadata directly into your destination. No local clones, no custom scripts, no spreadsheet purgatory.

Why Querying Repository Files Is Still Painful #

Getting answers to straightforward questions about your repositories is surprisingly painful:
  • Which repos don't have a CODEOWNERS file?
  • Which Dockerfiles still reference node:18 (now EOL)?
  • What Jira team owns this service according to our about.yaml?
  • Which repos have React as a dependency?
Today, answering any of these means cloning hundreds (or thousands) of repos, writing parsing logic, and maintaining that infrastructure. Most teams just… don't bother.

SQL Queries vs Custom Scripts #

| Task | Manual Approach | Git Source Plugin |
| --- | --- | --- |
| Find repos missing LICENSE | Clone all repos, write bash script, grep | Single SQL query |
| Track dependency versions | Clone, parse package.json, maintain script | JOIN with version data |
| Audit Dockerfile base images | Clone, parse FROM lines, handle edge cases | SQL with string functions |
| Keep data current | Re-run script, manage cron jobs | Incremental sync |
| Cross-reference with other data | Export to CSV, manual joins | SQL JOINs across tables |

Configuration and Tables #

Configure the plugin with glob patterns for the files you care about:
tables:
  - git_files:
      glob_patterns:
        - '**/Dockerfile*'
        - '**/pom.xml'
        - '.buildkite/**/pipeline.yml'
        - '**/CODEOWNERS'
        - '**/package.json'
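To get a feel for which paths a pattern list like this selects, here is a minimal Python sketch. It assumes common globstar semantics (`**/` spans zero or more directories, `*` stays within a single path segment); the plugin's actual matching rules may differ. `glob_to_regex`, `select_files`, and the sample paths are hypothetical names invented for illustration.

```python
import re


def glob_to_regex(pattern: str) -> re.Pattern:
    """Translate a globstar-style pattern into an anchored regex.

    Assumed semantics (not confirmed for the plugin):
    '**/' matches zero or more directories, '*' matches within one
    path segment, '?' matches a single non-separator character.
    """
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            out.append(r"(?:.*/)?")   # zero or more leading directories
            i += 3
        elif pattern.startswith("**", i):
            out.append(r".*")         # anything, including separators
            i += 2
        elif pattern[i] == "*":
            out.append(r"[^/]*")      # anything within one segment
            i += 1
        elif pattern[i] == "?":
            out.append(r"[^/]")
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out) + r"\Z")


def select_files(paths, patterns):
    """Return the paths matched by at least one pattern, in input order."""
    regexes = [glob_to_regex(p) for p in patterns]
    return [p for p in paths if any(r.match(p) for r in regexes)]


# Patterns copied from the configuration above; paths are made up.
patterns = [
    "**/Dockerfile*",
    "**/pom.xml",
    ".buildkite/**/pipeline.yml",
    "**/CODEOWNERS",
    "**/package.json",
]
paths = [
    "Dockerfile",
    "services/api/Dockerfile.dev",
    "services/api/main.go",
    ".buildkite/pipeline.yml",
    ".buildkite/deploy/pipeline.yml",
    "web/package.json",
    "docs/CODEOWNERS",
]
selected = select_files(paths, patterns)
print(selected)
```

Under these assumed semantics every sample path except `services/api/main.go` is selected.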

Example Queries #

These examples use PostgreSQL as the destination, but the Git Source Plugin works with any CloudQuery destination.
Find repos missing a LICENSE file:
WITH repos_with_license AS (
  SELECT gf.repository_url FROM git_files gf WHERE gf.name = 'LICENSE'
)
SELECT gr.url, gr.owner, gr.full_name FROM git_repositories gr
LEFT OUTER JOIN repos_with_license rl ON rl.repository_url = gr.url
WHERE rl.repository_url IS NULL
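The query above is an anti-join: keep every repository that has no matching row in the set of repositories with a LICENSE. If you want to try the pattern before syncing anything, the sketch below runs the same query shape against an in-memory SQLite database from Python. The table and column names come from the query above; the rows are invented sample data, and SQLite stands in for PostgreSQL only because this particular query uses no PostgreSQL-specific functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE git_repositories (url TEXT, owner TEXT, full_name TEXT);
    CREATE TABLE git_files (repository_url TEXT, path TEXT, name TEXT);

    -- Invented sample rows, for illustration only.
    INSERT INTO git_repositories VALUES
        ('https://example.com/acme/api', 'acme', 'acme/api'),
        ('https://example.com/acme/web', 'acme', 'acme/web');
    INSERT INTO git_files VALUES
        ('https://example.com/acme/api', 'LICENSE', 'LICENSE'),
        ('https://example.com/acme/api', 'Dockerfile', 'Dockerfile'),
        ('https://example.com/acme/web', 'package.json', 'package.json');
""")

MISSING_LICENSE = """
    WITH repos_with_license AS (
        SELECT gf.repository_url FROM git_files gf WHERE gf.name = 'LICENSE'
    )
    SELECT gr.full_name
    FROM git_repositories gr
    LEFT OUTER JOIN repos_with_license rl ON rl.repository_url = gr.url
    WHERE rl.repository_url IS NULL
"""

# Only acme/web lacks a LICENSE row, so it is the only repository returned.
missing = [row[0] for row in conn.execute(MISSING_LICENSE)]
print(missing)
```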
Find all Dockerfiles running EOL Node 18:
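-- Split each Dockerfile into one row per line.
-- string_to_table requires PostgreSQL 14 or later.
WITH dockerfiles_parsed AS (
  SELECT repository_url, path, name,
         string_to_table(encode(content, 'escape'), E'\n') AS lines
  FROM git_files
  WHERE name = 'Dockerfile'
),
-- Keep only FROM instructions and strip the keyword to leave the image reference.
images AS (
  SELECT repository_url, path, replace(lines, 'FROM ', '') AS image
  FROM dockerfiles_parsed
  WHERE lines LIKE 'FROM %'
)
SELECT * FROM images WHERE image LIKE 'node:18%'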
List all repos with React as a dependency:
WITH package_data AS (
  SELECT
    repository_url,
    path,
    convert_from(content, 'UTF8')::jsonb AS package_json
  FROM git_files
  WHERE name = 'package.json' AND content IS NOT NULL
),
all_dependencies AS (
  SELECT
    repository_url,
    package_json->>'name' AS package_name,
    dep_key AS dependency_name,
    package_json->'dependencies'->>dep_key AS dependency_version
  FROM package_data,
  LATERAL jsonb_object_keys(package_json->'dependencies') AS dep_key
  WHERE package_json->'dependencies' IS NOT NULL
)
SELECT * FROM all_dependencies WHERE dependency_name = 'react'

Git Source vs GitHub Source #

The Git Source Plugin and GitHub Source Plugin serve different purposes:
| Use Case | Git Source | GitHub Source |
| --- | --- | --- |
| File contents (Dockerfiles, package.json, CODEOWNERS) | Yes | No |
| Repository metadata | Basic | Comprehensive |
| Pull requests, issues, reviews | No | Yes |
| Branch protection rules | No | Yes |
| Commit history | Yes | Yes |
| Works with non-GitHub repos | Yes | No |
Use Git Source when you need to query file contents across repos: configuration files, dependency manifests, or any other files matching your glob patterns.
Use GitHub Source when you need GitHub-specific data like pull requests merged without review, unprotected branches, issues, or organization settings.
Use both together for complete visibility. Join git_files with github_repositories to correlate file contents with GitHub metadata.
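As a rough illustration of that join, the Python sketch below builds toy versions of both tables in SQLite and lists Dockerfiles that live in non-archived repositories. The `git_files` columns come from the queries earlier in this post. The `github_repositories` columns (`html_url`, `full_name`, `archived`) and the choice of `html_url` as the join key are assumptions made for this example; check the GitHub Source table schema before relying on them. All rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE git_files (repository_url TEXT, path TEXT, name TEXT);
    -- Assumed columns; not a confirmed copy of the GitHub Source schema.
    CREATE TABLE github_repositories (html_url TEXT, full_name TEXT, archived INTEGER);

    -- Invented sample rows, for illustration only.
    INSERT INTO git_files VALUES
        ('https://example.com/acme/api', 'Dockerfile', 'Dockerfile'),
        ('https://example.com/acme/legacy', 'Dockerfile', 'Dockerfile'),
        ('https://example.com/acme/web', 'package.json', 'package.json');
    INSERT INTO github_repositories VALUES
        ('https://example.com/acme/api', 'acme/api', 0),
        ('https://example.com/acme/legacy', 'acme/legacy', 1),
        ('https://example.com/acme/web', 'acme/web', 0);
""")

ACTIVE_DOCKERFILES = """
    SELECT gh.full_name, gf.path
    FROM git_files gf
    JOIN github_repositories gh ON gh.html_url = gf.repository_url
    WHERE gf.name = 'Dockerfile'
      AND gh.archived = 0
    ORDER BY gh.full_name
"""

# acme/legacy has a Dockerfile but is archived, so only acme/api remains.
rows = conn.execute(ACTIVE_DOCKERFILES).fetchall()
print(rows)
```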

Cross-Plugin Joins #

The Git Source Plugin becomes more powerful when combined with other CloudQuery integrations. Here is one scenario we keep hearing about from teams.

Example: Verify Code Ownership Against Okta #

Every CODEOWNERS file makes a promise: these people are responsible for this code. But when engineers leave, CODEOWNERS files don't update themselves. This query finds ownership gaps by joining Git file content with Okta user status.
WITH codeowners_parsed AS (
  SELECT
    gf.repository_url,
    regexp_matches(
      convert_from(gf.content, 'UTF8'),
      '([a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,})',
      'g'
    ) AS owner_email_match
  FROM git_files gf
  WHERE gf.name = 'CODEOWNERS'
),
owners_flat AS (
  SELECT
    repository_url,
    owner_email_match[1] AS owner_email
  FROM codeowners_parsed
)
SELECT
  o.repository_url,
  o.owner_email,
  COALESCE(u.status, 'NOT_FOUND') AS okta_status
FROM owners_flat o
LEFT JOIN okta_users u ON lower(o.owner_email) = lower(u.email)
WHERE u.status IS NULL
   OR u.status != 'ACTIVE'
ORDER BY o.repository_url
This surfaces two types of ownership gaps: emails in CODEOWNERS with no Okta account (contractors? typos?) and deactivated users still listed as code owners (offboarding gaps). Neither is something you'd catch without joining these data sources.
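One caveat worth testing on your own files: CODEOWNERS entries are often `@username` or `@org/team` handles rather than emails, and the regular expression in the query captures only email addresses. The Python sketch below mirrors the query's logic on a made-up CODEOWNERS file so you can see what is and is not picked up. `ownership_gaps` and the sample data are hypothetical, and the status lookup is a plain dict (keyed by lowercased email) standing in for the `okta_users` table.

```python
import re

# Same email pattern as the SQL query above.
EMAIL_RE = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")


def ownership_gaps(codeowners_text, okta_status_by_email):
    """Return (email, status) pairs for owners that are not ACTIVE.

    Emails missing from the lookup are reported as NOT_FOUND, matching
    the COALESCE in the query. Lookup keys are expected in lowercase.
    """
    gaps = []
    for email in EMAIL_RE.findall(codeowners_text):
        status = okta_status_by_email.get(email.lower(), "NOT_FOUND")
        if status != "ACTIVE":
            gaps.append((email, status))
    return gaps


# Invented sample data, for illustration only.
codeowners = (
    "* alice@example.com @acme/platform\n"
    "/docs/ Bob@Example.com\n"
    "/infra/ carol@example.com @dave\n"
)
okta = {
    "alice@example.com": "ACTIVE",
    "bob@example.com": "DEPROVISIONED",
}

gaps = ownership_gaps(codeowners, okta)
print(gaps)
```

In this sample, `@acme/platform` and `@dave` are skipped entirely, so handle-based owners would need a separate mapping to be checked.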

Get Started #

The Git Source Plugin is available now on CloudQuery Hub. Download the CloudQuery CLI and start syncing your repository files in minutes.

FAQ #

What file types can the Git Source Plugin sync?
Any file type. The plugin syncs raw file content as binary data, so you can query text files directly or process binary files as needed. Common use cases include configuration files (YAML, JSON, TOML), dependency manifests (package.json, pom.xml, go.mod), Dockerfiles, and documentation (README, CODEOWNERS).
Does the Git Source Plugin clone repositories locally?
No. The plugin fetches file content directly via the Git provider's API. There's no local clone, which means faster syncs and no disk space requirements for repository storage.
How does incremental syncing work?
The plugin tracks file SHA hashes. On subsequent syncs, it only fetches files that have changed since the last sync. This reduces API calls and sync time significantly for large organizations.
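The idea can be sketched in a few lines of Python. This is not the plugin's implementation, only an illustration of hash-based change detection: it compares the path-to-SHA map recorded by the previous sync with the map from the current listing and decides which files need fetching. `SyncPlan`, `plan_incremental_sync`, and the sample hashes are hypothetical.

```python
from typing import Dict, List, NamedTuple


class SyncPlan(NamedTuple):
    to_fetch: List[str]    # new files, or files whose hash changed
    unchanged: List[str]   # same hash as last sync, no fetch needed
    removed: List[str]     # present last time, gone now


def plan_incremental_sync(previous: Dict[str, str], current: Dict[str, str]) -> SyncPlan:
    """Compare path -> SHA maps from the last sync and the current listing."""
    to_fetch = sorted(p for p, sha in current.items() if previous.get(p) != sha)
    unchanged = sorted(p for p, sha in current.items() if previous.get(p) == sha)
    removed = sorted(p for p in previous if p not in current)
    return SyncPlan(to_fetch, unchanged, removed)


# Invented sample state, for illustration only.
previous = {"Dockerfile": "a1", "package.json": "b2", "old.yml": "c3"}
current = {"Dockerfile": "a1", "package.json": "b9", "CODEOWNERS": "d4"}

plan = plan_incremental_sync(previous, current)
print(plan)
```

Here only `CODEOWNERS` (new) and `package.json` (changed hash) would be fetched; `Dockerfile` is skipped and `old.yml` is flagged as removed.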
Can I use the Git Source Plugin with GitLab or Bitbucket?
The plugin currently supports GitHub. Support for additional Git providers is on the roadmap.
What's the difference between Git Source and GitHub Source plugins?
Git Source syncs file contents from repositories. GitHub Source syncs GitHub-specific metadata like pull requests, issues, branch protection rules, and organization settings. Use Git Source for querying files, GitHub Source for platform data, or both together for complete visibility.
How do I filter which files to sync?
Use glob patterns in the configuration. You can specify patterns like **/Dockerfile*, **/*.json, or .github/**/*.yml to sync only the files you need.


© 2025 CloudQuery, Inc. All rights reserved.