Introducing the Git Source Plugin: Your Repository Metadata, Finally Queryable
We've all written the script. Clone every repo in the org, grep for Dockerfiles, parse the FROM lines, dump to a spreadsheet, realize you forgot to handle monorepos, start over. It's 2025, and we're still doing this.
The new Git Source Plugin syncs file content and metadata directly into your destination. No local clones, no custom scripts, no spreadsheet purgatory.
Why Querying Repository Files Is Still Painful
Getting answers to straightforward questions about your repositories is surprisingly painful:
- Which repos don't have a CODEOWNERS file?
- Which Dockerfiles still reference `node:18` (now EOL)?
- What Jira team owns this service according to our `about.yaml`?
- Which repos have React as a dependency?
Today, answering any of these means cloning hundreds (or thousands) of repos, writing parsing logic, and maintaining that infrastructure. Most teams just… don't bother.
SQL Queries vs Custom Scripts
Configuration and Tables
Configure the plugin with glob patterns for the files you care about:
```yaml
tables:
  - git_files:
      glob_patterns:
        - '**/Dockerfile*'
        - '**/pom.xml'
        - '.buildkite/**/pipeline.yml'
        - '**/CODEOWNERS'
        - '**/package.json'
```
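The exact matching rules the plugin applies aren't spelled out here, so this Python sketch assumes common doublestar semantics: `**/` matches zero or more directories, and `*` stays within a single path segment. `glob_to_regex` and `matches_any` are hypothetical helpers for previewing which repo-relative paths a pattern list would select; they are not part of the plugin.

```python
from __future__ import annotations

import re


def glob_to_regex(pattern: str) -> re.Pattern[str]:
    """Translate a doublestar-style glob into a regex (hypothetical helper).

    Assumed semantics: '**/' matches zero or more directories, a bare '**'
    matches anything (including '/'), '*' stays within one path segment,
    and '?' matches a single non-'/' character.
    """
    out, i = [], 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            out.append("(?:.*/)?")  # zero or more leading directories
            i += 3
        elif pattern.startswith("**", i):
            out.append(".*")
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")  # never crosses a '/'
            i += 1
        elif pattern[i] == "?":
            out.append("[^/]")
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out))


def matches_any(path: str, patterns: list[str]) -> bool:
    """True if the repo-relative path fully matches at least one glob."""
    return any(glob_to_regex(p).fullmatch(path) for p in patterns)


GLOB_PATTERNS = [
    "**/Dockerfile*",
    "**/pom.xml",
    ".buildkite/**/pipeline.yml",
    "**/CODEOWNERS",
    "**/package.json",
]

if __name__ == "__main__":
    for path in ["Dockerfile", "services/api/Dockerfile.dev",
                 ".buildkite/deploy/pipeline.yml", "README.md"]:
        print(path, matches_any(path, GLOB_PATTERNS))
```

Under these assumed rules, `**/package.json` also matches vendored files such as `node_modules/react/package.json`, which is worth keeping in mind when writing broad patterns.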
Example Queries
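These examples use PostgreSQL as the destination, but the Git Source Plugin works with any CloudQuery destination. The queries rely on PostgreSQL functions such as convert_from, so other destinations need their own equivalents.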
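Find repos missing a LICENSE file. This only works if LICENSE files are synced, so add a pattern for them to glob_patterns (the configuration above does not include one):
```sql
-- Repos with at least one LICENSE file (LICENSE, LICENSE.md, LICENSE.txt, ...)
WITH repos_with_license AS (
  SELECT DISTINCT gf.repository_url
  FROM git_files gf
  WHERE gf.name ILIKE 'LICENSE%'
)
-- Anti-join: keep repositories with no matching LICENSE row
SELECT gr.url, gr.owner, gr.full_name
FROM git_repositories gr
LEFT OUTER JOIN repos_with_license rl ON rl.repository_url = gr.url
WHERE rl.repository_url IS NULL;
```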
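The `LEFT OUTER JOIN ... WHERE rl.repository_url IS NULL` shape is an anti-join: keep every repository that has no matching row. A minimal Python equivalent over in-memory rows, where `repos_missing_file` and the dict-shaped rows are hypothetical stand-ins for the `git_repositories` and `git_files` tables:

```python
from __future__ import annotations

from typing import Callable, Iterable


def repos_missing_file(
    repo_urls: Iterable[str],
    files: Iterable[dict],
    is_match: Callable[[str], bool],
) -> list[str]:
    """Anti-join: repo URLs with no synced file whose name satisfies is_match."""
    # Same role as the repos_with_license CTE: repos that DO have a match
    present = {f["repository_url"] for f in files if is_match(f["name"])}
    # Same role as LEFT JOIN ... WHERE rl.repository_url IS NULL
    return [url for url in repo_urls if url not in present]


if __name__ == "__main__":
    repos = ["https://github.com/acme/api", "https://github.com/acme/web"]
    files = [{"repository_url": "https://github.com/acme/api", "name": "LICENSE.md"}]
    print(repos_missing_file(repos, files, lambda n: n.upper().startswith("LICENSE")))
```

Building a set of repos that do have the file first keeps the check linear, which is the same reason the SQL collects those repos in a CTE before joining.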
Find all Dockerfiles running EOL Node 18:
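```sql
WITH dockerfile_lines AS (
  -- One row per line. convert_from decodes the bytea content as UTF-8;
  -- string_to_table requires PostgreSQL 14 or later.
  SELECT repository_url, path,
    string_to_table(convert_from(content, 'UTF8'), E'\n') AS line
  FROM git_files
  -- Also match Dockerfile.dev, Dockerfile.prod, etc. (the '**/Dockerfile*' glob)
  WHERE name LIKE 'Dockerfile%'
),
images AS (
  -- Capture the image after FROM, skipping flags such as --platform=...;
  -- the 'i' flag matches "from" in any case.
  SELECT repository_url, path,
    (regexp_match(line, '^\s*FROM\s+(?:--\S+\s+)*(\S+)', 'i'))[1] AS image
  FROM dockerfile_lines
)
SELECT * FROM images WHERE image LIKE 'node:18%';
```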
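The same FROM-line parsing can also run outside the database, for example in a CI check. This Python sketch (with hypothetical helpers `base_images` and `uses_node_18`) captures the image after each FROM instruction, skipping flags like `--platform=...` and ignoring a trailing `AS <stage>`. It does not expand build args such as `FROM node:${NODE_VERSION}`, follow line continuations, or normalize registry prefixes like `docker.io/library/node:18`.

```python
from __future__ import annotations

import re

# FROM [--flag=value ...] <image> [AS <stage>]; Dockerfile instructions are case-insensitive
FROM_RE = re.compile(r"^\s*FROM\s+(?:--\S+\s+)*(\S+)", re.IGNORECASE)


def base_images(dockerfile: str) -> list[str]:
    """Return the image reference from every FROM line, in file order."""
    images = []
    for line in dockerfile.splitlines():  # also handles CRLF line endings
        match = FROM_RE.match(line)
        if match:
            images.append(match.group(1))
    return images


def uses_node_18(dockerfile: str) -> bool:
    """True if any build stage starts from a node:18* image."""
    return any(image.startswith("node:18") for image in base_images(dockerfile))


if __name__ == "__main__":
    sample = "FROM --platform=linux/amd64 node:18-alpine AS build\nRUN npm ci\nfrom nginx:1.27\n"
    print(base_images(sample))
    print(uses_node_18(sample))
```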
List all repos with React as a dependency:
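```sql
WITH package_data AS (
  -- Decode each package.json (stored as bytea) and parse it as JSON.
  -- A file that is not valid JSON makes the ::jsonb cast fail the whole query.
  SELECT
    repository_url,
    path,
    convert_from(content, 'UTF8')::jsonb AS package_json
  FROM git_files
  WHERE name = 'package.json' AND content IS NOT NULL
),
all_dependencies AS (
  -- One row per entry in "dependencies" (devDependencies are not included)
  SELECT
    repository_url,
    path,
    package_json->>'name' AS package_name,
    dep.key AS dependency_name,
    dep.value AS dependency_version
  FROM package_data,
    LATERAL jsonb_each_text(package_json->'dependencies') AS dep
  -- Skip files where "dependencies" is missing, null, or not an object
  WHERE jsonb_typeof(package_json->'dependencies') = 'object'
)
SELECT * FROM all_dependencies WHERE dependency_name = 'react';
```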
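A Python sketch of the same check using only the standard library. `dependency_version` is a hypothetical helper; like the query, it looks only at `dependencies`. Unlike a bare `::jsonb` cast, which fails the whole query on one malformed package.json, it returns None for invalid JSON or a non-object `dependencies` value.

```python
from __future__ import annotations

import json


def dependency_version(package_json_text: str, name: str) -> str | None:
    """Version spec for `name` under "dependencies", or None if absent or unparseable."""
    try:
        data = json.loads(package_json_text)
    except json.JSONDecodeError:
        return None  # malformed file: skip it instead of failing the whole scan
    deps = data.get("dependencies") if isinstance(data, dict) else None
    if not isinstance(deps, dict):
        return None  # missing, null, or not an object
    version = deps.get(name)
    return version if isinstance(version, str) else None


if __name__ == "__main__":
    rows = [
        ("https://github.com/acme/web", '{"name": "web", "dependencies": {"react": "^18.2.0"}}'),
        ("https://github.com/acme/api", '{"name": "api", "dependencies": {"express": "^4.19.2"}}'),
    ]
    for repo_url, text in rows:
        version = dependency_version(text, "react")
        if version is not None:
            print(repo_url, version)
```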
Git Source vs GitHub Source
The Git Source Plugin and GitHub Source Plugin serve different purposes:
Use Git Source when you need to query file contents across repos: configuration files, dependency manifests, or any files matching glob patterns.
Use GitHub Source when you need GitHub-specific data like pull requests merged without review, unprotected branches, issues, or organization settings.
Use both together for complete visibility: join `git_files` with `github_repositories` to correlate file contents with GitHub metadata.
Cross-Plugin Joins
The Git Source Plugin becomes more powerful when combined with other CloudQuery integrations. Here are three scenarios we keep hearing from teams.
Example: Verify Code Ownership Against Okta
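Every CODEOWNERS file makes a promise: these people are responsible for this code. But when engineers leave, CODEOWNERS files don't update themselves. This query finds ownership gaps by joining Git file content with Okta user status. CODEOWNERS entries can name GitHub users (@user), teams (@org/team), or email addresses; the query checks the email entries, since those can be matched directly against Okta.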
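```sql
WITH codeowners_parsed AS (
  -- regexp_matches with the 'g' flag returns one row per email address found
  SELECT
    gf.repository_url,
    regexp_matches(
      convert_from(gf.content, 'UTF8'),
      '([a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,})',
      'g'
    ) AS owner_email_match
  FROM git_files gf
  WHERE gf.name = 'CODEOWNERS'
),
owners_flat AS (
  -- Each match is a text[]; element 1 is the captured email.
  -- DISTINCT collapses owners listed on several lines of the same file.
  SELECT DISTINCT
    repository_url,
    owner_email_match[1] AS owner_email
  FROM codeowners_parsed
)
SELECT
  o.repository_url,
  o.owner_email,
  COALESCE(u.status, 'NOT_FOUND') AS okta_status
FROM owners_flat o
LEFT JOIN okta_users u ON lower(o.owner_email) = lower(u.email)
-- No Okta match at all, or a matched user who is not active
WHERE u.status IS NULL
   OR u.status != 'ACTIVE'
ORDER BY o.repository_url;
```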
This surfaces two types of ownership gaps: emails in CODEOWNERS with no Okta account (contractors? typos?) and deactivated users still listed as code owners (offboarding gaps). Neither is something you'd catch without joining these data sources.
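The SQL above only sees owners written as email addresses. This Python sketch parses every owner on each CODEOWNERS rule line, then applies the same NOT_FOUND / not-ACTIVE classification to the email entries. `codeowners_owners`, `ownership_gaps`, and the `{email: status}` dictionary standing in for `okta_users` are all hypothetical; checking `@user` and `@org/team` owners would additionally need a mapping from GitHub identities to Okta accounts.

```python
from __future__ import annotations

import re

# Same email shape as the SQL regex; applied to whole tokens via fullmatch
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def codeowners_owners(text: str) -> list[str]:
    """Owners in a CODEOWNERS file, in first-seen order, de-duplicated.

    Each rule line is a path pattern followed by owners (@user, @org/team,
    or an email). Lines starting with '#' are comments.
    """
    owners: list[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        for owner in line.split()[1:]:  # skip the path pattern
            if owner not in owners:
                owners.append(owner)
    return owners


def ownership_gaps(owners: list[str], okta_status: dict[str, str]) -> dict[str, str]:
    """Email owners whose Okta status is not ACTIVE (NOT_FOUND if absent)."""
    by_email = {email.lower(): status for email, status in okta_status.items()}
    gaps: dict[str, str] = {}
    for owner in owners:
        if not EMAIL_RE.fullmatch(owner):
            continue  # @users and @org/teams need a GitHub-to-Okta mapping
        status = by_email.get(owner.lower(), "NOT_FOUND")
        if status != "ACTIVE":
            gaps[owner] = status
    return gaps


if __name__ == "__main__":
    text = "# Platform team\n* @acme/platform alice@example.com\n/docs/ bob@example.com @carol dave@example.com\n"
    okta = {"Alice@Example.com": "ACTIVE", "bob@example.com": "DEPROVISIONED"}
    print(ownership_gaps(codeowners_owners(text), okta))
```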
Get Started
The Git Source Plugin is available now on CloudQuery Hub. Download the CloudQuery CLI and start syncing your repository files in minutes.
FAQ
What file types can the Git Source Plugin sync?
Any file type. The plugin syncs raw file content as binary data, so you can query text files directly or process binary files as needed. Common use cases include configuration files (YAML, JSON, TOML), dependency manifests (package.json, pom.xml, go.mod), Dockerfiles, and documentation (README, CODEOWNERS).
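Because content arrives as raw bytes, something has to decide what to treat as text. A minimal Python sketch, assuming text files are UTF-8: it borrows Git's own heuristic (a NUL byte in the first 8000 bytes means binary) and then attempts a strict UTF-8 decode. `decode_text` is a hypothetical helper, not part of the plugin.

```python
from __future__ import annotations

# Git's binary heuristic looks for a NUL byte within this many leading bytes
FIRST_FEW_BYTES = 8000


def decode_text(content: bytes) -> str | None:
    """Return the content as text, or None if it looks binary or is not UTF-8."""
    if b"\0" in content[:FIRST_FEW_BYTES]:
        return None  # treat as binary, as Git does
    try:
        return content.decode("utf-8")
    except UnicodeDecodeError:
        return None  # not UTF-8; handle other encodings separately if needed


if __name__ == "__main__":
    print(decode_text(b"FROM node:18\n"))
    print(decode_text(b"\x89PNG\r\n\x1a\n\x00\x00"))
```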
Does the Git Source Plugin clone repositories locally?
No. The plugin fetches file content directly via the Git provider's API. There's no local clone, which means faster syncs and no disk space requirements for repository storage.
How does incremental syncing work?
The plugin tracks file SHA hashes. On subsequent syncs, it only fetches files that have changed since the last sync. This reduces API calls and sync time significantly for large organizations.
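How the plugin stores its sync state isn't described here, so this is only a sketch of the general idea, assuming the comparison uses Git blob SHAs. Git names every blob by the SHA-1 of a `blob <size>\0` header followed by the file bytes, and a repository tree listing reports those SHAs without downloading any content. Comparing the current listing with the SHAs saved from the previous sync tells you which paths to fetch. `git_blob_sha` and `plan_incremental_sync` are hypothetical helpers.

```python
from __future__ import annotations

import hashlib


def git_blob_sha(data: bytes) -> str:
    """SHA-1 that Git assigns to a blob: sha1(b"blob <size>\\0" + data)."""
    header = b"blob %d\0" % len(data)
    return hashlib.sha1(header + data).hexdigest()


def plan_incremental_sync(
    remote: dict[str, str], previous: dict[str, str]
) -> tuple[list[str], list[str]]:
    """Compare {path: blob_sha} maps; return (paths_to_fetch, paths_deleted)."""
    # New paths (absent from previous) and changed paths both need fetching
    to_fetch = sorted(p for p, sha in remote.items() if previous.get(p) != sha)
    # Paths that disappeared since the last sync
    deleted = sorted(p for p in previous if p not in remote)
    return to_fetch, deleted


if __name__ == "__main__":
    previous = {"Dockerfile": git_blob_sha(b"FROM node:18\n"),
                "package.json": git_blob_sha(b"{}")}
    remote = {"Dockerfile": git_blob_sha(b"FROM node:22\n"),
              "package.json": git_blob_sha(b"{}"),
              "CODEOWNERS": git_blob_sha(b"* @acme/platform\n")}
    to_fetch, deleted = plan_incremental_sync(remote, previous)
    print(to_fetch, deleted)  # ['CODEOWNERS', 'Dockerfile'] []
```

Unchanged files such as `package.json` above cost nothing beyond the tree listing, which is where the savings for large organizations come from.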
Can I use the Git Source Plugin with GitLab or Bitbucket?
The plugin currently supports GitHub. Support for additional Git providers is on the roadmap.
What's the difference between Git Source and GitHub Source plugins?
Git Source syncs file contents from repositories. GitHub Source syncs GitHub-specific metadata like pull requests, issues, branch protection rules, and organization settings. Use Git Source for querying files, GitHub Source for platform data, or both together for complete visibility.
How do I filter which files to sync?
Use glob patterns in the configuration. You can specify patterns like `**/Dockerfile*`, `**/*.json`, or `.github/**/*.yml` to sync only the files you need.