Back to plugin list
Official
Premium
PostgreSQL
The CloudQuery PostgreSQL plugin syncs your PostgreSQL database to any of the supported CloudQuery destinations
Publisher
cloudquery
Latest version
v5.2.0
Type
Source
Platforms
Date Published
Mar 12, 2024
Price per 1M rows
$10
monthly free quota
10M rows
Set up process
Overview
The CloudQuery PostgreSQL plugin syncs your PostgreSQL database to any of the supported CloudQuery destinations (e.g. PostgreSQL, BigQuery, Snowflake, and more).
This plugin also supports CDC via PostgreSQL logical replication, which enables keeping your PostgreSQL up to date with any destination by subscribing to changes.
Authentication
In order to fetch information from PostgreSQL,
cloudquery
needs to be authenticated. Credentials can be inserted in the connection_string
field.Configuration
This example configures a PostgreSQL source, located at
localhost:5432
. The (top level) spec section is described in the Source Spec Reference.kind: source
spec:
name: "postgresql"
registry: "cloudquery"
path: "cloudquery/postgresql"
version: "v5.2.0"
tables: ["*"]
destinations: ["postgresql"]
spec:
connection_string: "${POSTGRESQL_CONNECTION_STRING}" # set the environment variable in a format like postgres://postgres:pass@localhost:5432/postgres?sslmode=disable
# Optional parameters:
# cdc_id: "postgresql" # Set to a unique string per source to enable Change Data Capture mode (logical replication, or CDC)
# pgx_log_level: error
# rows_per_record: 1
PostgreSQL Spec
This is the (nested) spec used by the PostgreSQL source Plugin.
connection_string
(string
) (required)Connection string to connect to the database. This can be a URL or a DSN, as perpgxpool
"postgres://jack:secret@localhost:5432/mydb?sslmode=prefer"
connect with tcp and prefer TLS"postgres://jack:secret@localhost:5432/mydb?sslmode=disable&application_name=pgxtest&search_path=myschema&connect_timeout=5"
be explicit with all options"postgres://localhost:5432/mydb?sslmode=disable"
connect with os username cloudquery is being run as"postgres:///mydb?host=/tmp"
connect over unix socket"dbname=mydb"
unix domain socket, just specifying the db name - useful if you want to use peer authentication"user=jack password=jack\\'ssooper\\\\secret host=localhost port=5432 dbname=mydb sslmode=disable"
DSN with escaped backslash and single quote
pgx_log_level
(string
) (optional) (default:error
)cdc_id
(string
) (optional)If set to a non-empty string the source plugin will start syncing CDC via PostgreSQL logical replication in real-time. The value should be unique across all sources.rows_per_record
(integer
) (optional) (default:1
)Amount of rows to be packed into a single Apache Arrow record to be sent over the wire during sync (or initial sync in the CDC mode). We suggest using significantly more than the default (e.g.5000
) to sync from large databases/tables.
CDC
PostgreSQL CDC (Logical Replication) Configuration
Overview
Change Data Capture (CDC) is a set of software design patterns used to determine and track the data that has changed so that action can be taken using the changed data.
In PostgreSQL CDC can be implemented using Logical Replication.
Logical Replication is a feature of PostgreSQL that allows you to stream changes from a database to another database, file or custom handler. It is also used to keep a copy of a database up to date with the original database in an efficient manner (Internally it is implemented by the so called Write-Ahead Log).
In this document we won't go into details of how Logical Replication works internally, but we will show you how to enable it in number of environments and how to configure CloudQuery PostgreSQL source plugin that can stream the changes to any of CloudQuery supported destinations.
The PostgreSQL source plugin streams changes directly to any CloudQuery destination without the need for additional infrastructure (e.g. Kafka, RabbitMQ, etc). This means the setup is much easier.
PSQL Test
To test that the current PostgreSQL instance supports Logical Replication we can run the following command in PSQL:
SHOW wal_level;
or the following SQL query:
SELECT setting FROM pg_settings WHERE name='wal_level'
The default is
replication
but for CDC to work we need to set it to logical
(This can only be done on the database startup).Docker Setup
If you are running it locally for testing purposes you can use the following command to docker to enable logical replication:
docker run -p 5432:5432 -e POSTGRES_PASSWORD=pass -d postgres:11 -c "wal_level=logical"
Self-hosted
Change the
wal_level
to logical
in the postgresql.conf
file and restart the database.RDS
You will need to create a custom RDS parameter group, associate it with your RDS instance, set
rds.logical_replication
to 1
and restart the database.See full resolution on AWS documentation
AWS Aurora serverless (V2)
AWS Aurora serverless V1 doesn't support Logical Replication, but V2 does.
Similar to RDS, you will need to create a custom parameter group, associate it with your Aurora serverless instance, set
rds.logical_replication
to 1
and restart the database.See full resolution on AWS documentation
GCP Cloud SQL (PostgreSQL)
You will need to set the
cloudsql.logical_deconging
configuration to on. See full documentation on GCP documentationAzure DB (PostgreSQL)
Please follow the Official Azure documentation on how to enable logical replication via the CLI or Console (UI).
Current Limitations
The following event types are not supported:
- DDL changes (e.g.
CREATE TABLE
,ALTER TABLE
, etc) TRUNCATE
statements (e.g.TRUNCATE TABLE table_name
)DELETE
statements (e.g.DELETE FROM table_name
)