Tutorials
Moving Data from PostgreSQL to MotherDuck
MotherDuck is a serverless analytics service powered by DuckDB. If you have data in a PostgreSQL database that you'd like to analyze using MotherDuck, this tutorial walks through how to set that up with CloudQuery. We cover two approaches:
- Copying a PostgreSQL database into MotherDuck as a one-off batch operation, and
- Continuously streaming PostgreSQL data to MotherDuck using PostgreSQL Change Data Capture (CDC).
Which option you choose depends on your use case. Either way, by the end of this tutorial you'll be querying your PostgreSQL data in MotherDuck.
How Do You Create a Test Table in PostgreSQL? (Optional) #
To demonstrate the steps in this tutorial, we'll create a
customers table in PostgreSQL and insert 10,000 generated rows:select
md5(random()::text) as id,
'person' || num || '@' ||
(case (random() * 2)::integer
when 0 then 'gmail'
when 1 then 'hotmail'
when 2 then 'yahoo'
end) || '.com' as email,
now() as created_at
into customers
from generate_series(1,10000) as num;
alter table customers add primary key (id);
And let's test it out:
select * from customers limit 3;
+----------------------------------+---------------------+------------------------------+
| id | email | created_at |
|----------------------------------+---------------------+------------------------------|
| 216016a4c29d66dd007cd1c9fa9994b4 | [email protected] | 2023-06-05 14:04:38.15548+01 |
| de75a87f5f797248fd47ffcb15b7fff1 | [email protected] | 2023-06-05 14:04:38.15548+01 |
| 279838446e9890b00e92221f25bb0c73 | [email protected] | 2023-06-05 14:04:38.15548+01 |
+----------------------------------+---------------------+------------------------------+
SELECT 3
How Do You Install CloudQuery? #
CloudQuery is a cross-platform CLI that can run locally, on a virtual machine, or in a containerized environment. In this tutorial we'll run it locally to copy data from a PostgreSQL database to MotherDuck. On macOS, install it using Homebrew:
brew install cloudquery/tap/cloudquery
See the CloudQuery Quickstart guide for installation instructions on other platforms. The DuckDB integration (used in this tutorial) does not support Windows at this time.
Once installed, verify it from the command line:
$ cloudquery --version
cloudquery version VERSION_CLI
How Do You Configure the Sync? #
The CloudQuery CLI is configured using YAML files. Create a new file called
postgres-to-motherduck.yml with the following content:kind: source
spec:
name: 'postgresql'
path: 'cloudquery/postgresql'
registry: 'cloudquery'
version: 'v6.13.4'
destinations: ['motherduck']
tables: ['customers']
spec:
connection_string: 'postgresql://postgres:pass@localhost:5432/cloudquery?sslmode=disable'
---
kind: destination
spec:
name: 'motherduck'
version: 'v6.3.4'
registry: 'cloudquery'
path: 'cloudquery/duckdb'
write_mode: 'overwrite-delete-stale'
migrate_mode: 'safe'
spec:
connection_string: 'md:'
A typical CloudQuery configuration file has two parts: a source and a destination. The PostgreSQL database is the source here, configured to copy the
customers table from a locally running instance. The destination is the DuckDB integration, and instead of writing to a local file we've pointed it at MotherDuck. You can also use a named database, like md:cloudquery.How Do You Authenticate with MotherDuck? #
To authenticate with MotherDuck, export your service token as an environment variable. You can find the token in the MotherDuck app by clicking your organization name in the top left, then navigating to Settings and creating a new token. See the MotherDuck documentation for the latest instructions on token management.
Copy the token and export it:
export motherduck_token=<INSERT YOUR TOKEN HERE>
You can also add this to your
.bashrc or .zshrc so it loads automatically. If you do, remember to run source ~/.bashrc or source ~/.zshrc before continuing.How Do You Run the Sync? #
The last step is to start the sync. CloudQuery will automatically create tables in MotherDuck and begin loading data from PostgreSQL.
Option 1: Run a One-Off Batch Sync #
In your terminal, run:
cloudquery sync postgres-to-motherduck.yml
This loads all rows from PostgreSQL into MotherDuck and exits once the data has been copied. You should see output similar to the following:
Option 2: Run a Continuous Sync with CDC #
To continuously stream changes from PostgreSQL to MotherDuck, add the
cdc_id option to the source configuration. This enables PostgreSQL logical replication, which requires wal_level = logical on your PostgreSQL server.kind: source
spec:
name: 'postgresql'
path: 'cloudquery/postgresql'
registry: 'cloudquery'
version: 'v6.13.4'
destinations: ['motherduck']
tables: ['customers']
spec:
cdc_id: "postgresql" # <- Unique identifier for this CDC source
connection_string: 'postgresql://postgres:pass@localhost:5432/cloudquery?sslmode=disable'
---
kind: destination
spec:
name: 'motherduck'
path: 'cloudquery/duckdb'
registry: 'cloudquery'
version: 'v6.3.4'
write_mode: 'overwrite-delete-stale'
migrate_mode: 'safe'
spec:
connection_string: 'md:cloudquery'
The
cdc_id value must be a unique string across all your CloudQuery sources. It identifies this particular CDC stream.Now start the sync:
cloudquery sync postgres-to-motherduck-cdc.yml
This time the sync runs continuously, and any changes made to the PostgreSQL database are automatically replicated to MotherDuck.
Taking It to Production #
So far we've been running syncs from a local machine, which is great for testing but not a long-term solution. For production workloads, you'll want to schedule the sync and run it in a managed environment. The CloudQuery Deployment Documentation covers options including Kubernetes, ECS, Cloud Run, GitHub Actions, and more.
For teams that want managed scheduling, monitoring, and collaboration without maintaining their own infrastructure, CloudQuery Platform handles all of this out of the box.
Try CloudQuery Platform
See how CloudQuery makes it straightforward to sync PostgreSQL data to MotherDuck and other destinations. Check out the DuckDB destination docs or browse all available integrations.
Frequently Asked Questions #
What Is the Difference Between a One-Off Sync and CDC? #
A one-off sync copies all data from PostgreSQL to MotherDuck and exits. CDC (Change Data Capture) uses PostgreSQL logical replication to continuously stream changes as they happen, keeping MotherDuck in near-real-time sync with your source database. Use one-off syncs for ad hoc analysis or backfills, and CDC for ongoing analytics on live data.
Do I Need to Change My PostgreSQL Configuration for CDC? #
Yes. CDC requires
wal_level = logical on your PostgreSQL server. If you're running PostgreSQL locally with Docker, pass -c "wal_level=logical" when starting the container. On AWS RDS, set rds.logical_replication = 1 in a custom parameter group. On GCP Cloud SQL, set cloudsql.logical_decoding = on.Does the DuckDB Integration Support Windows? #
Not currently. The DuckDB destination plugin does not support Windows due to dynamic linking requirements in the DuckDB Go library. It works on macOS and Linux.
What Write Modes Does the DuckDB Destination Support? #
The DuckDB destination supports three write modes:
overwrite-delete-stale (default, replaces existing rows and removes stale ones), overwrite (replaces existing rows without deleting stale ones), and append (adds new rows without modifying existing data).Can I Sync Multiple Tables at Once? #
Yes. The
tables field in the source configuration accepts a list. You can specify individual table names or use wildcards like ['public.*'] to sync all tables in a schema.How Do I Monitor a Running CDC Sync? #
The CloudQuery CLI outputs log messages as it processes changes. For production deployments, CloudQuery Platform provides a dashboard with sync status, error tracking, and alerting built in.


