Moving Data from Postgres to MotherDuck

Herman Schaaf

MotherDuck is a serverless analytics service powered by DuckDB. If you have data in a PostgreSQL database that you'd like to analyze using MotherDuck, this tutorial will show you how to do this using CloudQuery, a high-performance data movement platform. We will cover:
  1. How to copy a PostgreSQL database into MotherDuck as a one-off batch operation, and
  2. How to continuously stream data from PostgreSQL to MotherDuck using Change Data Capture (CDC).
Which option you choose is up to you. Either way, by the end of this tutorial you will be able to analyze data from your PostgreSQL database using MotherDuck.
Moving Data from Postgres to MotherDuck YouTube Video

Step 0. Create a Test Table in PostgreSQL (Optional)

To demonstrate the steps in this tutorial, we will create a basic customers table in PostgreSQL and insert 10,000 generated rows:
select
  md5(random()::text) as id,
  'person' || num || '@' ||
  (case (random() * 2)::integer
    when 0 then 'gmail'
    when 1 then 'hotmail'
    when 2 then 'yahoo'
  end) || '.com' as email,
  now() as created_at
into customers
from generate_series(1,10000) as num;
alter table customers add primary key (id);
And let's test it out:
select * from customers limit 3;
+----------------------------------+---------------------+------------------------------+
| id                               | email               | created_at                   |
|----------------------------------+---------------------+------------------------------|
| 216016a4c29d66dd007cd1c9fa9994b4 | [email protected] | 2023-06-05 14:04:38.15548+01 |
| de75a87f5f797248fd47ffcb15b7fff1 | [email protected]   | 2023-06-05 14:04:38.15548+01 |
| 279838446e9890b00e92221f25bb0c73 | [email protected]   | 2023-06-05 14:04:38.15548+01 |
+----------------------------------+---------------------+------------------------------+
SELECT 3

Step 1. Install CloudQuery

CloudQuery is a cross-platform command-line tool for extracting and loading data. It can be run just about anywhere: locally, on a virtual machine, or in a containerized environment. In this tutorial we'll run it locally and use it to copy data from a local PostgreSQL database to MotherDuck. On macOS, you can install it using Homebrew:
brew install cloudquery/tap/cloudquery
See the CloudQuery Quickstart guide for installation instructions on other platforms. Please note that the DuckDB integration, which we will use later, does not support Windows at this time.
Once CloudQuery is installed, you should be able to invoke it from the command line:
$ cloudquery --version
cloudquery version VERSION_CLI

Step 2. Create the Configuration File

The CloudQuery CLI is configured using YAML files. Let's create a new file called postgres-to-motherduck.yml and add the following content:
kind: source
spec:
  name: 'postgresql'
  path: 'cloudquery/postgresql'
  registry: 'cloudquery'
  version: 'v6.13.3'
  destinations: ['motherduck']
  tables: ['customers']
  spec:
    connection_string: 'postgresql://postgres:pass@localhost:5432/cloudquery?sslmode=disable'
---
kind: destination
spec:
  name: 'motherduck'
  version: 'v6.3.3'
  registry: 'cloudquery'
  path: 'cloudquery/duckdb'
  write_mode: 'overwrite-delete-stale'
  migrate_mode: 'safe'
  spec:
    connection_string: 'md:'
A typical CloudQuery configuration consists of two parts: a source and a destination. The source in this case is a PostgreSQL database, and we've configured it to copy the customers table from a locally running PostgreSQL instance. The destination is the DuckDB integration. Instead of pointing it at a local file, we've used the md: connection string so that it writes to the main MotherDuck database. To write to a specific MotherDuck database instead, name it in the connection string, for example md:cloudquery.

Step 3. Set Up Authentication

To authenticate with MotherDuck, we need to export our service token as an environment variable. As described in the MotherDuck documentation, you can find the token by navigating to https://app.motherduck.com/ and clicking the cog in the top right-hand corner.
Copy the token to your clipboard and then export it as an environment variable called motherduck_token:
export motherduck_token='<INSERT YOUR TOKEN HERE>'
You may also choose to place the token in your .bashrc or .zshrc file so that it is automatically loaded when you open a new terminal window. If you do this, remember to run source ~/.bashrc or source ~/.zshrc to reload your shell before continuing.

Step 4. Run a Sync

The last step is to start the sync. This will automatically create new tables in MotherDuck and start loading data from PostgreSQL to MotherDuck.

Option 1: Run a One-Off Sync

In your terminal, run:
cloudquery sync postgres-to-motherduck.yml
This will load the rows from PostgreSQL into MotherDuck, and exit once all the data has been copied. You should see output similar to the following:
Video walkthrough of running a one-off CloudQuery sync from PostgreSQL to MotherDuck
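Once the sync finishes, a quick sanity check is to query the new table from the MotherDuck UI or any DuckDB client connected to MotherDuck. The queries below are a minimal sketch. They assume the destination table keeps the source name customers, that it lives in your default MotherDuck database, and that it was populated from the Step 0 test table. The domain breakdown is only an illustrative analysis and not part of the original setup.

```sql
-- Run in MotherDuck. The count should match the number of rows
-- in the PostgreSQL customers table.
select count(*) as row_count
from customers;

-- Illustrative analysis: number of customers per email domain.
select
  split_part(email, '@', 2) as domain,
  count(*) as customers
from customers
group by domain
order by customers desc;
```

If the count differs from what `select count(*) from customers;` returns in PostgreSQL, the CloudQuery sync output is the first place to look for errors.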

Option 2: Run a Continuous Sync with Change Data Capture (CDC)

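To continuously load changes from PostgreSQL to MotherDuck in a streaming fashion, we can add the cdc: true option to the source configuration. Save the following as a new file called postgres-to-motherduck-cdc.yml. Note that this example writes to the cloudquery database in MotherDuck (md:cloudquery):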
kind: source
spec:
  name: 'postgresql'
  path: 'cloudquery/postgresql'
  registry: 'cloudquery'
  version: 'v6.13.3'
  destinations: ['motherduck']
  tables: ['customers']
  spec:
    cdc: true # <- This enables Change Data Capture mode
    connection_string: 'postgresql://postgres:pass@localhost:5432/cloudquery?sslmode=disable'
---
kind: destination
spec:
  name: 'motherduck'
  path: 'cloudquery/duckdb'
  registry: 'cloudquery'
  version: 'v6.3.3'
  write_mode: 'overwrite-delete-stale'
  migrate_mode: 'safe'
  spec:
    connection_string: 'md:cloudquery'
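Before starting the sync, it may be worth confirming that your PostgreSQL server allows logical replication, which CDC tools generally build on. The following is a sketch under the assumption that CloudQuery's CDC mode needs wal_level set to logical and a database role with replication privileges. Check the documentation for the PostgreSQL source integration for the exact requirements.

```sql
-- Run in PostgreSQL. Shows the current write-ahead log level.
show wal_level;

-- If the result is not 'logical', raise it (requires superuser privileges).
-- The new value only takes effect after the PostgreSQL server is restarted.
alter system set wal_level = logical;
```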
Now, back in the terminal, let's start a sync again:
cloudquery sync postgres-to-motherduck-cdc.yml
This time the sync will run continuously, and any changes made to the PostgreSQL database will be automatically copied to MotherDuck.
Video walkthrough of setting up continuous sync with Change Data Capture from PostgreSQL to MotherDuck
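One way to confirm that changes are flowing is to modify a row in PostgreSQL while the sync is running and then look for it in MotherDuck. This is a hedged sketch: the test id and email values are made up for illustration, and the cloudquery prefix assumes the md:cloudquery connection string from the configuration above.

```sql
-- 1. Run in PostgreSQL while the CDC sync is running.
insert into customers (id, email, created_at)
values (md5('cdc-test'), 'cdc-test@example.com', now());

update customers
set email = 'cdc-test-updated@example.com'
where id = md5('cdc-test');

-- 2. Run in MotherDuck a few moments later.
select id, email, created_at
from cloudquery.customers
where email like 'cdc-test%';
```

If CDC is working, the MotherDuck query should eventually return the test row with the updated email address. You can delete the row in PostgreSQL afterwards to clean up.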

Next Steps

In this post we covered how to run both one-off and continuous syncs from PostgreSQL to MotherDuck from your local machine using CloudQuery. Since your local machine won't always be available, the next step is to schedule the sync and get it production-ready. For more information, head over to the CloudQuery Deployment Documentation.
Want to see CloudQuery in action? Schedule a demo with our team or check out the platform documentation to learn more.
Want help getting started? Join the CloudQuery community to connect with other users and experts, or message our team directly here if you have any questions.