What's new in CloudQuery Plugin Protocol v3
Summary of changes and features in Plugin Protocol v3
Yevgeny Pats • Jul 10, 2023
We are thrilled to announce the release of CloudQuery Plugin Protocol v3, the new version of our gRPC plugin protocol, which brings exciting enhancements to writing CloudQuery plugins!
The gRPC protocol is what keeps CloudQuery plugins decoupled from one another, which is crucial for a system with an unlimited number of integrations.
This blog covers the gRPC changes. For language-specific changes, see Go SDK V4.
Let's begin with a high-level summary before diving deeper into each of the updates:
- Apache Arrow - protobuf v3 now fully supports Apache Arrow, which was introduced in protobuf v2 (sources) and protobuf v1 (destinations).
- Unified Protocol for Sources and Destinations - We have streamlined the gRPC protocol to utilize a single protocol. This allows CloudQuery plugins to function as both a source and a destination simultaneously, simplifying gRPC versioning and updates.
- Transition Sync/Write to Streaming API - We have transitioned the `Write` operations to a streaming-based API, enabling new use cases like Change Data Capture (CDC).
We discussed the Apache Arrow update extensively in our previous blog post. Now that all our destinations have migrated to SDK V4, they support over 30 different Apache Arrow data types!
Every protocol field that carries a CloudQuery table is encoded as an Apache Arrow schema, and every field that carries data is an Apache Arrow record (the fields are commented in the code).
Updates to plugin-pb are vital for introducing new features and enhancing the developer experience for plugin authors and end users.
Having a single protocol version ensures better manageability of upgrades and maintains backward compatibility, providing users and developers with ample time for transitioning. It also enables plugin authors to create plugins that function as both sources and destinations.
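To illustrate what "both a source and a destination" can look like, here is a minimal sketch in Python. The `ListPlugin` class, its `sync` and `write` method shapes, and the `(kind, payload)` tuples standing in for protocol messages are all assumptions made for illustration; they are not the actual plugin-pb or SDK API.

```python
from typing import Iterable, Iterator, Tuple

# Stand-in for a protocol message: (kind, payload). Hypothetical shape.
Message = Tuple[str, dict]


class ListPlugin:
    """Hypothetical plugin that can act as a source, a destination, or both."""

    def __init__(self, rows=None):
        self.rows = list(rows or [])

    def sync(self) -> Iterator[Message]:
        # Source role: stream every stored row out as an insert message.
        for row in self.rows:
            yield ("insert", row)

    def write(self, messages: Iterable[Message]) -> None:
        # Destination role: consume a message stream and store inserted rows.
        for kind, payload in messages:
            if kind == "insert":
                self.rows.append(payload)


# The same class plays both roles: one instance syncs, another writes.
upstream = ListPlugin([{"id": 1}, {"id": 2}])
downstream = ListPlugin()
downstream.write(upstream.sync())
```

Because both roles speak the same message stream, the output of one plugin's `sync` can be fed straight into another plugin's `write`.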
Moreover, any of the CloudQuery destinations can now be utilized as backends for incremental syncs. For example, you can now specify the following for sources with incremental syncs:
```yaml
kind: source
spec:
  name: aws
  path: cloudquery/aws
  registry: cloudquery
  version: 'VERSION_SOURCE_AWS'
  destinations: ['postgresql']
  backend_options:
    table_name: 'cq_state_aws'
    connection: '@@plugins.postgresql.connection'
  spec: ...
---
kind: destination
spec:
  name: postgresql
  path: cloudquery/postgresql
  registry: cloudquery
  version: 'VERSION_DESTINATION_POSTGRESQL'
  spec:
    connection_string: ....
```
`@@plugins.X` can reference any of the destinations specified in your config.
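As a rough illustration of that rule, the sketch below checks that every `@@plugins.X` reference in a source's `backend_options` names a destination declared in the same config. The function name `find_unknown_plugin_refs`, the regular expression, and the dict-based config shape are assumptions for illustration only; this is not how the CloudQuery CLI is implemented.

```python
import re

# Assumed reference syntax: "@@plugins.<name>.<attribute>", as in the config above.
PLUGIN_REF = re.compile(r"@@plugins\.([A-Za-z0-9_-]+)\.([A-Za-z0-9_]+)")


def find_unknown_plugin_refs(source_spec, destination_names):
    """Return referenced plugin names that are not declared destinations."""
    unknown = []
    for value in source_spec.get("backend_options", {}).values():
        if not isinstance(value, str):
            continue
        for name, _attribute in PLUGIN_REF.findall(value):
            if name not in destination_names and name not in unknown:
                unknown.append(name)
    return unknown


# The source spec from the config above, expressed as a plain dict.
source_spec = {
    "name": "aws",
    "destinations": ["postgresql"],
    "backend_options": {
        "table_name": "cq_state_aws",
        "connection": "@@plugins.postgresql.connection",
    },
}
```

Calling `find_unknown_plugin_refs(source_spec, {"postgresql"})` yields an empty list, while passing a set of destination names that lacks `postgresql` reports it as unknown.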
Lastly, this also opens the door for more easily creating a CloudQuery SDK for new languages, so stay tuned!
One noteworthy use case on top of CloudQuery is Change Data Capture (CDC), which involves operations like creating or deleting tables during a sync.
With the new streaming support, the `Write` method now supports three messages:
- `MessageMigrateTable` - Migrate a table to a specific target table, either safely or by force (dropping and re-creating it).
- `MessageInsert` - Insert an Arrow record.
- `MessageDeleteStale` - A specific message that might later be generalized into a broader delete message. It tells a plugin to delete stale data from a table.
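The following Python sketch shows one way a destination could dispatch on these three messages while consuming a stream. The dataclass fields, the `InMemoryDestination` class, and the `synced_at` column are hypothetical. In particular, treating "stale" as "synced before a cutoff time" and treating a safe migration as "add missing columns, keep existing rows" are assumptions made for this example, and plain dicts stand in for Arrow records.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class MessageMigrateTable:
    table: str
    columns: List[str]
    force: bool = False  # True: drop and re-create the table


@dataclass
class MessageInsert:
    table: str
    row: dict  # stand-in for an Arrow record


@dataclass
class MessageDeleteStale:
    table: str
    cutoff: datetime  # assumption: rows synced before this time are stale


class InMemoryDestination:
    def __init__(self):
        self.columns = {}
        self.rows = {}

    def write(self, messages):
        # Consume the stream one message at a time and dispatch on its type.
        for msg in messages:
            if isinstance(msg, MessageMigrateTable):
                self._migrate(msg)
            elif isinstance(msg, MessageInsert):
                self.rows.setdefault(msg.table, []).append(msg.row)
            elif isinstance(msg, MessageDeleteStale):
                kept = [r for r in self.rows.get(msg.table, [])
                        if r["synced_at"] >= msg.cutoff]
                self.rows[msg.table] = kept
            else:
                raise TypeError(f"unsupported message: {type(msg).__name__}")

    def _migrate(self, msg):
        existing = self.columns.get(msg.table)
        if existing is None or msg.force:
            # New table or forced migration: (re-)create it empty.
            self.columns[msg.table] = list(msg.columns)
            self.rows[msg.table] = []
        else:
            # Assumed "safe" mode: add missing columns and keep the data.
            for column in msg.columns:
                if column not in existing:
                    existing.append(column)
```

Because `write` accepts any iterable, it works equally well with a list of messages or with a generator that yields them as they arrive.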
Because the stream now carries typed messages, we can easily extend the protocol in a backward-compatible way with more messages, such as `DeleteRecord`, to support CDC and other use cases that require those kinds of data updates.
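One way to picture this kind of extensibility is a handler registry: supporting a new message kind means registering one more handler, and existing handlers stay untouched. The sketch below is illustrative only. `MessageRouter`, the string kinds, and the choice to skip unknown kinds rather than fail are assumptions, not the behavior of plugin-pb or any CloudQuery SDK.

```python
class MessageRouter:
    """Hypothetical dispatcher that maps message kinds to handlers."""

    def __init__(self):
        self._handlers = {}

    def register(self, kind, handler):
        self._handlers[kind] = handler

    def dispatch(self, stream):
        """Handle each (kind, payload) pair; return the kinds that were skipped."""
        skipped = []
        for kind, payload in stream:
            handler = self._handlers.get(kind)
            if handler is None:
                # Unknown kind: an older consumer skips it instead of crashing.
                skipped.append(kind)
                continue
            handler(payload)
        return skipped


rows = {}

# An "older" consumer that only knows about inserts.
old_router = MessageRouter()
old_router.register("insert", lambda p: rows.__setitem__(p["id"], p))

# A "newer" consumer that also understands a delete_record message.
new_router = MessageRouter()
new_router.register("insert", lambda p: rows.__setitem__(p["id"], p))
new_router.register("delete_record", lambda p: rows.pop(p["id"], None))

stream = [("insert", {"id": 1}), ("delete_record", {"id": 1})]
```

Running `old_router.dispatch(stream)` applies the insert and reports `delete_record` as skipped, while `new_router.dispatch(stream)` handles both messages.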
Further messages, such as `CreateTable`, can easily be accommodated to enhance and facilitate such use cases. All available messages are listed here.
To learn how to migrate Go plugins, please take a look at Go SDK V4 Update.