gcs
Official

Google Cloud Storage

This destination plugin lets you sync data from a CloudQuery source to remote GCS (Google Cloud Storage) storage in various formats such as CSV, JSON and Parquet.

Publisher: cloudquery
Repository: github.com
Latest version: v3.6.2
Type: Destination
Date Published: Mar 12, 2024
Price: Free

Overview

GCS (Google Cloud Storage) Destination Plugin

This destination plugin lets you sync data from a CloudQuery source to remote GCS (Google Cloud Storage) storage in various formats such as CSV, JSON and Parquet.
This is useful in a variety of use cases, especially data lakes, where you can query the data directly from Athena or load it into data warehouses such as BigQuery, Redshift, Snowflake and others.

Example

This example configures a GCS destination to create CSV files in gs://bucket_name/path/to/files.
kind: destination
spec:
  name: "gcs"
  path: "cloudquery/gcs"
  registry: "cloudquery"
  version: "v3.6.2"
  write_mode: "append"
  spec:
    bucket: "bucket_name"
    path: "path/to/files"
    format: "csv" # options: parquet, json, csv
    format_spec:
      # CSV-specific parameters:
      # delimiter: ","
      # skip_header: false

    # Optional parameters
    # compression: "" # options: gzip
    # no_rotate: false
    # batch_size: 10000
    # batch_size_bytes: 52428800 # 50 MiB
    # batch_timeout: 30s
Note that the GCS plugin only supports the append write_mode. The (top-level) spec section is described in the Destination Spec Reference.
The GCS destination utilizes batching and supports the batch_size, batch_size_bytes and batch_timeout options (see below).
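For example, if you want fewer, larger objects in the bucket, the three batch thresholds in the nested spec can be raised together. This is a sketch; the values are illustrative, not recommendations:

  spec:
    # Rotate to a new object less often by raising all three thresholds
    batch_size: 50000
    batch_size_bytes: 104857600 # 100 MiB
    batch_timeout: 60s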

GCS Spec

This is the (nested) spec used by the GCS destination plugin. A combined example follows the list.
  • bucket (string) (required)
    Bucket to sync the files to.
  • path (string) (required)
    Path to where the files will be uploaded in the above bucket.
  • format (string) (required)
    Format of the output file. Supported values are csv, json and parquet.
  • format_spec (format_spec) (optional)
    Optional parameters to change the format of the file.
  • compression (string) (optional) (default: empty)
    Compression algorithm to use. Supported values are empty or gzip. Not supported for parquet format.
  • no_rotate (boolean) (optional) (default: false)
    If set to true, the plugin will write to one file per table. Otherwise, for every batch a new file will be created with a different .<UUID> suffix.
  • batch_size (integer) (optional) (default: 10000)
    Number of records to write before starting a new object.
  • batch_size_bytes (integer) (optional) (default: 52428800 (50 MiB))
    Number of bytes (as Arrow buffer size) to write before starting a new object.
  • batch_timeout (duration) (optional) (default: 30s (30 seconds))
    Maximum interval between batch writes.
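As an illustration of how these options combine, the nested spec below writes gzip-compressed JSON instead of the CSV shown earlier (a sketch; the bucket and path values are placeholders):

  spec:
    bucket: "bucket_name"
    path: "path/to/files"
    format: "json"
    compression: "gzip" # gzip is not supported for the parquet format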

format_spec

  • delimiter (string) (optional) (default: ,)
    Character that will be used as the delimiter when the format is csv.
  • skip_header (boolean) (optional) (default: false)
    If set to true, the header row will be omitted when writing CSV files (an example follows the list).
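For instance, a tab-delimited CSV output with the header row omitted could be configured as follows (a sketch; the YAML double-quoted "\t" resolves to a tab character):

    format: "csv"
    format_spec:
      delimiter: "\t"
      skip_header: true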

Authentication

The GCS plugin authenticates using your Application Default Credentials. The available options are the same ones described in Google's Application Default Credentials documentation:
Local environment:
  • gcloud auth application-default login (recommended when running locally)
Google Cloud cloud-based development environment:
  • When you run on Cloud Shell or Cloud Code, credentials are already available.
Google Cloud containerized environment:
  • Services such as Compute Engine, App Engine and Cloud Functions support attaching a user-managed service account, which CloudQuery will be able to utilize.
On-premises or another cloud provider:
  • The suggested way is to use Workload Identity Federation.
  • If that is not available, you can always use service account keys and export the location of the key via GOOGLE_APPLICATION_CREDENTIALS, as shown below. (Not recommended, as long-lived keys are a security risk.)
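A minimal sketch of the key-file approach, assuming a key has already been downloaded (the path is illustrative):

  export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account-key.json"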

