CloudQuery is an open-source data integration platform that allows you to export data from any source to any destination.
The CloudQuery Google Analytics plugin allows you to sync data from Google Analytics to any destination, including S3. It's free, open source, requires no account, and takes only minutes to get started.
Ready? Let's dive right in!
Step 1. Install the CloudQuery CLI
The CloudQuery CLI is a command-line tool that runs the sync. It supports macOS, Linux, and Windows.
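One common way to install it on macOS or Linux is via the official Homebrew tap (other installation methods are listed in the CloudQuery docs):

```shell
# Install the CloudQuery CLI via the official Homebrew tap
brew install cloudquery/tap/cloudquery

# Verify the installation
cloudquery --version
```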
Step 2. Configure the Google Analytics source plugin
Create a configuration file for the Google Analytics plugin and set up authentication.
Create a file called googleanalytics.yaml and add the following contents:
Step 3. Configure the S3 destination plugin
Create a configuration file for the S3 plugin and set up authentication.
Create a file called s3.yaml and add the following contents:
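A minimal example is sketched below — the version number, bucket name, path, and region are placeholders; check the S3 plugin's release page for the latest version:

```yaml
kind: destination
spec:
  name: s3
  path: cloudquery/s3
  version: "vX.Y.Z" # placeholder — use the latest release
  write_mode: "append" # the S3 plugin only supports append
  spec:
    bucket: "bucket_name"
    region: "us-east-1" # placeholder region
    path: "path/to/files/{{TABLE}}/{{UUID}}.parquet"
    format: "parquet"
```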
Fine-tune this configuration to match your needs. For more information, see the S3 Plugin ↗ page in the docs.
Step 4. Start the Sync
Run the following command in your terminal to start the sync:
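Assuming the two configuration files created above are in the current directory:

```shell
cloudquery sync googleanalytics.yaml s3.yaml
```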
And away we go! 🚀 The sync will run until completion, fetching all selected tables from Google Analytics. Any errors will be logged to a file called cloudquery.log.
Now that you've seen the basics of syncing Google Analytics to S3, you should know that there's a lot more you can do. Check out the CloudQuery Documentation, Source Code and How-to Guides for more details.
This example uses the parquet format to create Parquet files in s3://bucket_name/path/to/files, with each table placed in its own directory.
Note that the S3 plugin only supports append write-mode. The (top level) spec section is described in the Destination Spec Reference.
The plugin needs to be authenticated with your account(s) in order to sync information from your cloud setup.
The plugin needs only the PutObject permission (it will never make any other changes to your cloud setup), so, following the principle of least privilege, it's recommended to grant it only that permission.
There are multiple ways to authenticate with AWS, and the plugin respects the AWS credential provider chain. This means that CloudQuery attempts to authenticate using the following sources, in order of priority:
The AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_SESSION_TOKEN environment variables.
The credentials and config files in ~/.aws (the credentials file takes priority).
CloudQuery can use the credentials from the AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_SESSION_TOKEN environment variables (AWS_SESSION_TOKEN is optional for some accounts). For information on obtaining credentials, see the AWS guide.
To export the environment variables (On Linux/Mac - similar for Windows):
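For example (placeholder values — replace them with your own credentials):

```shell
# Placeholder values — replace with your own credentials
export AWS_ACCESS_KEY_ID="<YOUR_ACCESS_KEY_ID>"
export AWS_SECRET_ACCESS_KEY="<YOUR_SECRET_ACCESS_KEY>"
export AWS_SESSION_TOKEN="<YOUR_SESSION_TOKEN>" # optional for some accounts
```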
The plugin can use credentials from your credentials and config files in the .aws directory in your home folder.
The contents of these files are practically interchangeable, but CloudQuery will prioritize credentials in the credentials file.
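For example, a minimal ~/.aws/credentials file with a hypothetical profile named cq-profile (the values are placeholders):

```ini
[cq-profile]
aws_access_key_id = <YOUR_ACCESS_KEY_ID>
aws_secret_access_key = <YOUR_SECRET_ACCESS_KEY>
```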
Then, export the AWS_PROFILE environment variable (on Linux/macOS; similar for Windows):
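For example ("cq-profile" is a hypothetical profile name — use one defined in your ~/.aws files):

```shell
# "cq-profile" is a hypothetical profile name defined in ~/.aws/credentials
export AWS_PROFILE="cq-profile"
```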
IAM Roles for AWS Compute Resources
The plugin can use IAM roles for AWS compute resources (including EC2 instances, Fargate and ECS containers).
If you configured your AWS compute resources with IAM roles, the plugin will use them automatically.
For more information on configuring IAM roles, see the AWS docs.
User Credentials with MFA
To use IAM user credentials with MFA, run the STS get-session-token command with the IAM user's long-term security credentials (access key ID and secret access key) to obtain temporary session credentials.
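For example (the MFA device ARN and token code below are placeholders for your own device and current code):

```shell
aws sts get-session-token \
  --serial-number arn:aws:iam::123456789012:mfa/your-user \
  --token-code 123456
```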
If you are using a custom S3 endpoint, you can specify it using the endpoint spec option. If you're using authentication, the region option in the spec determines the signing region used.
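For example, the plugin-level spec for a hypothetical S3-compatible endpoint might look like this (the endpoint URL and other values are placeholders):

```yaml
spec:
  bucket: "bucket_name"
  region: "us-east-1" # used as the signing region when authenticating
  path: "path/to/files/{{TABLE}}/{{UUID}}.parquet"
  format: "parquet"
  endpoint: "https://s3-compatible.example.com" # placeholder custom endpoint
```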
```yaml
kind: source
# Common source-plugin configuration
spec:
  name: googleanalytics
  path: cloudquery/googleanalytics
  version: "v3.0.9"
  tables: ["*"]
  destinations: ["s3"]
  # Google Analytics specific configuration
  spec:
    property_id: "<YOUR_PROPERTY_ID_HERE>"
    oauth:
      access_token: "<YOUR_OAUTH_ACCESS_TOKEN>"
    reports:
      - name: example
        dimensions:
          - date
          - language
          - country
          - city
          - browser
          - operatingSystem
          - year
          - month
          - hour
        metrics:
          - name: totalUsers
          - name: new_users
            expression: newUsers
          - name: new_users2
            expression: "newUsers + totalUsers"
            invisible: true
        keep_empty_rows: true
```
Google Cloud services that support attaching a service account
Services such as Compute Engine, App Engine, and Cloud Functions support attaching a user-managed service account, which CloudQuery will be able to use.
You can find out more in the Google Cloud docs.
On-premises or another cloud provider
The suggested way is to use Workload Identity Federation.
If that's not available, you can use service account keys and export the location of the key file via the GOOGLE_APPLICATION_CREDENTIALS environment variable.
This is not recommended, as long-lived keys present a security risk.
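If you do go this route, export the variable like so (the key file path is a placeholder):

```shell
# Placeholder path — point this at your downloaded service account key file
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account-key.json"
```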