hubspot
solutions
transformations

Data-Driven Decision Making - Using CloudQuery to Export and Analyze HubSpot Data

Kevin Rheinheimer

Kevin Rheinheimer •

At CloudQuery, we’re all about giving you granular access to your data in an environment or database you are comfortable using. With CloudQuery, you can export your HubSpot data to your preferred database. Using CloudQuery to extract and load data from HubSpot gives you complete control over how you use your CRM data. Access to your raw HubSpot CRM data in your database means it can be incorporated into your already existing analytics suite. Once you have your HubSpot data in your database, you can analyze it, combine it with other data, and transform it into actionable data in your database schema.
Once your data has been loaded into your database, you will use the following tables created by CloudQuery's data movement tool: hubspot_crm_companies, hubspot_crm_contacts, and hubspot_crm_deals. These tables represent organizations, individual contacts, and sales deals in your HubSpot CRM, respectively. Each of these tables has JSON fields titled associations and properties, which make it difficult to easily join associated tables and gain descriptive information about these entities in your HubSpot data without SQL expertise.

How to load your HubSpot data to your database #

  1. Download CloudQuery here
  2. Configure your HubSpot source and your database destination
  3. The following queries can be used to create views or tables that allow you to flatten these JSON fields into a columnar structure.

Parsing HubSpot associations data with PostgreSQL #

Now that you have exported your HubSpot data to your database let’s create a reference dataset of associations using the hubspot_crm_companies table:
select
	_cq_sync_time,
	_cq_source_name,
	_cq_id,
	_cq_parent_id,
	id as company_id,
	contact ->> 'id' as contact_id,
	contact ->> 'type' as association_type
from
	hubspot_crm_companies,
	json_array_elements(associations -> 'contacts' -> 'results') as contact
Because this table focuses on HubSpot companies, you can alias the id column as company_id to avoid confusion. You can then leverage PostgreSQL JSON functions to isolate contacts that are associated with companies and parse the ensuing id as contact_id, as well as the type of association from the original JSON column.
Next, let’s create an associations reference dataset from the hubspot_crm_contacts table:
select
	_cq_sync_time,
	_cq_source_name,
	_cq_id,
	_cq_parent_id,
	id as contact_id,
	company ->> 'id' as company_id,
	company ->> 'type' as association_type
from
	hubspot_crm_contacts,
	json_array_elements(associations -> 'companies' -> 'results') as company
Just as above in the companies example you can alias the id from the hubspot_crm_contacts table as contact_id and the id from the associations JSON column as company_id to avoid confusion.
Finally, let’s create an associations dataset for the hubspot_crm_deals table:
select
	_cq_sync_time,
	_cq_source_name,
	_cq_id,
	_cq_parent_id,
	id as deal_id,
	company ->> 'id' as company_id,
	company ->> 'type' as association_type
from
	hubspot_crm_deals,
	json_array_elements(associations -> 'companies' -> 'results') as company
The same logic applies to this table. As deals are tied to companies in HubSpot you will want to separate the id field from the hubspot_crm_deals table and the id value from the associations JSON field by aliasing them as deal_id and company_id, respectively.

Parsing HubSpot properties data with PostgreSQL #

The other JSON column provided by HubSpot is the properties column. In the hubspot_crm_companies table, this column contains descriptive information about companies, such as their industry, country, or any custom properties you may want to add to your HubSpot CRM.
Here’s an example you can use to flatten company properties into columns using PostgreSQL:
select
	_cq_sync_time,
	_cq_source_name,
	_cq_id,
	_cq_parent_id,
	id,
	properties ->> 'country' as country,
	(properties ->> 'createdate')::timestamp as created_date,
	properties ->> 'description' as description,
	properties ->> 'domain' as domain,
	(properties ->> 'hs_lastmodifieddate')::timestamp as hs_last_modified_date,
	properties ->> 'hs_object_id' as hs_object_id,
	properties ->> 'industry' as industry
from
	hubspot_crm_companies
Next is an example of parsing individual HubSpot contact properties:
select
	_cq_sync_time,
	_cq_source_name,
	_cq_id,
	_cq_parent_id,
	id,
	(properties ->> 'createdate')::timestamp as created_date,
	properties ->> 'email' as email,
	properties ->> 'firstname' as first_name,
	properties ->> 'lastname' as last_name,
	properties ->> 'hs_object_id' as hs_object_id,
	(properties ->> 'lastmodifieddate')::timestamp as last_modified_date
from
	hubspot_crm_contacts
Finally, here is an example of parsing HubSpot deal properties:
select
	_cq_sync_time,
	_cq_source_name,
	_cq_id,
	_cq_parent_id,
	id,
	properties ->> 'amount' as amount,
	(properties ->> 'closedate')::timestamp as close_date,
	(properties ->> 'createdate')::timestamp as created_date,
	properties ->> 'dealname' as deal_name,
	properties ->> 'dealstage' as deal_stage,
	(properties ->> 'lastmodifieddate')::timestamp as last_modified_date,
	properties ->> 'hs_object_id' as hs_object_id,
	properties ->> 'pipeline' as pipeline
from
	hubspot_crm_deals

Parsing HubSpot associations data with BigQuery #

Here, you can repeat the above steps, but the queries will use BigQuery SQL syntax. These queries will leverage BigQuery JSON functions. Note: BigQuery is ‘append-only’, preferred by design, so you can leverage the _cq_sync_time field to select only the most recent synced data. As with the above PostgreSQL examples, you can use the following query to create the same company associations dataset using BigQuery syntax:
select
	_cq_sync_time,
	_cq_source_name,
	_cq_id,
	_cq_parent_id,
	id as company_id,
	json_value(contact, '$.id') as contact_id,
	json_value(contact, '$.type') as association_type
from
	your_bq_dataset.hubspot_crm_companies,
	unnest(json_extract_array(associations, '$.contacts.results')) as contact
where
	_cq_sync_time = (select max(_cq_sync_time) from your_bq_dataset.hubspot_crm_companies)
Next is the associations dataset using the HubSpot contacts table:
select
	_cq_sync_time,
	_cq_source_name,
	_cq_id,
	_cq_parent_id,
	id as contact_id,
	json_value(company, '$.id') as company_id,
	json_value(company, '$.type') as association_type
from
	your_bq_dataset.hubspot_crm_contacts,
	unnest(json_extract_array(associations, '$.companies.results')) as company
where
	_cq_sync_time = (select max(_cq_sync_time) from your_bq_dataset.hubspot_crm_contacts)
Finally, the associations' dataset using the HubSpot deals table:
select
	_cq_sync_time,
	_cq_source_name,
	_cq_id,
	_cq_parent_id,
	id as deal_id,
	json_value(company, '$.id') as company_id,
	json_value(company, '$.type') as association_type
from
	your_bq_dataset.hubspot_crm_deals,
	unnest(json_extract_array(associations, '$.companies.results')) as company
where
	_cq_sync_time = (select max(_cq_sync_time) from your_bq_dataset.hubspot_crm_deals)

Parsing HubSpot properties data with BigQuery #

The same criteria applies to the below queries, you can use BigQuery’s JSON functions to parse HubSpot data to columns. You can use the below query to create columns from the HubSpot companies properties column for a view or new table:
select
	_cq_sync_time,
	_cq_source_name,
	_cq_id,
	_cq_parent_id,
	id,
	json_value(properties, '$.country') as country,
	cast(json_value(properties, '$.createdate') as timestamp) as created_date,
	json_value(properties, '$.description') as description,
	json_value(properties, '$.domain') as domain,
	cast(json_value(properties, '$.hs_lastmodifieddate') as timestamp) as hs_last_modified_date,
	json_value(properties, '$.hs_object_id') as hs_object_id,
	json_value(properties, '$.industry') as industry
from
	your_bq_dataset.hubspot_crm_companies
where
	_cq_sync_time = (select max(_cq_sync_time) from your_bq_dataset.hubspot_crm_companies)
Next is an example you can use to flatten HubSpot contact properties to columns:
select
	_cq_sync_time,
	_cq_source_name,
	_cq_id,
	_cq_parent_id,
	id,
	cast(json_value(properties, '$.createdate') as timestamp) as created_date,
	json_value(properties, '$.email') as email,
	json_value(properties, '$.firstname') as first_name,
	json_value(properties, '$.lastname') as last_name,
	json_value(properties, '$.hs_object_id') as hs_object_id,
	cast(json_value(properties, '$.lastmodifieddate') as timestamp) as last_modified_date
from
	your_bq_dataset.hubspot_crm_contacts
where
	_cq_sync_time = (select max(_cq_sync_time) from your_bq_dataset.hubspot_crm_contacts)
And last is an example of parsed HubSpot deals properties:
select
	_cq_sync_time,
	_cq_source_name,
	_cq_id,
	_cq_parent_id,
	id,
	json_value(properties, '$.amount') as amount,
	cast(json_value(properties, '$.closedate') as timestamp) as close_date,
	cast(json_value(properties, '$.createdate') as timestamp) as created_date,
	json_value(properties, '$.dealname') as deal_name,
	json_value(properties, '$.dealstage') as deal_stage,
	cast(json_value(properties, '$.hs_lastmodifieddate') as timestamp) as last_modified_date,
	json_value(properties, '$.hs_object_id') as hs_object_id,
	json_value(properties, '$.pipeline') as pipeline
from
	your_bq_dataset.hubspot_crm_deals
where
	_cq_sync_time = (select max(_cq_sync_time) from your_bq_dataset.hubspot_crm_deals)
If you need to gain insights from or visualize your HubSpot CRM data quickly, CloudQuery lets you fully control the process. Download CloudQuery to gain immediate insights into your customer relations and sales pipelines. If you have any questions or want to connect with our engineering team, contact us or join our Community.
Start your free trial today

Experience Simple, Fast and Extensible Data Movement.