Array expansion

Overview

Array expansion lets you take a column that contains an array of values and expand each element into its own row during a sync. This is useful when your source data stores multiple identifiers or values in a single array column, but your destination expects one record per value.

For example, if a user has multiple device IDs stored in an array column, array expansion creates a separate sync row for each device ID, allowing you to sync each one individually to your destination.

Array expansion is currently supported with the lightning sync engine on Postgres, Snowflake, BigQuery, and Databricks sources.

How it works

When you map an array-type column as the identifier in your sync configuration, Hightouch automatically enables array expansion. Each element of the array becomes a separate row, and a generated column is used as the identifier value sent to the destination.

Example

Consider this source table with an array column device_ids:

user_id	name	device_ids
1	Alice	`["ios_a1", "android_b2"]`
2	Bob	`["ios_c3"]`
3	Charlie	`[]`
4	Dana	`["web_d4", "web_d4"]`

When you map device_ids as the identifier for your sync, Hightouch expands each row so that every array element becomes its own record:

_ht_device_ids_element	user_id	name
`ios_a1`	1	Alice
`android_b2`	1	Alice
`ios_c3`	2	Bob
`web_d4`	4	Dana

The generated column _ht_device_ids_element contains the individual element value and is used as the identifier sent to the destination.

Empty arrays

Rows where the array column is empty ([]) or NULL produce zero output rows. In the example above, Charlie's row is omitted entirely because device_ids is an empty array.

Duplicate elements

If an array contains duplicate values (like Dana's two "web_d4" entries), the duplicates produce a primary key collision. The sync completes with a "Report" status and the duplicate rows are flagged as rejected. Only one row per unique element value is synced.

Configuration

Array expansion is configured automatically when you map an array-type column as the identifier in the "How should rows be matched?" section of your sync configuration.

When an array column is selected:

Hightouch sets the arrayExpansionSourceColumn in your sync config
The identifier mapping is automatically translated to use the generated element column (_ht_{column}_element)
An info banner confirms that array expansion is active

To disable array expansion, change the identifier mapping to a non-array column.

Supported column types

Array expansion works with columns of the following types:

Native arrays: Postgres text[], Snowflake ARRAY, BigQuery ARRAY<STRING>, Databricks ARRAY<STRING>
JSON arrays: Postgres jsonb, Snowflake VARIANT, BigQuery JSON
JSON string arrays: VARCHAR columns containing JSON-encoded arrays like '["a","b"]'

Hightouch automatically detects the column type and applies the appropriate expansion strategy.

Limitations

Only one array column can be expanded per sync
Array expansion requires the lightning sync engine (in-warehouse planner)
Only supported on Postgres, Snowflake, BigQuery, and Databricks sources
Duplicate values within an array cause row-level rejections
Array expansion is not available on syncs that use Match Booster, as Match Booster applies its own identity resolution logic that is incompatible with row expansion