You open the GCP BigQuery agent configuration dialog from a workflow configuration: you can right-click the agent icon and select Configuration..., or double-click the agent icon
...
Field | Description |
---|
GCP Profile | Click Browse to select a predefined GCP profile. The profile contains the credential information required to connect to the BigQuery in the GCP Project. |
UDR Type | Type of UDR to populate the target BigQuery table. |
Dataset | Select the target dataset. |
Data Table | Select the target table name. Only the tables found within the selected dataset can be listed and chosen. Info |
---|
| When creating the Data table, ensure that: The table must have a Transaction ID column, dedicated to the agent's internal use for transaction safety purposes. The column could be named according to your preference however it must be BigQuery INT64 data type (with alias INT , SMALLINT , INTEGER , BIGINT , TINYINT , BYTEINT ). It must also not allow NULL . Take note of the name as you will have to create a similar column with the same name for the Batch Status table.
|
Info |
---|
| Due to the restriction on modifying rows in Data table, the Data table is not modified during Commit and Rollback workflow stages. This Restriction is explained further in 4.2 GCP BigQuery Agent Transaction Behavior. |
|
Batch Status Table | A separate table to keep track of the transaction ID to allow the agent to better provide transaction safety in the event that the workflow aborts. A transaction ID is unique to each batch process, the agent will use the transaction ID to identify UDRs that belong to the batch when the data insertion into the table was interrupted. Info |
---|
| When creating the Batch Status table, ensure that: |
See the example below on creating a Batch Status table: Info |
---|
title | Example - Batch Status Table |
---|
| The following DDL query is used in the BigQuery Query Editor to create a table under the user_analytics Dataset with the table named batch_status. Code Block |
---|
CREATE TABLE user_analytics.batch_status
(
transactionID INT64 NOT NULL OPTIONS(description="The transaction ID field"),
status INT64 OPTIONS(description="The status field"),
) |
|
Info |
---|
| The Batch Status table is updated with status code during the Commit and Rollback workflow stages. The Batch Status, Status column may contain the following codes: Code | Description |
---|
100 | The UDR's are currently being inserted into the Data table. | 0 | The workflow successfully inserted all the UDRs into the Data table. | 1 | The workflow has aborted or has been manually stopped and the UDRs insertion has been cancelled. | 2 | The Workflow entered rollback stage, possibly due to errors in previous workflow run. | -1 | This status indicates an unknown issue and requires further investigation into the cause. |
|
|
Transaction ID Column | Select the column that is designated for the transaction ID from the Batch Status Table and Data Table. The transaction ID has to be present in both Data and Batch Status table with the same name, for the transaction ID to be available for selection. |
Batch Size | The amount of UDR to be inserted into the target data table for each batch process. This corresponds to the rows per insert request into the target Data table. The default value is 500 records per batch transaction. Note |
---|
| If you need to increase the Batch Size, you will need to refer to the GCP documentation on the maximum limit of Bigquery streaming inserts. |
|
Concurrent Requests | The amount of connections the agent will open to the target data table to insert the UDRs. The default value is 1 connection. Note |
---|
| The order of insertion of the UDR data into the BigQuery table is done out of sequence and will appear as such in the target table. Sorting will have to be done from BigQuery itself. |
|
...