Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

To configure the GCP BigQuery Agent, double-click on the agent or select the Edit agent Image Addedbutton after left-clicking on the agent in the Workflow Editor. After opening the Agent Configuration, you can set a name for the agent using the Name field.

Target

The Target tab contains settings related to the UDR type to insert into the database and the specific table for the UDR to be inserted into by the agent.

...

GCP BigQuery agent configuration - Target tab

Field

Description

GCP Profile

Click Browse to select a predefined GCP profile. The profile contains the credential information required to connect to BigQuery in the GCP Project.

UDR Type

Type of UDR to populate the target BigQuery table.

Project ID

Specify the reference ID of the project. This is a mandatory field as the workflow will be invalid if you try to save the configuration without specifying the Project ID.

Dataset

Select the target dataset.

Data Table

Select the target table name. Only the tables found within the selected dataset can be listed and chosen.

Info

Info!

When creating the Data table, ensure that:

  • The table must have a Transaction ID column, dedicated to the agent's internal use for transaction safety purposes. The column could be named according to your preference  however it must be BigQuery INT64 data type (with alias INT, SMALLINT, INTEGER, BIGINT, TINYINT, BYTEINT). It must also not allow NULL. Take note of the name as you will have to create a similar column with the same name for the Batch Status table.

Info

Info!

Due to the restriction on modifying rows in Data table, the Data table is not modified during Commit and Rollback workflow stages. This Restriction is explained further in GCP BigQuery Agent Transaction Behavior.

Batch Status Table

A separate table to keep track of the transaction ID to allow the agent to better provide transaction safety in the event that the workflow aborts. A transaction ID is unique to each batch process, the agent will use the transaction ID to identify UDRs that belong to the batch when the data insertion into the table was interrupted.

Info

Info!

When creating the Batch Status table, ensure that:

  • The table must have a Transaction ID column, dedicated to the agent's internal use for transaction safety purposes. The column must be named the same as the Transaction ID column created in the Data table and it must be in INT64. It must also not allow NULL.

  • The table must also have a Status column for the agent to update the status of the transaction. This column must also be BigQuery INT64 data type (with alias INT, SMALLINT, INTEGER, BIGINT, TINYINT, BYTEINT) and must be named status.

See the example below on creating a Batch Status table:

Info

Example - Batch Status Table

The following DDL query is used in the BigQuery Query Editor to create a table under the user_analytics Dataset with the table named batch_status.

Code Block
 CREATE TABLE user_analytics.batch_status
 (
   transactionID INT64 NOT NULL OPTIONS(description="The transaction ID field"),
   status INT64 OPTIONS(description="The status field"),
 )


Info

Info!

The Batch Status table is updated with status code during the Commit and Rollback workflow stages. The Batch Status, Status column may contain the following codes:

CodeDescription
100The UDR's are currently being inserted into the Data table.
0The workflow successfully inserted all the UDRs into the Data table.
1The workflow has aborted or has been manually stopped and the UDRs insertion has been cancelled.
2The Workflow entered rollback stage, possibly due to errors in previous workflow run.
-1This status indicates an unknown issue and requires further investigation into the cause.
Info


Transaction ID Column

Select the column that is designated for the transaction ID from the Batch Status Table and Data Table. The transaction ID has to be present in both Data and Batch Status table with the same name, for the transaction ID to be available for selection.

Batch Size

The amount of UDR to be inserted into the target data table for each batch process. This corresponds to the rows per insert request into the target Data table. The default value is 500 records per batch transaction.

Note

Note!

If you need to increase the Batch Size, you will need to refer to the GCP documentation on the maximum limit of Bigquery streaming inserts. 

Concurrent Requests

The amount of connections the agent will open to the target data table to insert the UDRs. The default value is 1 connection.

Note

Note!

The order of insertion of the UDR data into the BigQuery table is done out of sequence and will appear as such in the target table. Sorting will have to be done from BigQuery itself.


Assignment

The Assignment tab contains the assignment of values to each column.

...

GCP BigQuery agent configuration with the Assignment tab selected.

If the Target tab is correctly configured with the Dataset, Data Table, Batch Status Table and Transaction ID Column filed, and the Assignment tab is selected, the Field Name and Field Type column in the table will automatically be populated. If assignments already exist in the Assignment tab, then Refresh must be manually selected, for the assignments to be updated with the configurations in the Target tab.

Field

Description

Refresh

Updates the table with all the columns from the selected table.

Note

Note!

Potential changes in the database table will not be visible until the Dataset, Data Table, Batch Status Table and Transaction ID Column have been properly configured.

If rows already exist in the table, the refresh operation preserves the configuration for all rows with a corresponding column. Thus, if a table has been extended with a new column, the old column configurations are left untouched and the new column appears when  Refresh is selected.

Field Name

Displays a list of all columns for the selected Data table, except the Transaction ID column.

Field Type

Displays the BigQuery data type for each column as declared in the BigQuery table.

Info

Info!

BigQuery data types that are currently supported are: BIGNUMERIC, BOOLEAN, DATE, DATETIME, TIME, TIMESTAMP, FLOAT64, NUMERIC, GEOGRAPHY, STRING and INT64.

UDR Field

Allows you to map the Field Name of the Data table with the UDR Field from the UDR Type configured from the Target tab. Only valid UDR Field types will be available for selection in the Column Assignment dialog based on the Field Type of the BigQuery Column. You can refer to the BigQuery to valid UDR data types mappings in the table below.

BigQuery data types are automatically mapped to UDR data types as follows:

BigQuery Data Types

UDR Data Types

BIGNUMERIC

byte, int, short, long, float, double, bigint, bigdec

BOOLEAN

boolean

DATE

date

DATETIME

date

TIME

date

TIMESTAMP

date

FLOAT64

byte, int, short, long, float, double

NUMERIC

byte, int, short, long, float, double

GEOGRAPHY

string

STRING

string

INT64

byte, int, short, long