GCP BigQuery Agent Transaction Behavior

This section describes the transaction behavior of the GCP BigQuery agent. For more information about transactions in general, see Transactions in Workflow Monitor.

The GCP BigQuery batch forwarding agent uses the streaming insert API, which is designed both to load data at extremely high volume and to make loaded data available to queries in real time. To achieve this, newly inserted data is added to a streaming buffer, where it is immediately available for queries; however, it is not moved into standard storage until more than an hour after being loaded. While the data is in the streaming buffer, it can only be queried: it cannot be updated, deleted, or copied. Refer to the GCP documentation for further information on the streaming insert API and the streaming buffer.
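
For example, a DML statement that touches rows still in the streaming buffer is rejected. The following sketch uses the Data table from the example later in this section; the payload column and the Transaction ID value are illustrative, not part of the agent's schema. BigQuery fails such a statement with an error similar to "UPDATE or DELETE statement over table user_analytics.data_tbl1 would affect rows in the streaming buffer, which is not supported".

Example - DML Rejected While Rows Are in the Streaming Buffer

-- Fails while any matched row is still in the streaming buffer.
UPDATE user_analytics.data_tbl1
SET payload = NULL            -- payload is an illustrative column
WHERE id = 'txn-0001';        -- 'txn-0001' is an illustrative Transaction ID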

Because rows in the streaming buffer cannot be modified, the GCP BigQuery agent does not modify the Data table at the commit and rollback stages. Instead, the agent uses a design based on a Transaction ID that is unique to each batch:

  1. The Data table must have a Transaction ID column.
  2. The Batch Status table must be created with a Transaction ID column and a Status column.
  3. At the commit and rollback stages, the Batch Status table is updated with a status code that reflects the current stage and can be used for auditing (example schemas and a commit-stage sketch follow this list).
  4. Consumers of the loaded data are expected to always access that data through a view.
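
The following DDL is a minimal sketch of the two tables, assuming the Transaction ID column is named id and the status column is named status, to match the view example below; the payload column is an illustrative placeholder.

Example - Data Table and Batch Status Table Schemas

CREATE TABLE IF NOT EXISTS user_analytics.data_tbl1 (
  id STRING NOT NULL,       -- Transaction ID, shared by all rows in a batch
  payload STRING            -- illustrative placeholder for the forwarded data columns
);

CREATE TABLE IF NOT EXISTS user_analytics.batch_status_tbl1 (
  id STRING NOT NULL,       -- Transaction ID matching data_tbl1.id
  status INT64 NOT NULL     -- stage code; 0 is assumed here to mean committed
);

Because the status rows live in their own table, the agent can record a batch outcome without touching the streamed data rows. How the agent writes the status is its own concern; the following is a sketch of equivalent DML at the commit stage, again assuming that status code 0 denotes a committed batch and that the Transaction ID value is illustrative.

Example - Recording a Committed Batch

INSERT INTO user_analytics.batch_status_tbl1 (id, status)
VALUES ('txn-0001', 0);     -- mark the batch as committed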

This view should join the two tables on the Transaction ID column (id in the example below) where status = 0, for example:

Example - View Joining Data Table and Batch Status Table

The following DDL statement, run in the BigQuery query editor, creates a view named view1 in the user_analytics dataset.

CREATE VIEW IF NOT EXISTS user_analytics.view1 AS
SELECT *
FROM user_analytics.data_tbl1 AS t1
INNER JOIN user_analytics.batch_status_tbl1 AS t2
USING (id)
WHERE t2.status = 0;
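
Consumers can then treat the view as an ordinary table: joining on the Transaction ID and filtering on status = 0 ensures they only ever see rows belonging to committed batches. For example (the Transaction ID value is illustrative):

Example - Querying the View

SELECT *
FROM user_analytics.view1
WHERE id = 'txn-0001';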