This section describes the transaction behavior of the GCP BigQuery agent. For more information about transactions in general, see Transactions in Workflow Monitor (3.0).

The GCP BigQuery batch forwarding agent uses the streaming insert API, which is designed to load data at very high volumes and to make loaded data available to queries in real time. To achieve both, newly inserted data is first added to a streaming buffer, where it is immediately available for queries. However, the data is not moved into standard storage until more than an hour after it is loaded. While data is in the streaming buffer, it can only be queried; it cannot be updated, deleted, or copied. For further information on the streaming insert API and the streaming buffer, refer to the GCP documentation.

Because rows in the streaming buffer cannot be modified, the GCP BigQuery agent does not modify the Data table at the commit and rollback stages. Instead, the agent uses a design based on a Transaction ID that is unique for each batch:

  1. The Data table must have a Transaction ID column.
  2. The Batch Status table must be created with a Transaction ID column and a Status column.
  3. At the commit and rollback stages, the Batch Status table is updated with a status code that reflects the current stage and can be used for auditing.
  4. Consumers of the loaded data are expected to always access that data through a view.
  5. This view should join the two tables on the Transaction ID and keep only rows where status = 0, for example:

    Example - View Joining Data Table and Batch Status Table

    The following DDL statement can be run in the BigQuery Query Editor to create a view named view1 in the user_analytics dataset.

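    -- id is the Transaction ID column shared by both tables.
    -- An inner join keeps only data rows whose batch has status = 0.
    CREATE VIEW IF NOT EXISTS user_analytics.view1 AS
    SELECT * FROM user_analytics.data_tbl1 AS t1
    INNER JOIN user_analytics.batch_status_tbl1 AS t2
    USING (id) WHERE t2.status = 0;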
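
The effect of such a view can be tried out locally. The following is a minimal sketch, not the agent's implementation: it uses Python's built-in sqlite3 module as a stand-in for BigQuery, and the payload column, the sample Transaction IDs, and the status code 1 for a cancelled batch are all assumptions made for illustration. Only status = 0 is taken from the example above.

```python
import sqlite3

# SQLite is used only as a local stand-in so the sketch runs without GCP access.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE data_tbl1 (id TEXT, payload TEXT);            -- id = Transaction ID
    CREATE TABLE batch_status_tbl1 (id TEXT, status INTEGER);  -- one status row per batch
    CREATE VIEW view1 AS
        SELECT t1.id, t1.payload
        FROM data_tbl1 AS t1
        JOIN batch_status_tbl1 AS t2 ON t1.id = t2.id
        WHERE t2.status = 0;
""")

# Three batches: one committed, one cancelled, one still in progress.
conn.executemany(
    "INSERT INTO data_tbl1 VALUES (?, ?)",
    [("txn-a", "row 1"), ("txn-a", "row 2"), ("txn-b", "row 3"), ("txn-c", "row 4")],
)
conn.executemany(
    "INSERT INTO batch_status_tbl1 VALUES (?, ?)",
    [("txn-a", 0),   # committed
     ("txn-b", 1)],  # cancelled (hypothetical status code)
)

# Consumers read through the view and never see txn-b or txn-c.
visible = conn.execute("SELECT id, payload FROM view1 ORDER BY payload").fetchall()
print(visible)  # [('txn-a', 'row 1'), ('txn-a', 'row 2')]
```

Rows from the cancelled batch and from the batch without a status row remain in the Data table, but they are never exposed to consumers that query the view.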

Emits

The agent emits commands that change the state of the file currently being processed.

| Command | Description |
|---|---|
| Cancel Batch | Emitted if an error occurs while inserting rows into the Data table or while mapping a UDR to rows. |

Retrieves

The agent retrieves commands from other agents and, based on them, changes the state of the file currently being processed.

| Command | Description |
|---|---|
| Begin Batch | Retrieves a Transaction ID. |
| End Batch | Updates the batch status to indicate that all UDRs were successfully inserted into the Data table. This status is loaded into the Batch Status table. |
| Cancel Batch | Updates the batch status to indicate that the insertion of UDRs has been cancelled. This status is loaded into the Batch Status table. |
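
The command handling described above can be modelled in a few lines. This is a hedged sketch only: the BatchTracker class, its method names, the use of a UUID as Transaction ID, and the status code 1 for a cancelled batch are hypothetical and are not taken from the agent. It only illustrates that data rows are never modified and that the outcome of a batch is recorded as a separate status row.

```python
import uuid
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

COMMITTED = 0  # matches the view filter status = 0
CANCELLED = 1  # hypothetical code, chosen for this sketch only


@dataclass
class BatchTracker:
    """In-memory model of the Data table and the Batch Status table."""
    data_rows: List[Tuple[str, str]] = field(default_factory=list)
    status_rows: List[Tuple[str, int]] = field(default_factory=list)
    current_txn: Optional[str] = None

    def begin_batch(self) -> str:
        # Begin Batch: obtain a Transaction ID that is unique for the batch.
        self.current_txn = uuid.uuid4().hex
        return self.current_txn

    def insert(self, payload: str) -> None:
        if self.current_txn is None:
            raise RuntimeError("insert called outside a batch")
        # Data rows are only appended, never updated or deleted.
        self.data_rows.append((self.current_txn, payload))

    def end_batch(self) -> None:
        self._finish(COMMITTED)

    def cancel_batch(self) -> None:
        self._finish(CANCELLED)

    def _finish(self, status: int) -> None:
        if self.current_txn is None:
            raise RuntimeError("no batch in progress")
        # The outcome is recorded as a new row in the status table.
        self.status_rows.append((self.current_txn, status))
        self.current_txn = None

    def visible_rows(self) -> List[str]:
        # Equivalent of the view: data joined to status rows with status = 0.
        committed = {txn for txn, status in self.status_rows if status == COMMITTED}
        return [payload for txn, payload in self.data_rows if txn in committed]


tracker = BatchTracker()
tracker.begin_batch()
tracker.insert("udr-1")
tracker.insert("udr-2")
tracker.end_batch()

tracker.begin_batch()
tracker.insert("udr-3")
tracker.cancel_batch()

print(tracker.visible_rows())  # ['udr-1', 'udr-2']
```

The cancelled batch still occupies a row in the data list, which mirrors how rows in the streaming buffer stay in the Data table after a Cancel Batch.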