This section describes the transaction behavior of the Data Hub GCP BigQuery agent. For more information about transactions in general, see Transactions in /wiki/spaces/MZD73/pages/5605083.

The GCP BigQuery batch forwarding agent uses the streaming insert API, which is designed to load data at extremely high volumes and to make loaded data available to queries in real time. To achieve both, newly inserted data is first added to a streaming buffer, where it is immediately available for queries. However, this data is not moved into standard storage until more than an hour after being loaded. While the data is in the streaming buffer, it can only be queried; it cannot be updated, deleted, or copied. For further information on the streaming insert API and the streaming buffer, refer to the GCP documentation.

Because rows in the streaming buffer cannot be modified, the GCP BigQuery agent does not modify the Data table at the commit and rollback stages. Instead, the agent uses a design based on a Transaction ID that is unique for each batch.

  1. The Data table must have a Transaction ID column.
  2. The Batch Status table must be created with a Transaction ID column and a Status column.
  3. At the commit and rollback stages, the Batch Status table is updated with a status code that reflects the current stage and can be used for auditing.
  4. Consumers of the loaded data are expected to always access that data through a view.
  5. This view should join the two tables on Transaction ID and keep only the rows where status = 0, for example:

    Example - View Joining Data Table and Batch Status Table

    The following DDL query is run in the BigQuery Query Editor to create a view named view1 in the user_analytics dataset.

    Code Block
    CREATE VIEW IF NOT EXISTS user_analytics.view1 AS
    SELECT * FROM user_analytics.data_tbl1 AS t1
    FULL JOIN user_analytics.batch_status_tbl1 AS t2
    USING (id)  WHERE t2.status = 0;
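The effect of reading through such a view can be tried out locally. The following is a minimal sketch, not the agent's implementation: it uses Python's built-in sqlite3 module as a stand-in for BigQuery, and the payload column, the tx-1/tx-2/tx-3 values, and the status code 1 are assumptions made for illustration. It also uses an inner join rather than the FULL JOIN shown above, so that it runs on older SQLite versions; unlike the FULL JOIN, it does not list committed batches that have no data rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schemas: only the Transaction ID column (id) and the
    -- status column are taken from the example above.
    CREATE TABLE data_tbl1 (id TEXT, payload TEXT);
    CREATE TABLE batch_status_tbl1 (id TEXT, status INTEGER);

    -- Simplified local version of the view: inner join instead of FULL JOIN.
    CREATE VIEW view1 AS
    SELECT t1.id, t1.payload
    FROM data_tbl1 AS t1
    JOIN batch_status_tbl1 AS t2 ON t1.id = t2.id
    WHERE t2.status = 0;
""")

# Three batches: tx-1 is committed (status 0), tx-2 has a non-zero status
# (the value 1 is an assumption), tx-3 has no status row yet.
conn.executemany(
    "INSERT INTO data_tbl1 (id, payload) VALUES (?, ?)",
    [("tx-1", "a"), ("tx-1", "b"), ("tx-2", "c"), ("tx-3", "d")],
)
conn.executemany(
    "INSERT INTO batch_status_tbl1 (id, status) VALUES (?, ?)",
    [("tx-1", 0), ("tx-2", 1)],
)

# Consumers read through the view and therefore only see the rows of tx-1.
visible = conn.execute(
    "SELECT id, payload FROM view1 ORDER BY payload"
).fetchall()
print(visible)
```

Rows from tx-2 and tx-3 stay in the Data table but are hidden from consumers, which is why the design does not need to update or delete rows that may still be in the streaming buffer.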



Emits

The agent emits commands that change the state of the file currently being processed.

| Command | Description |
| --- | --- |
| Cancel Batch | Emitted if an error occurs while inserting rows into the Data table or while mapping a UDR to rows. |

Retrieves

The agent retrieves commands from other agents and, based on them, changes the state of the file currently being processed.

| Command | Description |
| --- | --- |
| Begin Batch | Retrieves a Transaction ID and inserts an entry in the pending transaction table. |
| End Batch | Deletes the pending Transaction ID row. Updates the Batch status to indicate that all the UDRs were successfully inserted into the Data table. This status is loaded to the Batch Status table. |
| Cancel Batch | Removes the distributed rows with the current Transaction ID or calls the configured Cleanup SP. The pending Transaction ID row is deleted. Updates the Batch status to indicate that the insertion of UDRs has been cancelled. This status is loaded to the Batch Status table. |
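The Transaction ID part of this lifecycle can be sketched as follows. This is an illustration under stated assumptions, not the agent's code: BatchSketch, run_batch, the table and column names, the UUID-based Transaction ID, and the status code 1 for a cancelled batch are all hypothetical, and sqlite3 stands in for BigQuery. The sketch follows the statement that the Data table is not modified at commit or rollback, so it only records a status per batch; the pending transaction table and the Cleanup SP are left out.

```python
import sqlite3
import uuid

STATUS_COMMITTED = 0  # the value the example view filters on
STATUS_CANCELLED = 1  # assumed value, for illustration only


class BatchSketch:
    """Hypothetical stand-in for the agent's batch handling."""

    def __init__(self, conn):
        self.conn = conn
        self.txn_id = None

    def begin_batch(self):
        # Begin Batch: obtain a Transaction ID that is unique for the batch.
        self.txn_id = str(uuid.uuid4())
        return self.txn_id

    def consume(self, payload):
        # Map one UDR to a row; None simulates a mapping error.
        if payload is None:
            raise ValueError("cannot map UDR to a row")
        self.conn.execute(
            "INSERT INTO data_tbl (id, payload) VALUES (?, ?)",
            (self.txn_id, payload),
        )

    def end_batch(self):
        # End Batch: record that all UDRs of the batch were inserted.
        self._record_status(STATUS_COMMITTED)

    def cancel_batch(self):
        # Cancel Batch: record the cancellation; data rows are not touched.
        self._record_status(STATUS_CANCELLED)

    def _record_status(self, status):
        self.conn.execute(
            "INSERT INTO batch_status_tbl (id, status) VALUES (?, ?)",
            (self.txn_id, status),
        )


def run_batch(agent, payloads):
    """Run one batch and cancel it if any UDR cannot be mapped."""
    txn_id = agent.begin_batch()
    try:
        for payload in payloads:
            agent.consume(payload)
    except ValueError:
        agent.cancel_batch()
    else:
        agent.end_batch()
    return txn_id


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE data_tbl (id TEXT, payload TEXT);
    CREATE TABLE batch_status_tbl (id TEXT, status INTEGER);
""")
agent = BatchSketch(conn)
good = run_batch(agent, ["a", "b"])   # all UDRs map, batch is committed
bad = run_batch(agent, ["c", None])   # second UDR fails, batch is cancelled

statuses = dict(conn.execute("SELECT id, status FROM batch_status_tbl"))
leftover = conn.execute(
    "SELECT COUNT(*) FROM data_tbl WHERE id = ?", (bad,)
).fetchone()[0]
```

In this sketch the row inserted before the failure stays in the Data table for the cancelled batch, and only its status distinguishes it from committed data, which is the situation a view filtering on status = 0 is meant to handle.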

