This section describes the transaction behavior of the Data Hub GCP BigQuery agent. For more information about transactions in general, see Transactions in /wiki/spaces/MZD73/pages/5605083.

Emits

This agent does not emit anything.

Retrieves

The agent retrieves commands from other agents. Based on these commands, the agent changes the state of the processed local file.

The GCP BigQuery batch forwarding agent uses the streaming insert API, which is designed to load data at very high volumes and to make loaded data available to queries in real time. To achieve both, newly inserted data is first added to a streaming buffer, where it is immediately available for queries. However, this data is not moved into standard storage until more than an hour after it is loaded. While the data is in the streaming buffer, it can only be queried; it cannot be updated, deleted, or copied. Refer to the GCP documentation for further information on the streaming insert API and the streaming buffer.

Because rows in the streaming buffer cannot be modified, the GCP BigQuery agent does not modify the Data table at the commit and rollback stages. Instead, the agent uses a design based on a Transaction ID that is unique for each batch:

  1. The Data table must have a Transaction ID column.
  2. The Batch Status table must be created with a Transaction ID column and a Status column.
  3. At the commit and rollback stages, the Batch Status table is updated with a status code that reflects the current stage and can be used for auditing.
  4. Consumers of the loaded data are expected to always access that data through a view.
  5. This view should join the two tables on Transaction ID and return only rows where status = 0, for example:

    Example - View Joining Data Table and Batch Status Table

    The following DDL query can be run in the BigQuery Query Editor to create a view named view1 in the user_analytics dataset.

    CREATE VIEW IF NOT EXISTS user_analytics.view1 AS
    SELECT * FROM user_analytics.data_tbl1 AS t1
    FULL JOIN user_analytics.batch_status_tbl1 AS t2
    USING (id)  WHERE t2.status = 0;
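The effect of reading through such a view can be tried locally. The following is a minimal sketch, not the agent's implementation: it assumes an in-memory SQLite database as a stand-in for BigQuery, reuses the table names from the example above, and uses a hypothetical payload column and hypothetical status value 1 for a batch that was not committed. It also substitutes an inner join for the FULL JOIN in the example, for portability across SQLite versions.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create a local stand-in for the Data table, Batch Status table, and view."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE data_tbl1 (id TEXT, payload TEXT);
        CREATE TABLE batch_status_tbl1 (id TEXT, status INTEGER);

        -- Only rows whose batch has status = 0 are exposed to consumers.
        CREATE VIEW view1 AS
        SELECT t1.id AS id, t1.payload AS payload
        FROM data_tbl1 AS t1
        JOIN batch_status_tbl1 AS t2 ON t1.id = t2.id
        WHERE t2.status = 0;
        """
    )
    # tx-1: committed batch, tx-2: batch with a non-zero status (assumed value),
    # tx-3: batch that has data rows but no status row yet.
    conn.executemany(
        "INSERT INTO data_tbl1 VALUES (?, ?)",
        [("tx-1", "a"), ("tx-1", "b"), ("tx-2", "c"), ("tx-3", "d")],
    )
    conn.executemany(
        "INSERT INTO batch_status_tbl1 VALUES (?, ?)",
        [("tx-1", 0), ("tx-2", 1)],
    )
    return conn


def visible_rows(conn: sqlite3.Connection) -> list:
    """Return the rows a consumer would see through the view."""
    return conn.execute("SELECT id, payload FROM view1 ORDER BY payload").fetchall()
```

Calling visible_rows(build_demo_db()) returns only the two rows of tx-1. The rows of tx-2 and tx-3 remain in the Data table but are hidden from consumers, which is the point of the design: the data rows themselves are never modified.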



Emits

The agent emits commands that change the state of the file currently being processed.

Command

Description

Begin Batch

When a Begin Batch message is received, a local temporary file is created in MZ_HOME/tmp.

Consume

When a Consume message is received, the agent writes the incoming data to the local temporary file.

Commit

When a Commit message is received, the agent sends the local temporary file to the staging directory in Impala. Once the file has been transferred, the agent inserts the transferred data into a temporary database table. The agent then copies the data in the temporary table to the selected database table and removes all temporary files and tables.

If the agent is recovering from an error, the agent only removes the local temporary file.

Rollback

When a Rollback message is received, the agent removes the local temporary file.

Cancel Batch

Emitted if an error occurs while inserting rows into the Data table or while mapping UDRs to rows.
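The local temporary file handling described for Begin Batch, Consume, Commit, and Rollback can be pictured with a short sketch. This is an illustration under stated assumptions, not the agent's code: the class name TempFileBatch, the tmp_dir argument (standing in for MZ_HOME/tmp), and the deliver callback (standing in for the transfer step at commit) are all hypothetical.

```python
import os
import tempfile


class TempFileBatch:
    """Hypothetical model of the per-batch local temporary file."""

    def __init__(self, tmp_dir: str):
        self.tmp_dir = tmp_dir  # stand-in for MZ_HOME/tmp
        self.path = None

    def begin_batch(self) -> None:
        # Begin Batch: create the local temporary file.
        fd, self.path = tempfile.mkstemp(dir=self.tmp_dir, suffix=".batch")
        os.close(fd)

    def consume(self, data: bytes) -> None:
        # Consume: append the incoming data to the temporary file.
        with open(self.path, "ab") as handle:
            handle.write(data)

    def commit(self, deliver) -> None:
        # Commit: hand the file to the (assumed) transfer step, then clean up.
        deliver(self.path)
        self._remove()

    def rollback(self) -> None:
        # Rollback: discard the temporary file without delivering it.
        self._remove()

    def _remove(self) -> None:
        if self.path and os.path.exists(self.path):
            os.remove(self.path)
        self.path = None
```

In this sketch, both commit and rollback end by removing the temporary file. The only difference is whether the contents are handed to the transfer step first.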

Retrieves

The agent retrieves commands from other agents and, based on them, changes the state of the file currently being processed.

Command

Description

Begin Batch

Retrieves a Transaction ID.

End Batch

Updates the Batch status to indicate that all UDRs were successfully inserted into the Data table. This status is loaded into the Batch Status table.

Cancel Batch

Updates the Batch status to indicate that UDR insertion has been cancelled. This status is loaded into the Batch Status table.
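The three commands above can be summarized as a small state model. The sketch below is an illustration only, under several assumptions: the class BatchTracker and its method names are hypothetical, the Transaction ID is generated locally with uuid (the source does not say where the agent obtains it), and the cancelled status value 1 is invented. Only the value 0 for a successful batch is taken from the view example.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional

STATUS_COMMITTED = 0  # from the text: the view filters on status = 0
STATUS_CANCELLED = 1  # assumption: placeholder for a non-zero status code


@dataclass
class BatchTracker:
    """Hypothetical in-memory model of the Data and Batch Status tables."""

    data_rows: list = field(default_factory=list)
    batch_status: dict = field(default_factory=dict)
    current_tx: Optional[str] = None

    def begin_batch(self) -> str:
        # Begin Batch: obtain a Transaction ID that is unique for this batch.
        self.current_tx = uuid.uuid4().hex
        return self.current_tx

    def insert(self, record: dict) -> None:
        # Every row written to the Data table carries the Transaction ID.
        self.data_rows.append({**record, "transaction_id": self.current_tx})

    def end_batch(self) -> None:
        # End Batch: record success; the data rows are not touched.
        self.batch_status[self.current_tx] = STATUS_COMMITTED

    def cancel_batch(self) -> None:
        # Cancel Batch: record the cancellation; rows already inserted stay
        # in the Data table because streaming-buffer rows cannot be modified.
        self.batch_status[self.current_tx] = STATUS_CANCELLED

    def visible(self) -> list:
        # What a consumer reading through the view would see.
        return [
            row for row in self.data_rows
            if self.batch_status.get(row["transaction_id"]) == STATUS_COMMITTED
        ]
```

Rows from a cancelled batch stay in data_rows in this model, mirroring the restriction on the streaming buffer. They are excluded only when read through visible(), which plays the role of the view.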

