The GCP BigQuery agent is a batch forwarding agent that inserts UDRs into a BigQuery table in the GCP project specified by a GCP profile.

Configuration

You open the GCP BigQuery agent configuration dialog from a workflow configuration: right-click the agent icon and select Configuration..., or double-click the agent icon.

Target

The Target tab contains the settings for the UDR type that the agent inserts and the table that the UDRs are inserted into.

GCP BigQuery agent configuration with the Target tab selected.

Field

Description

GCP Profile

Click Browse to select a predefined GCP profile. The profile contains the settings required to connect to BigQuery in the GCP project.

UDR Type

Type of UDR to populate the target BigQuery table.

Dataset

Select the target dataset.

Data Table

Select the target table name. Only the tables found within the selected dataset can be listed and chosen.

Info!

When creating the target table, ensure that:

  • The table has a Transaction ID column dedicated to the agent's internal use for transaction safety. The column can have any name, but it must be of type INTEGER and must not allow NULL.
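
The requirement above can be checked before the agent is configured. The following is a minimal sketch and not part of the agent. It assumes that the table schema is available as a list of dictionaries with name, type and mode keys, as in a BigQuery schema JSON file. The function name and the column name txn_id are hypothetical, and accepting INT64 as an alias for INTEGER is an assumption of this sketch.

```python
def is_valid_transaction_id_column(schema, column_name):
    """Return True if column_name is an INTEGER column that rejects NULL."""
    for field in schema:
        if field["name"] == column_name:
            # Assumption: INT64 is treated as equivalent to INTEGER.
            is_integer = field["type"].upper() in ("INTEGER", "INT64")
            # In a BigQuery schema, mode REQUIRED means NULL is not allowed;
            # a missing mode is treated as NULLABLE.
            is_required = field.get("mode", "NULLABLE").upper() == "REQUIRED"
            return is_integer and is_required
    return False  # the column does not exist in the schema


# Hypothetical schema used only for illustration.
example_schema = [
    {"name": "txn_id", "type": "INTEGER", "mode": "REQUIRED"},
    {"name": "caller", "type": "STRING", "mode": "NULLABLE"},
]
```

Calling is_valid_transaction_id_column(example_schema, "txn_id") accepts the column, while a nullable column, a non-integer column or a missing column is rejected.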


Batch Status Table

A separate table that keeps track of transaction IDs so that the agent can provide better transaction safety if the workflow aborts. A transaction ID is unique to each batch process. If data insertion into the table is interrupted, the agent uses the transaction ID to identify the UDRs that belong to the batch.

Transaction ID Column

Select the column that is designated for the transaction ID from the Batch Status Table.

Batch Size

The number of UDRs to insert into the target data table for each batch process. The default value is 500 records per batch transaction.

Concurrent Requests

The number of connections that the agent opens to the target data table to insert the UDRs.
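
To illustrate how Batch Size and the transaction ID relate, here is a conceptual sketch. It is not the agent's actual implementation. It assumes that UDRs are plain dictionaries, and tag_and_chunk, rows_of_transaction and the txn_id key are hypothetical names. Every record is tagged with the transaction ID, the records are split into chunks of at most batch_size, and the rows of an interrupted transaction can be found again through their transaction ID.

```python
def tag_and_chunk(udrs, transaction_id, batch_size=500):
    """Tag each UDR with the transaction ID and split the result into chunks."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    # Copy each UDR and add the transaction ID under the hypothetical key txn_id.
    tagged = [dict(udr, txn_id=transaction_id) for udr in udrs]
    return [tagged[i:i + batch_size] for i in range(0, len(tagged), batch_size)]


def rows_of_transaction(table_rows, transaction_id):
    """Select the rows that a given transaction wrote to the table."""
    return [row for row in table_rows if row["txn_id"] == transaction_id]


udrs = [{"caller": f"user{i}"} for i in range(1200)]
chunks = tag_and_chunk(udrs, transaction_id=42)
print([len(chunk) for chunk in chunks])  # [500, 500, 200]
```

With the default of 500, a batch of 1200 UDRs is sent as three chunks. If the workflow aborted after the first chunk, rows_of_transaction would return exactly those 500 rows for transaction 42.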


Assignment

The Assignment tab contains the assignment of values to each column.

GCP BigQuery agent configuration with the Assignment tab selected.

If the Dataset, Data Table, Batch Status Table and Transaction ID Column are filled in on the Target tab, the table is populated automatically when you select the Assignment tab. If assignments already exist in the Assignment tab, you must select Refresh manually to update the assignments with the configuration in the Target tab.

Field

Description

Refresh

Updates the table with all the columns from the selected table.

Note!
Potential changes in the database table will not be visible until the Dataset, Data Table, Batch Status Table and Transaction ID Column have been properly configured.

If rows already exist in the table, the refresh operation preserves the configuration for all rows with a corresponding column. Thus, if a table has been extended with a new column, the old column configurations are left untouched and the new column appears when Refresh is selected.

Field Name

Displays a list of all columns for the selected table, except the Transaction ID column.

Field Type

Displays the data type for each column as declared in the database table.

UDR Field

Allows you to map the Field Name of the database table to a UDR Field from the UDR Type configured in the Target tab.
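
The Refresh behavior described above amounts to a merge of the existing assignments with the current table columns. The following is a minimal sketch under stated assumptions: assignments are modeled as a dictionary from column name to UDR field name, refresh_assignments and all column and field names are hypothetical, and columns that no longer exist in the table are assumed to be dropped, which the description above does not state explicitly.

```python
def refresh_assignments(current, table_columns, transaction_id_column):
    """Rebuild the assignment table from the columns of the selected table."""
    refreshed = {}
    for column in table_columns:
        if column == transaction_id_column:
            continue  # the Transaction ID column is not listed for assignment
        # Keep an existing mapping; a new column starts without a UDR field.
        refreshed[column] = current.get(column)
    return refreshed


current = {"caller": "callingNumber", "duration": "callDuration"}
columns = ["txn_id", "caller", "duration", "cell_id"]
print(refresh_assignments(current, columns, "txn_id"))
# {'caller': 'callingNumber', 'duration': 'callDuration', 'cell_id': None}
```

The existing mappings for caller and duration are left untouched, and the new column cell_id appears without a UDR field until one is assigned.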