The Duplicate Check feature stores the collected URLs in an external database referenced by a Database profile. The schema of this database must contain a table definition that matches the needs of the agent.
Table and Column Names
The table must be named "duplicate_check" and must contain all of the following columns:
Table column | Description |
txn | The transaction ID of the batch that collected the URL. If the file is split into several chunks using hintEndBatch, this is the ID of the final transaction. |
tstamp | The timestamp when the URL was committed by the workflow. |
workflow_key | An ID that uniquely identifies the workflow collecting the URL. It allows workflows to be renamed without changing the table data. |
url | The full absolute URL collected. |
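To illustrate how these columns fit together, the following is a minimal sketch of a lookup against the table. It is not the query the agent itself runs, which is not documented here; the workflow key and URL values are hypothetical placeholders.

```sql
-- Hypothetical lookup: has this workflow already collected this URL?
-- 'wf-0001' and the example.com URL are made-up placeholder values.
SELECT txn, tstamp
FROM duplicate_check
WHERE workflow_key = 'wf-0001'
  AND url = 'http://example.com/data/file1.csv';
```

A returned row would show which transaction collected the URL and when it was committed.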
Column Types
The column types are determined by how the specific JDBC driver maps JDBC types to the native types of the database.
The txn column is of JDBC VARCHAR type.
The tstamp column is of JDBC TIMESTAMP type.
The workflow_key and url columns are of JDBC VARCHAR type.
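As a rough sketch, the JDBC types above might translate into the following definition on a database that uses standard SQL type names. This has not been verified against any particular JDBC driver. The lengths for workflow_key and url are borrowed from the Oracle example, and the length for txn is an assumption.

```sql
-- Sketch only: standard SQL type names, not verified against a specific driver.
CREATE TABLE duplicate_check (
    txn          VARCHAR(64),   -- JDBC VARCHAR; length 64 is an assumed value
    tstamp       TIMESTAMP,     -- JDBC TIMESTAMP
    workflow_key VARCHAR(32),   -- JDBC VARCHAR; length taken from the Oracle example
    url          VARCHAR(256)   -- JDBC VARCHAR; length taken from the Oracle example
);
```

Check the type mapping documented for the driver in use before adopting a definition like this.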
Oracle Example
-- Table definition usable for ORACLE
CREATE TABLE duplicate_check(
    txn long,
    tstamp timestamp,
    workflow_key varchar2(32),
    url varchar2(256)
);