HTTP Batch Appendix - Database Requirements for Duplicate Check
The Duplicate Check feature stores the collected URLs in an external database referenced by a Database profile. The schema of this database must contain a table definition that matches the needs of the agent.
Table and Column Names
The table must be named "duplicate_check" and must contain all of the following columns:
Table column | Description |
txn | The transaction ID of the batch that collected the URL. If the file is split into several chunks using hintEndBatch, this is the ID of the last transaction. |
tstamp | The timestamp when the URL was committed by the workflow. |
workflow_key | An ID that uniquely identifies the workflow collecting the URL. It allows workflows to be renamed without changing the table data. |
url | The full absolute URL collected. |
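The following is a minimal sketch of how a table with these four columns could support a duplicate check. It is an illustration only, not the agent's actual implementation: it uses an in-memory SQLite database in place of the external database, the helper names (create_table, is_duplicate, record_url) are hypothetical, and it assumes that a URL counts as a duplicate only within the same workflow_key.

```python
import sqlite3
from datetime import datetime, timezone


def create_table(conn):
    # Same column names as described above. The TEXT types are SQLite
    # stand-ins (an assumption), not the types a real JDBC driver would use.
    conn.execute(
        "CREATE TABLE duplicate_check ("
        " txn TEXT,"
        " tstamp TEXT,"
        " workflow_key TEXT,"
        " url TEXT)"
    )


def is_duplicate(conn, workflow_key, url):
    # Assumption: duplicates are scoped per workflow, so both columns are matched.
    row = conn.execute(
        "SELECT 1 FROM duplicate_check"
        " WHERE workflow_key = ? AND url = ? LIMIT 1",
        (workflow_key, url),
    ).fetchone()
    return row is not None


def record_url(conn, txn, workflow_key, url):
    # Store the URL together with the transaction ID and the commit time (UTC).
    tstamp = datetime.now(timezone.utc).isoformat()
    conn.execute(
        "INSERT INTO duplicate_check (txn, tstamp, workflow_key, url)"
        " VALUES (?, ?, ?, ?)",
        (txn, tstamp, workflow_key, url),
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_table(conn)
    url = "http://example.com/data/file1.csv"  # hypothetical URL
    if not is_duplicate(conn, "wf-1", url):
        record_url(conn, "1001", "wf-1", url)
    print(is_duplicate(conn, "wf-1", url))
```

Because each row is keyed on workflow_key rather than on the workflow name, renaming a workflow would not require any change to the rows already stored.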
Column Types
The actual column types depend on how the JDBC driver in use maps JDBC types to the native types of the database.
The txn column is a JDBC VARCHAR.
The tstamp column is a JDBC TIMESTAMP type.
The workflow_key and url columns are of JDBC VARCHAR type.
Oracle Example
-- Table definition usable for ORACLE
CREATE TABLE duplicate_check(
    txn long,
    tstamp timestamp,
    workflow_key varchar2(32),
    url varchar2(256)
);