Connection

HTTP batch collection agent configuration dialog - Connection tab

Field	Description
URL	The full URL of the file to be collected. Note! If the collected file contains any links to other pages, these will only be followed if Enable Index Based Collection is selected. Refer to Enable Index Based Collection in the Source section below.
Username	HTTP authorization username used in requests
Password	HTTP authorization password used in requests
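The following is a minimal sketch of how a request carrying these three fields could look. It assumes the HTTP Basic authentication scheme, which this page does not state explicitly, and the function names, host, file and credentials are hypothetical.

```python
import base64
import urllib.request


def basic_auth_header(username: str, password: str) -> str:
    """Return the value of an HTTP Basic Authorization header."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


def build_request(url: str, username: str, password: str) -> urllib.request.Request:
    """Build a GET request for the full file URL with credentials attached."""
    request = urllib.request.Request(url, method="GET")
    # Assumption: the agent sends Basic credentials with each request.
    request.add_header("Authorization", basic_auth_header(username, password))
    return request


if __name__ == "__main__":
    # Hypothetical URL and credentials, for illustration only.
    req = build_request("http://example.com/data/FILE1.dat", "user", "pass")
    print(req.full_url)
    print(req.get_header("Authorization"))
```

Nothing is sent over the network here; the request object is only built so that the URL and header can be inspected.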

Source

HTTP batch collection agent configuration dialog - Source tab

Item	Description
Compression	Select whether the agent should try to decompress the collected data before routing it into the workflow. The options are 'No Compression' and 'Gzip'. Note! If Enable Index Based Collection is selected, only the files collected through the links in the given URL will be decompressed.
Enable Index Based Collection	Select to enable index-based collection. All linked-to URLs found in the HTML-formatted document at the address given in the URL field (see the Connection section above) will be collected.
URL Pattern	Either leave empty or enter a regular expression that filters on the full URL. If empty, all files are collected; otherwise, only files matching the URL Pattern are collected. The URL itself will not be routed into the workflow.
Enable Control File Based Collection	When selected, the agent will only collect files that have a control file present. The name of the expected control file is defined by the Position and Control File Extension fields.
Position	The control filename consists of an extension added either before or after the shared filename part. There are two choices: Prefix or Suffix. Refer to the example below, Control File Extensions, for more information.
Control File Extension	The Control File Extension is used to define when the data file should be collected. A data file will only be collected if the corresponding control file exists. The text entered in this field is the expected extension to the shared filename. The Control File Extension will be attached to the shared filename depending on the setting made in the Position field. Refer to the example below, Control File Extensions, for more information.
Data File Extension	The Data File Extension is an optional field that is used when a stricter definition of files to be collected is needed. It is only applicable if the Position is set to Suffix. Refer to the example below for more information.
Enable HTTP DELETE	When selected, the agent requests the web server to delete the file and the control file after the file has been successfully collected. If not selected, the file will be ignored after collection, that is, the file will be left on the web server.

Example - Control File Extensions

Consider a directory containing 5 files:

FILE1.dat
FILE2.dat
FILE1.ok
ok.FILE1
FILE1

The Position field is set to Prefix and the Control File Extension field is set to `.ok`. The control file is `ok.FILE1` and `FILE1` will be the file collected.

The Position field is set to Suffix and the Control File Extension field is set to `.ok`. The control file is `FILE1.ok` and `FILE1` will be the file collected.

The Position field is set to Suffix, the Control File Extension field is set to `.ok`, and the Data File Extension field is set to `.dat`. The control file is `FILE1.ok` and `FILE1.dat` will be the file collected.
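The control file rules for the Suffix position can be illustrated with a short sketch. It is not the agent's implementation: the function name is hypothetical, and the extension text is assumed to be appended literally to the shared filename. The Prefix position is left out because this page does not spell out how the extension text is joined to the filename in that mode.

```python
from typing import Iterable, List


def suffix_collectable(names: Iterable[str], control_ext: str, data_ext: str = "") -> List[str]:
    """Return the data files that have a matching control file (Suffix position).

    A control file is named <shared><control_ext>; the data file that goes
    with it is <shared><data_ext>, or just <shared> when data_ext is empty.
    """
    if not control_ext:
        raise ValueError("control_ext must not be empty")
    present = set(names)
    collected = []
    for name in sorted(present):
        if not name.endswith(control_ext):
            continue  # not a control file
        shared = name[: -len(control_ext)]
        data_file = shared + data_ext
        # Collect only if the data file exists and is not the control file itself.
        if shared and data_file != name and data_file in present:
            collected.append(data_file)
    return collected


if __name__ == "__main__":
    files = ["FILE1.dat", "FILE2.dat", "FILE1.ok", "ok.FILE1", "FILE1"]
    print(suffix_collectable(files, ".ok"))
    print(suffix_collectable(files, ".ok", ".dat"))
```

With the five files listed in the example, the first call selects `FILE1` and the second selects `FILE1.dat`. `FILE2.dat` is never selected, since no `FILE2.ok` exists.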
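As an illustration of Enable Index Based Collection combined with URL Pattern, the sketch below extracts the links from an HTML index document and filters them. It makes several assumptions that are not confirmed by this page: only `href` attributes of `<a>` tags are treated as links, relative links are resolved against the index URL, and the pattern must match the whole URL. All names and URLs are hypothetical.

```python
import re
from html.parser import HTMLParser
from typing import List, Optional
from urllib.parse import urljoin


class LinkParser(HTMLParser):
    """Collect the href value of every <a> tag in an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def select_links(index_url: str, html: str, url_pattern: Optional[str] = None) -> List[str]:
    """Return the full URLs linked from the index, filtered by url_pattern."""
    parser = LinkParser()
    parser.feed(html)
    full_urls = [urljoin(index_url, href) for href in parser.hrefs]
    if not url_pattern:
        return full_urls  # empty pattern: every linked file is selected
    regex = re.compile(url_pattern)
    # Assumption: the pattern has to match the full URL, not just a part of it.
    return [url for url in full_urls if regex.fullmatch(url)]


if __name__ == "__main__":
    index = "http://example.com/data/index.html"
    page = '<a href="FILE1.dat">1</a> <a href="/other/FILE2.txt">2</a>'
    print(select_links(index, page))
    print(select_links(index, page, r".*\.dat"))
```

The index document itself never appears in the result, which mirrors the statement that the URL itself is not routed into the workflow.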

Advanced

HTTP batch collection agent configuration dialog - Advanced tab

Item	Description
Use Security Profile	Enable this option to allow the HTTP Batch agent to use HTTPS.
Security Profile	Browse to select the security profile that the HTTP Batch agent should use.
Read Timeout (ms)	The maximum time, in milliseconds, to wait for a response from the server. 0 (zero) means to wait forever.
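The meaning of the Read Timeout value can be sketched as follows. The helper names are hypothetical, and Python's default TLS context merely stands in for a security profile, whose actual contents are not described on this page. Note also that the timeout in this sketch applies to each blocking socket operation, which may differ from how the agent measures it.

```python
import ssl
import urllib.request
from typing import Optional


def timeout_seconds(read_timeout_ms: int) -> Optional[float]:
    """Convert the dialog value to seconds; 0 means wait forever (None)."""
    if read_timeout_ms < 0:
        raise ValueError("timeout must not be negative")
    return None if read_timeout_ms == 0 else read_timeout_ms / 1000.0


def fetch(url: str, read_timeout_ms: int = 0) -> bytes:
    """Fetch a URL, using a TLS context only for https:// addresses."""
    # Stand-in for a security profile: default certificate verification.
    context = ssl.create_default_context() if url.lower().startswith("https://") else None
    with urllib.request.urlopen(
        url, timeout=timeout_seconds(read_timeout_ms), context=context
    ) as response:
        return response.read()


if __name__ == "__main__":
    print(timeout_seconds(0))
    print(timeout_seconds(1500))
```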

Duplicate Check

HTTP batch collection agent configuration dialog - Duplicate Check tab

The Duplicate Check feature is only used when Enable Index Based Collection, found in the Source section above, is selected.

Item	Description
Enable Duplicate Check	When selected, the agent will store every collected URL for a configurable number of days. The storage will be checked to make sure that no URL is collected again as long as it remains in the storage.
Database Profile	Each collected URL will be stored in the database defined in the selected profile. The schema must contain a table called "duplicate_check". For more information about this table, refer to HTTP Batch Appendix - Database Requirements for Duplicate Check(4.0).
Max Cache Age (Days)	The number of days to keep collected URLs in the database. When the workflow starts, it will delete entries that are older than this number of days. Note! If a duplicate-check workflow runs on more than one EC on separate servers, and the system clocks are not synchronized, there is a risk that UDR duplicates are prematurely deleted. For example: If two system clocks are 12 hours apart and Max Cache Age is set to 1 day, duplicate UDRs might be deleted after only 12 hours, instead of 24.
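The interplay between the duplicate check storage and Max Cache Age can be sketched with an in-memory SQLite database. This is only an illustration: the column names `url` and `collected_at` and all function names are hypothetical, and the real table layout is the one defined in the appendix referred to above.

```python
import sqlite3
from datetime import datetime, timedelta


def open_store() -> sqlite3.Connection:
    """Create an in-memory stand-in for the duplicate_check table."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical columns, chosen for this sketch only.
    conn.execute(
        "CREATE TABLE duplicate_check (url TEXT PRIMARY KEY, collected_at TEXT NOT NULL)"
    )
    return conn


def purge_old(conn: sqlite3.Connection, now: datetime, max_cache_age_days: int) -> None:
    """Delete entries older than the maximum cache age, as done at workflow start."""
    cutoff = (now - timedelta(days=max_cache_age_days)).isoformat(timespec="seconds")
    conn.execute("DELETE FROM duplicate_check WHERE collected_at < ?", (cutoff,))


def collect_once(conn: sqlite3.Connection, url: str, now: datetime) -> bool:
    """Store the URL and return True, or return False if it is already stored."""
    row = conn.execute("SELECT 1 FROM duplicate_check WHERE url = ?", (url,)).fetchone()
    if row is not None:
        return False  # duplicate: do not collect again
    conn.execute(
        "INSERT INTO duplicate_check (url, collected_at) VALUES (?, ?)",
        (url, now.isoformat(timespec="seconds")),
    )
    return True


if __name__ == "__main__":
    conn = open_store()
    t0 = datetime(2024, 1, 1)
    url = "http://example.com/data/FILE1.dat"  # hypothetical URL
    print(collect_once(conn, url, t0))  # first collection is stored
    print(collect_once(conn, url, t0))  # second attempt is a duplicate
    # A server whose clock runs 12 hours ahead starts the workflow just over
    # 12 real hours later; its clock already reads more than 24 hours after t0.
    purge_old(conn, t0 + timedelta(hours=24, seconds=1), max_cache_age_days=1)
    print(collect_once(conn, url, t0 + timedelta(hours=24, seconds=1)))
```

The last call succeeds because the purge on the server with the fast clock already removed the entry, which is the premature deletion described in the note on Max Cache Age.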

Usage Engine Private Edition 4 Documentation

HTTP Batch Agent Configuration(4.0)
