HTTP Batch Agent Configuration

The HTTP Batch agent contains the following tabs:

  • Connection
  • Source
  • Advanced
  • Duplicate Check

Connection

HTTP batch collection agent configuration dialog - Connection tab

Field    Description
URL

The URL of the file to be collected. The full URL to the file must be given.

Note!

If the collected file contains any links to other pages, these will only be followed if Enable Index Based Collection is selected. Refer to Enable Index Based Collection in the Source section below.

Username

HTTP authorization username used in requests.

Password

HTTP authorization password used in requests.
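
As an illustration of how the three Connection fields combine into a single request, here is a minimal sketch using Python's standard library. It assumes HTTP Basic authentication, which the text above does not state explicitly; the URL, username, and password values are hypothetical placeholders, and `build_request` is an illustrative helper, not part of the agent.

```python
import base64
import urllib.request


def build_request(url: str, username: str, password: str) -> urllib.request.Request:
    """Build a GET request that carries an HTTP Basic Authorization header.

    Basic authentication is an assumption made for this sketch.
    """
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    request = urllib.request.Request(url, method="GET")
    request.add_header("Authorization", f"Basic {token}")
    return request


# Hypothetical values standing in for the URL, Username, and Password fields.
request = build_request("https://example.com/data/FILE1.dat", "collector", "secret")
# Passing `request` to urllib.request.urlopen() would perform the collection.
```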

Source

HTTP batch collection agent configuration dialog - Source tab

Item    Description
Compression

Select whether the agent should try to decompress the collected data before routing it into the workflow. The options are 'No Compression' and 'Gzip'.

Note!

If Enable Index Based Collection is selected, only the files linked from the given URL will be decompressed upon collection.
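
The effect of the Compression setting can be sketched as follows. This is only an illustration of the two options named above; `decode_payload` is a hypothetical helper and says nothing about how the agent handles data that fails to decompress.

```python
import gzip


def decode_payload(payload: bytes, compression: str) -> bytes:
    """Return the bytes to route onward for the selected Compression option."""
    if compression == "Gzip":
        return gzip.decompress(payload)
    if compression == "No Compression":
        return payload
    raise ValueError(f"Unknown compression option: {compression!r}")


# Sample data compressed here only to have something to decompress.
raw = b"id,amount\n1,100\n"
compressed = gzip.compress(raw)
restored = decode_payload(compressed, "Gzip")
```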

Enable Index Based Collection

Select to enable index based collection. All linked-to URLs found in the HTML-formatted document will be collected. The document is the one specified in the URL field, described in the Connection section above.

URL Pattern

Either leave empty or enter a regular expression that filters on the full URL. If the field is empty, all files are collected; otherwise, only files matching the URL Pattern are collected.

The URL itself will not be routed into the workflow.
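
To make the interaction between Enable Index Based Collection and URL Pattern concrete, the sketch below extracts links from an HTML index page, resolves them against the index URL, and applies an optional regular expression. It assumes that only `<a href>` links count and that the pattern must match the whole URL; both are assumptions, and `LinkCollector` and `select_links` are hypothetical names.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute URLs from the <a href="..."> tags of an index page."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the index URL.
                self.links.append(urljoin(self.base_url, value))


def select_links(index_url: str, html: str, url_pattern: str = "") -> list[str]:
    """Return linked URLs, filtered by url_pattern when one is given."""
    parser = LinkCollector(index_url)
    parser.feed(html)
    if not url_pattern:
        return parser.links  # Empty pattern: every linked file is collected.
    regex = re.compile(url_pattern)
    # Assumption: the pattern has to match the full URL, not just a part of it.
    return [link for link in parser.links if regex.fullmatch(link)]


index_url = "https://example.com/data/index.html"  # hypothetical index page
page = '<a href="FILE1.dat">1</a><a href="FILE2.dat">2</a><a href="notes.txt">n</a>'
all_links = select_links(index_url, page)
dat_links = select_links(index_url, page, r".*\.dat")
```

Note that the index URL is only used to find links here; it is never included in the returned list, in line with the statement that the URL itself is not routed into the workflow.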

Enable Control File Based Collection

When selected, the agent will only collect files that have a corresponding control file present. The expected name of the control file is defined by the Position and Control File Extension settings.

Position

The control filename consists of an extension added either before or after the shared filename part. There are two choices: Prefix or Suffix. Refer to the example below, Control File Extensions, for more information.

Control File Extension

The Control File Extension is used to define when the data file should be collected. A data file will only be collected if the corresponding control file exists.

The text entered in this field is the expected extension to the shared filename. The Control File Extension will be attached to the shared filename according to the setting made in the Position field. Refer to the example below, Control File Extensions, for more information.

Data File Extension

The Data File Extension is an optional field that is used when a stricter definition of files to be collected is needed. It is only applicable if the Position is set to Suffix. Refer to the example below for more information.

Example - Control File Extensions

Consider a directory containing 5 files:

  • FILE1.dat

  • FILE2.dat

  • FILE1.ok

  • ok.FILE1

  • FILE1

  1. The Position field is set to Prefix and the Control File Extension field is set to "ok.".
    The control file is ok.FILE1 and FILE1 will be the file collected.

  2. The Position field is set to Suffix and the Control File Extension field is set to ".ok".
    The control file is FILE1.ok and FILE1 will be the file collected.

  3. The Position field is set to Suffix, the Control File Extension field is set to ".ok", and the Data File Extension field is set to ".dat".
    The control file is FILE1.ok and FILE1.dat will be the file collected.
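
The three cases in the example can be reproduced with a small sketch of the matching rule. This is a reading of the description above, not the agent's actual implementation; `files_to_collect` is a hypothetical helper that works on plain filenames.

```python
def files_to_collect(names, position, control_ext, data_ext=""):
    """Return the data files that have a matching control file.

    position    -- "Prefix" or "Suffix"
    control_ext -- text added before or after the shared filename part
    data_ext    -- optional data file extension, only used with "Suffix"
    """
    if not control_ext:
        raise ValueError("A control file extension is required")
    available = set(names)
    selected = []
    for name in names:
        if position == "Prefix":
            if not name.startswith(control_ext):
                continue
            shared = name[len(control_ext):]
            data_file = shared
        elif position == "Suffix":
            if not name.endswith(control_ext):
                continue
            shared = name[: len(name) - len(control_ext)]
            data_file = shared + data_ext
        else:
            raise ValueError(f"Unknown position: {position!r}")
        # Collect only when the control file names a data file that exists.
        if shared and data_file in available:
            selected.append(data_file)
    return selected


directory = ["FILE1.dat", "FILE2.dat", "FILE1.ok", "ok.FILE1", "FILE1"]
case_1 = files_to_collect(directory, "Prefix", "ok.")
case_2 = files_to_collect(directory, "Suffix", ".ok")
case_3 = files_to_collect(directory, "Suffix", ".ok", ".dat")
```

In all three cases FILE2.dat is left alone, because no control file corresponds to it.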


Enable HTTP DELETE

Select this to instruct the web server to delete the file and the control file after the file has been successfully collected. If unchecked, the file is ignored after collection; that is, the file is left on the web server.
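
As a rough sketch of what Enable HTTP DELETE implies on the wire, the following builds one DELETE request for the data file and one for its control file. The URLs are hypothetical, `build_delete_requests` is an illustrative helper, and the order in which the agent deletes the two files is an assumption.

```python
import urllib.request


def build_delete_requests(data_url, control_url=None):
    """Build HTTP DELETE requests for a collected file and its control file."""
    urls = [data_url]
    if control_url:
        urls.append(control_url)
    return [urllib.request.Request(url, method="DELETE") for url in urls]


# Hypothetical URLs for a collected data file and its control file.
deletes = build_delete_requests(
    "https://example.com/data/FILE1.dat",
    "https://example.com/data/FILE1.ok",
)
# Each request would be sent with urllib.request.urlopen() after a successful collection.
```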

Advanced

HTTP batch collection agent configuration dialog - Advanced tab

Item    Description
Use Security Profile checkbox

Select to enable the use of a security profile for HTTPS connections.

Security Profile

Select the security profile to which the Java keystore is attached.

Read Timeout (ms)

The time, in milliseconds, to wait for responses from the server. If set to 0, the agent waits indefinitely.
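
For readers who script against the same server, the Read Timeout (ms) semantics can be mirrored in Python, where `urllib.request.urlopen` takes its `timeout` in seconds and `None` means no time limit. The helper name `read_timeout_seconds` is hypothetical, and rejecting negative values is an assumption of this sketch.

```python
def read_timeout_seconds(read_timeout_ms: int):
    """Convert a Read Timeout (ms) value to a timeout in seconds.

    A value of 0 means "wait forever", which Python expresses as None.
    """
    if read_timeout_ms < 0:
        raise ValueError("Read timeout must not be negative")
    if read_timeout_ms == 0:
        return None
    return read_timeout_ms / 1000.0


timeout = read_timeout_seconds(30000)
# Usage: urllib.request.urlopen(request, timeout=timeout)
```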

Duplicate Check

HTTP batch collection agent configuration dialog - Duplicate Check tab

The Duplicate Check feature is only used when Enable Index Based Collection, found in the Source section above, is enabled.

Item    Description

Enable Duplicate Check

When selected, the agent will store every collected URL for a configurable number of days. The storage will be checked to make sure that no URL is collected again as long as it remains in the storage.

Database Profile

Each collected URL will be stored in the database defined in the selected profile. The schema must contain a table called "duplicate_check". For more information about this table, refer to HTTP Batch Appendix - Database Requirements for Duplicate Check.

Max Cache Age (Days)

The number of days to keep collected URLs in the database. When the workflow starts, it will delete entries that are older than this number of days.

Note!

If a duplicate-check workflow runs on more than one EC on separate servers, and the system clocks are not synchronized, there is a risk that UDR duplicates are prematurely deleted. For example, if two system clocks are 12 hours apart and Max Cache Age (Days) is set to 1 day, duplicate UDRs might be deleted after only 12 hours instead of 24.
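
The duplicate check and the clock issue described in the note can be sketched with an in-memory SQLite database. Only the table name "duplicate_check" comes from the text above; the column names are hypothetical (the real layout is defined in the appendix), and the helper functions are illustrative rather than the agent's implementation.

```python
import sqlite3
from datetime import datetime, timedelta


def open_store() -> sqlite3.Connection:
    """Create an in-memory stand-in for the duplicate check database."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical columns; only the table name is taken from the documentation.
    conn.execute(
        "CREATE TABLE duplicate_check (url TEXT PRIMARY KEY, collected_at TEXT NOT NULL)"
    )
    return conn


def stamp(moment: datetime) -> str:
    """Fixed-width ISO timestamp, so that string comparison follows time order."""
    return moment.isoformat(timespec="microseconds")


def purge_expired(conn, now: datetime, max_cache_age_days: int) -> None:
    """Delete entries older than Max Cache Age (Days), as done at workflow start."""
    cutoff = now - timedelta(days=max_cache_age_days)
    conn.execute("DELETE FROM duplicate_check WHERE collected_at < ?", (stamp(cutoff),))


def collect_once(conn, url: str, now: datetime) -> bool:
    """Return True and record the URL if it is new, False if it is a duplicate."""
    row = conn.execute("SELECT 1 FROM duplicate_check WHERE url = ?", (url,)).fetchone()
    if row:
        return False
    conn.execute(
        "INSERT INTO duplicate_check (url, collected_at) VALUES (?, ?)", (url, stamp(now))
    )
    return True


conn = open_store()
start = datetime(2024, 1, 1)
url = "https://example.com/data/FILE1.dat"  # hypothetical collected URL
first_time = collect_once(conn, url, start)
second_time = collect_once(conn, url, start + timedelta(hours=1))
```

With this sketch, a purge driven by a synchronized clock 13 hours after collection keeps the entry, while a purge driven by a clock that runs 12 hours ahead sees the same entry as 25 hours old and removes it, so the URL would be collected again.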