HTTP Batch Agent Configuration

The HTTP Batch agent contains the following tabs:

  • Connection

  • Source

  • Advanced

  • Duplicate Check

Connection

HTTP batch collection agent configuration dialog - Connection tab

Field

Description

URL

The URL of the file to be collected. The full URL to the file must be given.

Note!

If the collected file contains links to other pages, these are followed only if Enable Index Based Collection is selected. Refer to Enable Index Based Collection in the Source section below.

Username

HTTP authorization username used in requests.

Password

HTTP authorization password used in requests.

Source

HTTP batch collection agent configuration dialog - Source tab

Item

Description

Compression

Select whether the agent should try to decompress the collected data before routing it into the workflow. The options are No Compression and Gzip.

Note!

If Enable Index Based Collection is selected, only the files linked from the given URL are decompressed upon collection.
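As a rough illustration of the Gzip option, the following Python sketch decompresses a collected payload before it is passed on. The helper name `decode_payload` and the use of the option labels as string values are assumptions made for this example; they do not describe the agent's internals.

```python
import gzip


def decode_payload(payload: bytes, compression: str = "No Compression") -> bytes:
    """Return the payload as it would be routed onward (hypothetical helper)."""
    if compression == "Gzip":
        # gzip.decompress raises an OSError subclass if the data is not gzip.
        return gzip.decompress(payload)
    # "No Compression": the bytes are passed through unchanged.
    return payload
```

For example, `decode_payload(gzip.compress(b"abc"), "Gzip")` gives back the original bytes, while the default setting returns the input untouched.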

Enable Index Based Collection

Select to enable index based collection. When selected, the document at the URL given in the URL field (refer to the Connection section above) is treated as an HTML-formatted index, and all URLs that it links to are collected.

URL Pattern

Either leave empty or enter a regular expression that filters on the full URL. If empty, all linked files are collected. Otherwise, only files whose URL matches the URL Pattern are collected.

The URL itself will not be routed into the workflow.
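The combination of Enable Index Based Collection and URL Pattern can be sketched in Python as follows. This is only an approximation under stated assumptions: the names `LinkParser` and `select_links` are hypothetical, only `<a href>` links are considered, and the pattern is assumed to have to match the whole URL (whether the agent anchors the pattern this way is not stated here).

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag in an HTML document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def select_links(index_url, index_html, url_pattern=""):
    """Return the absolute URLs linked from index_html that pass url_pattern."""
    parser = LinkParser()
    parser.feed(index_html)
    # Resolve relative links against the URL of the index document.
    urls = [urljoin(index_url, href) for href in parser.hrefs]
    if not url_pattern:
        return urls  # empty pattern: every linked file is selected
    # Assumption: the regular expression must match the full URL.
    return [url for url in urls if re.fullmatch(url_pattern, url)]
```

Note that the index URL itself never appears in the result, only the URLs it links to.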

Enable Control File Based Collection

When selected, the agent will only collect files for which a control file is present. The name of the expected control file is defined by the Position and Control File Extension fields.

Position

The control filename consists of an extension added either before or after the shared filename part. There are two choices: Prefix or Suffix. Refer to the example below, Control File Extensions, for more information.

Control File Extension

The Control File Extension is used to define when the data file should be collected. A data file will only be collected if the corresponding control file exists.

The text entered in this field is the expected extension to the shared filename. The Control File Extension is attached to the shared filename according to the setting in the Position field. Refer to the example below, Control File Extensions, for more information.

Data File Extension

The Data File Extension is an optional field that is used when a stricter definition of files to be collected is needed. It is only applicable if the Position is set to Suffix. Refer to the example below for more information.

Example - Control File Extensions

Consider a directory containing 5 files:

  • FILE1.dat

  • FILE2.dat

  • FILE1.ok

  • ok.FILE1

  • FILE1

  1. The Position field is set to Prefix and the Control File Extension field is set to "ok.".
    The control file is ok.FILE1 and FILE1 will be the file collected.

  2. The Position field is set to Suffix and the Control File Extension field is set to ".ok".
    The control file is FILE1.ok and FILE1 will be the file collected.

  3. The Position field is set to Suffix, the Control File Extension field is set to ".ok", and the Data File Extension field is set to ".dat".
    The control file is FILE1.ok and FILE1.dat will be the file collected.
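The three cases above can be reproduced with a small Python sketch. The function `files_to_collect` and its parameter names are hypothetical and only model the naming rules described in this section; the sketch is not the agent's implementation.

```python
def files_to_collect(names, position, control_ext, data_ext=""):
    """Return the data files in names whose control file is also in names."""
    present = set(names)
    collected = []
    for name in names:
        if position == "Prefix":
            # The control file is the extension followed by the data filename.
            control = control_ext + name
        else:  # "Suffix"
            if data_ext:
                # The Data File Extension narrows the candidates and is
                # replaced by the Control File Extension.
                if not name.endswith(data_ext):
                    continue
                shared = name[: -len(data_ext)]
            else:
                shared = name
            control = shared + control_ext
        if control in present:
            collected.append(name)
    return collected


directory = ["FILE1.dat", "FILE2.dat", "FILE1.ok", "ok.FILE1", "FILE1"]
print(files_to_collect(directory, "Prefix", "ok."))          # ['FILE1']
print(files_to_collect(directory, "Suffix", ".ok"))          # ['FILE1']
print(files_to_collect(directory, "Suffix", ".ok", ".dat"))  # ['FILE1.dat']
```

FILE2.dat is never collected in any of the cases because no control file exists for it.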

Enable HTTP DELETE

When selected, the agent requests that the web server delete the file and the control file after the file has been successfully collected. If not selected, the file is ignored after collection, that is, it is left on the web server.
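To make the Enable HTTP DELETE behavior concrete, the following Python sketch builds the kind of requests a client could send after a successful collection. It is an illustration only: the helper `build_delete_requests` is hypothetical, and the assumption that one DELETE request is sent per file (data file, then control file) is not confirmed by this document. The requests are constructed but not sent.

```python
import urllib.request


def build_delete_requests(data_url, control_url=None):
    """Build (but do not send) DELETE requests for a collected file."""
    urls = [data_url]
    if control_url:
        # Assumption: the control file is removed with its own request.
        urls.append(control_url)
    return [urllib.request.Request(url, method="DELETE") for url in urls]


requests = build_delete_requests(
    "http://example.com/data/FILE1.dat",
    "http://example.com/data/FILE1.ok",
)
for request in requests:
    print(request.get_method(), request.full_url)
```

Each request could then be sent with `urllib.request.urlopen`. Whether the deletion succeeds depends on the web server allowing the DELETE method for the resource.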

Advanced

HTTP batch collection agent configuration dialog - Advanced tab

Item

Description

Use Security Profile checkbox

Select to enable the use of a security profile for HTTPS connections.

Security Profile

Select the security profile to which the Java keystore is attached.

Read Timeout (ms)

The time, in milliseconds, to wait for a response from the server. If set to 0, the agent waits indefinitely.
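The "0 means wait indefinitely" convention can be mapped onto other HTTP clients with a small conversion. The Python sketch below is one way to do it, assuming a client that takes its timeout in seconds and treats `None` as "no timeout" (as Python sockets do); the function name `to_socket_timeout` is hypothetical.

```python
def to_socket_timeout(read_timeout_ms):
    """Convert a Read Timeout (ms) value to seconds, or None for no timeout."""
    if read_timeout_ms < 0:
        raise ValueError("read timeout must not be negative")
    if read_timeout_ms == 0:
        # 0 is the "wait indefinitely" setting, which maps to no timeout.
        return None
    return read_timeout_ms / 1000.0
```

The result can be passed, for example, to `socket.settimeout`, where `None` puts the socket in blocking mode without a time limit.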

Duplicate Check

HTTP batch collection agent configuration dialog - Duplicate Check tab

The Duplicate Check feature is only used when Enable Index Based Collection, found in the Source section above, is selected.

Item

Description

Enable Duplicate Check

When selected, the agent stores every collected URL for a configurable number of days. The storage is checked to make sure that no URL is collected again for as long as it remains in the storage.

Database Profile

Each collected URL is stored in the database defined in the selected profile. The schema must contain a table called "duplicate_check". For more information about this table, refer to HTTP Batch Appendix - Database Requirements for Duplicate Check.

Max Cache Age (Days)

The number of days to keep collected URLs in the database. When the workflow starts, it will delete entries that are older than this number of days.
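The duplicate check described above can be sketched with an in-memory SQLite database. This is a minimal sketch under assumptions: the column names `url` and `collected_at` and all function names are hypothetical, and the actual layout of the duplicate_check table is the one given in the appendix, which is not reproduced here.

```python
import sqlite3
import time

SECONDS_PER_DAY = 86400


def open_store():
    """Create an in-memory store with a hypothetical duplicate_check layout."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE duplicate_check (url TEXT PRIMARY KEY, collected_at REAL)"
    )
    return conn


def purge_old(conn, max_age_days, now=None):
    """Delete entries older than max_age_days, as done at workflow start."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    conn.execute("DELETE FROM duplicate_check WHERE collected_at < ?", (cutoff,))


def should_collect(conn, url, now=None):
    """Return True and record the URL if it is not already in the store."""
    now = time.time() if now is None else now
    seen = conn.execute(
        "SELECT 1 FROM duplicate_check WHERE url = ?", (url,)
    ).fetchone()
    if seen:
        return False
    conn.execute(
        "INSERT INTO duplicate_check (url, collected_at) VALUES (?, ?)", (url, now)
    )
    return True
```

A URL is skipped for as long as its entry remains in the table. Once `purge_old` has removed the entry, the same URL is collected again.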