Definition
The service defines a transaction as a unit of data (for example, a file) being processed by a stream. A transaction is complete when the file is processed by the stream without errors. Individual transactions are limited to the specified data source. Transactions fall into two categories:
- File-based functions: Collectors such as Amazon S3 and SFTP define the input as files, so a transaction is defined as the processing of a file without any errors.
- Data-based functions: The Counter and Data Generator collectors generate their own input data. In this case, the transaction is regarded as the process itself, beginning when data generation starts.
The data correction feature creates transactions by itself, and they are regarded as separate processes.
Transaction Safety Overview
The behavior of transaction safety depends on the types of functions used in the stream, and it is supported only in functions that have state.
If a stream fails during a transaction, the stream is aborted and all data is secured.
If the stream is restarted, a rollback is triggered to clean up incomplete transactions, and execution resumes after the last successfully processed transaction in the stream. For example, consider a stream that is processing 10 files. If the first 3 files are processed successfully and an error occurs while processing the 4th file, the stream is aborted. Because progress is saved, transaction safety ensures that when the stream is resumed, processing continues after the last successful transaction, that is, from the 4th file.
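The 10-file example above can be illustrated with a minimal Python sketch. It assumes a hypothetical in-memory `checkpoint` set that records committed files; the names `run_stream`, `process`, and `checkpoint` are illustrative only and are not part of the service. The rollback of partially written output is not modeled here.

```python
def run_stream(files, process, checkpoint):
    """Process files in order, skipping those already committed.

    `checkpoint` is a hypothetical set of committed file names; a real
    stream would persist this state durably between runs.
    """
    for name in files:
        if name in checkpoint:
            continue              # committed in an earlier run, skip it
        process(name)             # may raise; nothing is committed on failure
        checkpoint.add(name)      # commit only after successful processing


files = [f"file{i}" for i in range(1, 11)]
checkpoint = set()
processed = []
fail_once = {"file4"}             # simulate a one-time error on the 4th file


def process(name):
    if name in fail_once:
        fail_once.discard(name)
        raise RuntimeError(f"error while processing {name}")
    processed.append(name)


try:
    run_stream(files, process, checkpoint)
except RuntimeError:
    pass                          # stream aborted; files 1 to 3 are committed

first_run = list(processed)
run_stream(files, process, checkpoint)   # the restart resumes at file4
```

After the first run only the first three files are committed; the second call skips them and picks up at the 4th file.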
Transaction safety comes in three types: At-most-once, At-least-once, and Exactly-once.
The service typically uses Exactly-once transaction safety to ensure that the data or file is processed only once during the execution of the stream. However, some functions are designed to include duplicates, and as such act like At-least-once transaction safety.
At-most-once
The result may or may not reach its destination (data loss is possible).
Note!
This type is not used in the service.
At-least-once
The result is generated but duplicate results are possible due to multiple deliveries. The following functions use this method:
Exactly-once
The result is generated only once. No duplicates can be made.
Currently, transaction safety of the type Exactly-once is supported by the following functions:
- Amazon S3 collector
- SFTP collector
- Count
- Amazon S3 forwarder
- SFTP forwarder
- Interconnect collector
- Interconnect forwarder
- Data Aggregator
- Deduplicate
- Data Correction (routed from the Validate function only)
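To make the difference between At-least-once and Exactly-once concrete, the sketch below contrasts the two using a common technique: tracking committed transaction IDs so that a redelivery is skipped. This is an illustration of the semantics only, not a description of how the listed functions implement them; the function names and the `committed_ids` set are hypothetical.

```python
def deliver_at_least_once(transactions, sink):
    """Write every delivery, so a redelivered transaction appears twice."""
    for txn_id, payload in transactions:
        sink.append(payload)


def deliver_exactly_once(transactions, sink, committed_ids):
    """Skip any transaction whose ID has already been committed."""
    for txn_id, payload in transactions:
        if txn_id in committed_ids:
            continue                  # duplicate delivery, ignore it
        sink.append(payload)
        committed_ids.add(txn_id)     # record the commit


# Transaction 2 is delivered twice, as can happen after a retry.
deliveries = [(1, "a"), (2, "b"), (2, "b"), (3, "c")]

at_least_once_sink = []
deliver_at_least_once(deliveries, at_least_once_sink)

exactly_once_sink = []
deliver_exactly_once(deliveries, exactly_once_sink, set())
```

With the same input, the first sink receives the duplicate while the second receives each result once.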
Transactions using Multiple Collectors
Streams that use multiple collectors are handled in a way that ensures transaction safety. Each collector is handled in turn, determined by . Once a collector is ready to read its input, it adds its transaction to a queue.
The following example has three collectors, and also a corrected transaction from a validation:
- At the starting point, six transactions (A, Y, X, J, I, K) in total are waiting to complete.
- The Data Correction transaction has the highest priority and is handled before everything else.
- The Counter was ready with its transaction before the other collectors, so it finished second.
- The remaining collectors handled their transactions in order, which happened to result in the following output sequence.
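The ordering rules described in this example (Data Correction first, then collectors in the order they became ready) can be sketched with a small priority queue. This is a minimal sketch under stated assumptions: the `TransactionQueue` class, the numeric priority scheme, and the source and transaction names are all hypothetical and do not reflect the service's internals or the transactions in the example above.

```python
import heapq
from itertools import count

# Assumed scheme: a lower value is served first.
CORRECTION, COLLECTOR = 0, 1


class TransactionQueue:
    """Orders transactions by priority, then by the order they became ready."""

    def __init__(self):
        self._heap = []
        self._arrival = count()   # tie-breaker that preserves readiness order

    def ready(self, source, txn, priority=COLLECTOR):
        heapq.heappush(self._heap, (priority, next(self._arrival), source, txn))

    def drain(self):
        order = []
        while self._heap:
            _, _, source, txn = heapq.heappop(self._heap)
            order.append((source, txn))
        return order


queue = TransactionQueue()
queue.ready("counter", "t1")                               # ready first
queue.ready("s3", "t2")
queue.ready("data-correction", "t3", priority=CORRECTION)  # ready third
queue.ready("sftp", "t4")

order = queue.drain()
```

Even though the corrected transaction became ready third, it is handled first; the collectors then follow in the order in which they became ready.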