Performance and scalability
Usage Engine is designed to handle very large data volumes with strong processing guarantees. As a user designing a stream, there are a few concepts you need to be aware of that relate to performance and scalability.
Note!
By default, when working with files, Usage Engine will either successfully process an entire input file, or not process the input file at all. This is called transactional processing, and it helps ensure that data is neither lost nor duplicated.
If you are processing many small files, the overhead of having one transaction per file can reduce throughput.
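As a rough illustration of why per-file transactions become expensive, consider a simple timing model. The numbers and the function below are hypothetical, not measurements from Usage Engine: they only show that when fixed per-transaction overhead is paid for every small file, it can dominate the total processing time.

```python
def total_ms(num_files: int, process_ms: int, overhead_ms: int, batch_size: int = 1) -> int:
    """Illustrative model: total time when each batch of files shares one transaction.

    process_ms  - time to process one file
    overhead_ms - fixed cost paid once per transaction (commit, bookkeeping, ...)
    """
    transactions = -(-num_files // batch_size)  # ceiling division
    return num_files * process_ms + transactions * overhead_ms

# 10,000 small files, 5 ms processing each, 50 ms overhead per transaction:
print(total_ms(10_000, 5, 50))                  # one transaction per file: 550000 ms
print(total_ms(10_000, 5, 50, batch_size=500))  # 500 files per transaction: 51000 ms
```

In this model the per-transaction overhead accounts for over 90% of the run time when every file gets its own transaction, which is why batching many small files together can help.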
It is possible to batch multiple files into a single transaction when collecting files with the Amazon S3 collector.
In the configuration:
Set the Transaction Batch Size to a value greater than 1 to process multiple files per transaction.
Note!
A general rule of thumb is to set the Transaction Batch Size to between 100 and 1000 files if your files are many and small. Look at your Stream metrics (LINK) to see how long each file and each transaction takes to process. You may need to tune this value to achieve optimal performance for your stream; the Transaction Batch Size can be changed between stream executions.
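Conceptually, batched collection means that each group of files succeeds or fails as a unit. The sketch below models that behavior in plain Python; it is not the Usage Engine API, and the function names are invented for illustration only.

```python
from itertools import islice

def batches(files, batch_size):
    """Yield successive groups of at most batch_size files."""
    it = iter(files)
    while batch := list(islice(it, batch_size)):
        yield batch

def collect(files, batch_size, process):
    """Model of batched transactional processing: each batch commits or rolls back as a whole."""
    for batch in batches(files, batch_size):
        try:
            for f in batch:
                process(f)
            # commit point: every file in this batch is now durably processed
        except Exception:
            # rollback: none of the files in this batch count as processed
            raise

files = [f"file_{i}.csv" for i in range(7)]
print([len(b) for b in batches(files, 3)])  # [3, 3, 1]
```

This also shows the trade-off behind the rule of thumb above: a larger batch amortizes transaction overhead over more files, but a failure forces the whole batch to be reprocessed.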
When a stream is executed, by default Usage Engine Cloud Edition processes all input files in sequence in a single runtime instance of the stream.