Performance optimisation using state/store APIs
This section describes how to use the state and store objects efficiently in Usage Engine Functions.
Store vs state
The Script Function provides different ways to manage state. Two popular options are the store API and the state API.
Use state during execution for high-performance in-memory storage, and use store when the data should be persisted.
When data must be persisted within a Script Function, a common approach is to use the store/sharedStore API. This is a fail-safe way of storing the data.
Hint!
Use the store/sharedStore API cautiously and avoid frequent use, as it can significantly impact performance.
A common practice when running a batch stream is to read from the store/sharedStore for every record, update the value, and then write it back to the same key. This results in one read and one write operation for every record. Processing a file with 10,000 records would result in 20,000 store/sharedStore operations.
Instead, consider using the state object to hold the data in memory, and read from and write to the store/sharedStore only when required. The state object is much faster than the store/sharedStore object.
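To make the difference concrete, the sketch below counts store operations for both approaches. It does not use the real Usage Engine API: CountingStore is a hypothetical in-memory stand-in that only mimics the get and set calls shown in this section, and perRecord, withState and compare are illustrative names.

```javascript
// Hypothetical stand-in for the store API: an in-memory map that counts calls.
// Assumption: get() resolves to null for a missing key, as in this section's examples.
class CountingStore {
  constructor() {
    this.data = new Map();
    this.reads = 0;
    this.writes = 0;
  }
  async get(key) {
    this.reads += 1;
    return this.data.has(key) ? this.data.get(key) : null;
  }
  async set(key, value) {
    this.writes += 1;
    this.data.set(key, value);
  }
}

// Approach 1: read from and write to the store for every record.
async function perRecord(store, records) {
  for (const record of records) {
    let sum = await store.get('sum');
    if (sum === null) {
      sum = 0;
    }
    sum += record.value;
    await store.set('sum', sum);
  }
}

// Approach 2: read once, accumulate in memory, write once at the end.
async function withState(store, records) {
  const state = { stream: {} }; // stand-in for the in-memory state object
  for (const record of records) {
    if (state.stream.sum === undefined) {
      const stored = await store.get('sum');
      state.stream.sum = stored === null ? 0 : stored;
    }
    state.stream.sum += record.value;
  }
  // Equivalent of the flush step: persist the final value once.
  await store.set('sum', state.stream.sum);
}

// Runs both approaches over the same records and reports the operation counts.
async function compare(recordCount) {
  const records = Array.from({ length: recordCount }, () => ({ value: 1 }));
  const a = new CountingStore();
  await perRecord(a, records);
  const b = new CountingStore();
  await withState(b, records);
  return {
    perRecordOps: a.reads + a.writes,
    withStateOps: b.reads + b.writes,
    sameResult: a.data.get('sum') === b.data.get('sum'),
  };
}

compare(10000).then((result) => console.log(result));
```

For 10,000 records, the first approach performs 20,000 store operations and the second performs 2, while both end with the same stored sum.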
Example - How to use state and store: recommended
We recommend using state during transform and store in flush.
The code snippet below performs only two store operations (one read and one write), regardless of the number of records passing through the transform.
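// transform
// Read the persisted value only once, when the first record arrives.
if (state.stream.sum === undefined) {
  let sum = await store.get('sum');
  if (sum === null) {
    sum = 0;
  }
  state.stream.sum = sum;
}
// Accumulate in memory for every record.
state.stream.sum += payload.value;
await push({ sum: state.stream.sum });
// flush
// Persist the final value once.
await store.set('sum', state.stream.sum);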
Example - How to use state and store: not recommended
The code snippet below is not recommended, because it reads from and writes to the store for every record:
// transform
let sum = await store.get('sum');
if (sum === null) {
  sum = 0;
}
sum += payload.value;
await store.set('sum', sum);
await push({ sum });
Best practices to avoid stateful stream design
Real-time streams that use the state variable in the Script Function risk losing the data held in memory if the stream execution fails. The state variable is temporary and only available during a single execution of a stream, so its contents do not survive when a stream aborts or fails for any reason.
To prevent data loss during a service interruption or stream failure in real-time streams, we recommend using the store or sharedStore variables in the Script Function. This is particularly helpful when running multiple instances of the same stream for load balancing.
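The effect of a failure can be illustrated with a small simulation. This is a sketch under stated assumptions, not Usage Engine code: MemoryStore is a hypothetical stand-in for the persisted store, processWithState, processWithStore and simulate are illustrative names, and a failure is modelled simply by discarding the in-memory state object.

```javascript
// Hypothetical stand-in for a persisted store that outlives a single stream execution.
class MemoryStore {
  constructor() {
    this.data = new Map();
  }
  async get(key) {
    return this.data.has(key) ? this.data.get(key) : null;
  }
  async set(key, value) {
    this.data.set(key, value);
  }
}

// Keeps the running total only in memory.
async function processWithState(state, event) {
  state.total = (state.total ?? 0) + event.kwh;
}

// Keeps the running total in the store, so it survives a restart.
async function processWithStore(store, event) {
  const previous = (await store.get('total')) ?? 0;
  await store.set('total', previous + event.kwh);
}

async function simulate() {
  const store = new MemoryStore();
  const beforeFailure = [{ kwh: 5 }, { kwh: 7 }];
  const afterRestart = [{ kwh: 3 }];

  // First execution.
  let state = {};
  for (const event of beforeFailure) {
    await processWithState(state, event);
    await processWithStore(store, event);
  }

  // Simulated failure: the execution ends and its in-memory state is gone.
  state = {};

  // Restarted execution.
  for (const event of afterRestart) {
    await processWithState(state, event);
    await processWithStore(store, event);
  }

  return { stateTotal: state.total, storeTotal: await store.get('total') };
}

simulate().then((result) => console.log(result));
```

After the simulated failure, the in-memory total only reflects the events seen since the restart (3), while the stored total covers all events (15).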
Use case
Let's look at an example of using real-time streams for an electric vehicle charging station. When a vehicle charges at the station, usage data is generated and pushed to the Kubernetes server of Usage Engine.
The data is then sent to a real-time stream for processing. Multiple real-time streams are configured to ensure reliable handling, even during system downtime, such as a stream failure.
In the Script Function, the store or sharedStore variables are used to store usage data from the charging station in a MongoDB database. This allows the data to be shared across other real-time streams or instances in Usage Engine, reducing the risk of data loss.
By contrast, using the state variable would limit data storage to a specific instance. If that instance fails, the data could be lost since the state variable is tied to the stream's execution.
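The charging station scenario could be sketched as follows. Everything here is an assumption made for illustration: SharedMemoryStore is a hypothetical in-memory stand-in for sharedStore (the real data would live in MongoDB), createInstance and demo are illustrative helpers, and the session key format and the kwh field are invented for the example. Events are handled one at a time; concurrent updates to the same key from several instances are not modelled.

```javascript
// Hypothetical stand-in for sharedStore, visible to every stream instance.
class SharedMemoryStore {
  constructor() {
    this.data = new Map();
  }
  async get(key) {
    return this.data.has(key) ? this.data.get(key) : null;
  }
  async set(key, value) {
    this.data.set(key, value);
  }
}

// One stream instance: private in-memory state plus a reference to the shared store.
function createInstance(sharedStore) {
  const state = {}; // private to this instance, lost if the instance fails
  return {
    async handle(event) {
      const key = `session:${event.sessionId}`; // hypothetical key format

      // Instance-local total: only knows about events this instance has seen.
      state[key] = (state[key] ?? 0) + event.kwh;

      // Shared total: includes events handled by any instance.
      const previous = (await sharedStore.get(key)) ?? 0;
      const sharedTotal = previous + event.kwh;
      await sharedStore.set(key, sharedTotal);

      return { fromState: state[key], fromSharedStore: sharedTotal };
    },
  };
}

async function demo() {
  const sharedStore = new SharedMemoryStore();
  const instanceA = createInstance(sharedStore);
  const instanceB = createInstance(sharedStore);

  // Instance A handles the first two events of a charging session.
  await instanceA.handle({ sessionId: 'ev-1', kwh: 4 });
  await instanceA.handle({ sessionId: 'ev-1', kwh: 6 });

  // Instance A fails; instance B receives the next event of the same session.
  return instanceB.handle({ sessionId: 'ev-1', kwh: 2 });
}

demo().then((result) => console.log(result));
```

Instance B's private state only contains the 2 kWh it processed itself, whereas the shared total for the session is 12 kWh, including the usage recorded by the failed instance.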