Performance Optimisation using State/Store APIs

This section provides guidance on the state and store objects used by Usage Engine Functions.

Store vs State

The Script Function provides several ways to manage state. Two common options are the state API and the store API.

Use state during execution for high-performance in-memory storage and use store when the data should be persisted.

When dealing with persistence within a Script Function, a common approach is to use the store/sharedStore API. This is a fail-safe way of storing the data.

Recommendation

Use store/sharedStore sparingly, as frequent access can significantly affect performance.

A common pattern when running a batch stream is to read from store/sharedStore for every record, update the value, and then write it back under the same key. This results in one read and one write operation per record, so processing a file with 10000 records results in 20000 store/sharedStore operations.
Instead, use the state object to hold the value in memory, and read from and write to store/sharedStore only when required. The state object is much faster than the store/sharedStore object.
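
The operation counts above can be checked with a small simulation. This is only a sketch: createCountingStore is a hypothetical in-memory stand-in that mimics the get/set calls used on this page and counts them, and the state object is a plain JavaScript object rather than the one supplied by the Script Function.

```javascript
// Hypothetical in-memory stand-in for the store API (an assumption, not the
// real implementation). It only counts how often get and set are called.
function createCountingStore() {
  const data = new Map();
  const counts = { get: 0, set: 0 };
  return {
    counts,
    async get(key) { counts.get += 1; return data.get(key); },
    async set(key, value) { counts.set += 1; data.set(key, value); },
  };
}

// Per-record pattern: one read and one write for every record.
async function sumPerRecord(store, records) {
  for (const record of records) {
    const sum = (await store.get('sum')) ?? 0;
    await store.set('sum', sum + record.value);
  }
}

// Cached pattern: read once, accumulate in memory, write once at the end.
async function sumWithState(store, records) {
  const state = { stream: {} }; // plain object standing in for state
  for (const record of records) {
    if (state.stream.sum === undefined) {
      state.stream.sum = (await store.get('sum')) ?? 0;
    }
    state.stream.sum += record.value;
  }
  // Equivalent of flush: persist the accumulated value once.
  if (state.stream.sum !== undefined) {
    await store.set('sum', state.stream.sum);
  }
}

// Runs both patterns over the same records and reports total store operations.
async function compare(recordCount) {
  const records = Array.from({ length: recordCount }, () => ({ value: 1 }));
  const perRecord = createCountingStore();
  const cached = createCountingStore();
  await sumPerRecord(perRecord, records);
  await sumWithState(cached, records);
  return {
    perRecordOps: perRecord.counts.get + perRecord.counts.set,
    cachedOps: cached.counts.get + cached.counts.set,
  };
}
```

Calling compare(10000) reports 20000 operations for the per-record pattern and 2 for the cached pattern, matching the figures given above.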

Example

The following example illustrates how to use state and store. We recommend using state during transform and store in flush.

Example - Shows two store operations regardless of the number of records passing through the transform.

DO (thumbs up)

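// transform
// Read the persisted value once, the first time a record arrives.
if (state.stream.sum === undefined) {
  let sum = await store.get('sum');
  if (sum === undefined) {
    sum = 0;
  }
  state.stream.sum = sum;
}

// Accumulate in memory; no store access per record.
state.stream.sum += payload.value;

await push({ sum: state.stream.sum });

// flush
// Persist the accumulated value once.
await store.set('sum', state.stream.sum);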

We recommend against the pattern shown in the following example, which reads from and writes to the store for every record:

DON'T (thumbs down)

// transform
// One store read for every record.
let sum = await store.get('sum');
if (sum === undefined) {
  sum = 0;
}

sum += payload.value;

// One store write for every record.
await store.set('sum', sum);

await push({ sum });

Best Practices to Avoid Stateful Stream Design

Real-time streams that use the state variable in the Script Function risk losing the data held in memory if the stream execution runs into a problem. When a stream aborts or fails for any reason, the state variable does not help, because it is temporary and only available during a single execution of the stream.

To prevent loss of data during a service interruption or stream failure in real-time streams, use the store or sharedStore variables when configuring functionality in the Script Function. This is especially helpful when you run more than one instance of the same stream for load balancing.

Use Case

Consider real-time streams for an electric vehicle charging station. While a vehicle is charging at the station, usage data is generated and pushed to the Kubernetes server of Usage Engine Cloud Edition. The usage data is then sent to a real-time stream or instance for processing. To handle system downtime, multiple real-time streams are available as backup in case of failure. When configuring the functionality in the Script Function, use the store or sharedStore variables so that the usage data from the charging station is stored in a MongoDB database and can be shared among the other real-time streams or instances in the Cloud Edition solution. This helps prevent loss of usage data. The state variable, by contrast, is specific to a single instance, and its data can be lost when the stream execution is affected.
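
The difference between the two variables during a failover can be sketched as follows. Everything here is an assumption made for illustration: createSharedStore is a hypothetical in-memory stand-in for sharedStore, the payload fields sessionId and kwh and the session: key prefix are invented, and events are processed one at a time, so concurrent updates from several instances are not addressed.

```javascript
// Hypothetical stand-in for sharedStore: a single Map that every instance
// reads from and writes to (the real API is backed by a database).
function createSharedStore() {
  const data = new Map();
  return {
    async get(key) { return data.get(key); },
    async set(key, value) { data.set(key, value); },
  };
}

// Each stream instance has its own in-memory state but shares the store.
function createInstance(sharedStore) {
  const state = { totals: {} }; // lost when the instance goes away
  return {
    state,
    async transform(payload) {
      const key = `session:${payload.sessionId}`; // hypothetical key scheme
      // In-memory total: only this instance knows about it.
      state.totals[key] = (state.totals[key] ?? 0) + payload.kwh;
      // Persisted total: visible to every instance.
      const persisted = (await sharedStore.get(key)) ?? 0;
      await sharedStore.set(key, persisted + payload.kwh);
    },
  };
}

// Instance A handles two events and then fails; instance B takes over.
async function simulateFailover() {
  const sharedStore = createSharedStore();

  const instanceA = createInstance(sharedStore);
  await instanceA.transform({ sessionId: 'ev-1', kwh: 2 });
  await instanceA.transform({ sessionId: 'ev-1', kwh: 3 });

  // Instance A is gone; instance B starts with empty in-memory state.
  const instanceB = createInstance(sharedStore);
  await instanceB.transform({ sessionId: 'ev-1', kwh: 4 });

  return {
    inMemoryTotal: instanceB.state.totals['session:ev-1'],
    persistedTotal: await sharedStore.get('session:ev-1'),
  };
}
```

In this sketch the in-memory total on the replacement instance only reflects the one event it handled itself (4), while the persisted total covers all three events (9).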