Performance optimisation using state/store APIs

This section describes how the state and store objects used by Usage Engine Functions affect performance and data persistence.

Store vs state

The Script Function provides different ways of managing state. Two common options are the state API and the store API.

Use state during execution for high-performance in-memory storage, and use store when the data must be persisted.

When data must be persisted within a Script Function, a common approach is to use the store/sharedStore API. This is a fail-safe way of storing the data.

Hint!
Use the store/sharedStore API cautiously and avoid frequent use, as it can significantly impact performance.

A common pattern when running a batch stream is to read a value from the store/sharedStore for every record, update it, and then write it back under the same key. This results in one read and one write operation for every record, so processing a file with 10,000 records results in 20,000 store/sharedStore operations.
Instead, keep the value in memory in the state object and only read from and write to the store/sharedStore when required. The state object is much faster than the store/sharedStore object.

Example - How to use state and store: recommended

We recommend using state in transform and writing to store in flush.

The code snippet below performs only two store operations, one read and one write, regardless of the number of records passing through the transform.

// transform
if (state.stream.sum === undefined) {
  let sum = await store.get('sum');
  if (sum === null) {
    sum = 0;
  }
  state.stream.sum = sum;
}

state.stream.sum += payload.value;

await push({ sum: state.stream.sum });

// flush
await store.set('sum', state.stream.sum);

Example - How to use state and store: not recommended

The code snippet below reads from and writes to the store for every record. We do not recommend this approach:

// transform
let sum = await store.get('sum');
if (sum === null) {
  sum = 0;
}

sum += payload.value;

await store.set('sum', sum);

await push({ sum });
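
The difference between the two approaches can be checked outside Usage Engine with a small simulation. The sketch below is illustrative only: createMockStore, runPerRecord, and runWithState are hypothetical helpers, and the mock store is an in-memory stand-in that merely mimics the asynchronous get and set calls used in the snippets above. It counts store operations for 10,000 records with each approach.

```javascript
// Mock store that counts operations. This is NOT the Usage Engine store;
// it only mimics the async get/set calls used in the snippets above.
function createMockStore() {
  const data = new Map();
  const counts = { get: 0, set: 0 };
  return {
    counts,
    async get(key) {
      counts.get += 1;
      return data.has(key) ? data.get(key) : null;
    },
    async set(key, value) {
      counts.set += 1;
      data.set(key, value);
    },
  };
}

// Not recommended: one store read and one store write per record.
async function runPerRecord(records) {
  const store = createMockStore();
  let sum = 0;
  for (const payload of records) {
    const stored = await store.get('sum');
    sum = (stored === null ? 0 : stored) + payload.value;
    await store.set('sum', sum);
  }
  return { sum, operations: store.counts.get + store.counts.set };
}

// Recommended: keep the running total in state, touch the store twice.
async function runWithState(records) {
  const store = createMockStore();
  const state = { stream: {} };
  for (const payload of records) {
    // transform: read the store only for the first record
    if (state.stream.sum === undefined) {
      const stored = await store.get('sum');
      state.stream.sum = stored === null ? 0 : stored;
    }
    state.stream.sum += payload.value;
  }
  // flush: write the final value once
  await store.set('sum', state.stream.sum);
  return { sum: state.stream.sum, operations: store.counts.get + store.counts.set };
}

async function main() {
  const records = Array.from({ length: 10000 }, () => ({ value: 1 }));
  const perRecord = await runPerRecord(records);
  const withState = await runWithState(records);
  console.log(perRecord.operations, withState.operations); // prints "20000 2"
}

main();
```

Both functions produce the same sum; only the number of store operations differs. The mock says nothing about actual latency, which depends on the real store backend.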

Best practices to avoid stateful stream design

Real-time streams that use the state variable in the Script Function risk losing the data held in memory if the stream execution fails. When a stream aborts or fails for any reason, the data in the state variable is lost, because state is temporary and only available during a single execution of a stream.

To prevent data loss during a service interruption or stream failure in real-time streams, we recommend using the store or sharedStore variables in the Script Function. This is particularly helpful when running multiple instances of the same stream for load balancing.

Use case

Let's look at an example of using real-time streams for an electric vehicle charging station. When a vehicle charges at the station, usage data is generated and pushed to the Kubernetes server of Usage Engine.

The data is then sent to a real-time stream for processing. Multiple real-time streams are configured to ensure reliable handling, even during system downtime, such as a stream failure.

In the Script Function, the store or sharedStore variables are used to store usage data from the charging station in a MongoDB database. This allows the data to be shared across other real-time streams or instances in Usage Engine, reducing the risk of data loss.

By contrast, using the state variable would limit data storage to a specific instance. If that instance fails, the data could be lost since the state variable is tied to the stream's execution.
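
The contrast can be illustrated with a minimal simulation of two stream instances. Everything in the sketch below is an assumption made for illustration: createMockSharedStore is an in-memory stand-in rather than the real sharedStore (which the text above describes as backed by MongoDB), createInstance and crash are hypothetical helpers, and the kwh field is an invented payload attribute. The sketch also ignores concurrent updates from several instances.

```javascript
// Hypothetical stand-in for a persistent store shared by all instances.
// This is NOT the real sharedStore API; it only mimics async get/set.
function createMockSharedStore() {
  const data = new Map();
  return {
    async get(key) {
      return data.has(key) ? data.get(key) : null;
    },
    async set(key, value) {
      data.set(key, value);
    },
  };
}

// One stream instance. Its state object lives only in this instance's memory.
function createInstance(sharedStore) {
  let state = { stream: {} };
  return {
    // Keeps the running total in memory only.
    async processWithState(payload) {
      state.stream.kwh = (state.stream.kwh ?? 0) + payload.kwh;
      return state.stream.kwh;
    },
    // Writes the running total through to the shared store.
    async processWithStore(payload) {
      const stored = await sharedStore.get('kwh');
      const total = (stored ?? 0) + payload.kwh;
      await sharedStore.set('kwh', total);
      return total;
    },
    // Simulates a stream failure: the in-memory state is discarded.
    crash() {
      state = { stream: {} };
    },
  };
}

async function demo() {
  const sharedStore = createMockSharedStore();
  const instanceA = createInstance(sharedStore);
  const instanceB = createInstance(sharedStore);

  // Instance A handles two charging events, tracked both ways.
  for (const kwh of [5, 7]) {
    await instanceA.processWithState({ kwh });
    await instanceA.processWithStore({ kwh });
  }

  // Instance A fails; the next event arrives afterwards.
  instanceA.crash();
  const fromState = await instanceA.processWithState({ kwh: 3 });
  const fromStore = await instanceB.processWithStore({ kwh: 3 });
  return { fromState, fromStore };
}

demo().then((result) => console.log(result)); // { fromState: 3, fromStore: 15 }
```

After the simulated failure, the total kept in state restarts from the new event alone, while the total kept in the shared store still includes the earlier events and can be continued by another instance.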