Performance Tuning with Couchbase Storage

This section describes how to tune the Aggregation agent with Couchbase storage.

Unless stated otherwise, the properties referred to in this section are set in the Advanced tab in the Aggregation profile.

For further information on performance tuning with Couchbase storage, you can refer to Couchbase's own advice on Tuning & Performance in the Couchbase Documentation.

Queries and Indexes

When Couchbase is selected as the storage type in an Aggregation profile, a bucket is automatically created during execution of a workflow. The bucket is named according to the configuration of the assigned Couchbase profile. The bucket is populated with documents that contain the aggregation session data. This makes it possible to index the timeout information of aggregation sessions in Couchbase.

The aggregation session data is fetched using a N1QL query. 

Note!

N1QL queries are used by default. If you want to use views instead, use the following mzsh topo command to set the property mz.cb.use.n1ql to false in ECs:

mzsh topo set topo://container:<container>/pico:<pico>/val:config.properties.mz.cb.use.n1ql false

The data returned by the query is split into chunks of a configurable size. You set the chunk size with the property view.iteratorpageSize in the Advanced tab of the assigned Couchbase profile. Setting a value higher than the default of 1000 may increase throughput, but the achievable gain depends on the available RAM of the Execution Context host.
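The trade-off between page size and memory can be pictured with a minimal Python sketch. It is an illustration only, not the product's implementation; the function name iterate_in_pages and the sample data are hypothetical.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def iterate_in_pages(rows: Iterable[T], page_size: int = 1000) -> Iterator[List[T]]:
    """Yield successive pages of at most page_size rows.

    Only one page is held in memory at a time, so a larger page_size
    means fewer pages to fetch but more memory used per page.
    """
    if page_size < 1:
        raise ValueError("page_size must be at least 1")
    iterator = iter(rows)
    while True:
        page = list(islice(iterator, page_size))
        if not page:
            return
        yield page


# Hypothetical result set of 2500 session keys, paged with the default size.
pages = list(iterate_in_pages(range(2500), page_size=1000))
page_lengths = [len(page) for page in pages]
```

With 2500 rows and a page size of 1000, the sketch produces two full pages and one partial page.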

You can choose whether the index behind a query is updated before the query is executed, after it has returned, or not at all. If the index is not updated first, the results may be out of date, or stale. To control this behavior, set the property view.index.stale in the Advanced tab of the assigned Couchbase profile. The following settings are available:

  • FALSE - The index is updated before the query is executed. This ensures that any documents updated (and persisted to disk) are included in the query. The client waits until the index has been updated before the query is executed, and therefore the response is delayed until the updated index is available.

  • OK - The index is not updated. If an index exists for the given query, the information in the current index is used as the basis for the query and the results are returned accordingly. This value is seldom used and only if automatic index updates are enabled in Couchbase.

  • UPDATE_AFTER - This is the recommended setting when using a Couchbase profile with Aggregation. The existing index is used as the basis of the query, but the index is marked for updating once the results have been returned to the client.
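The difference between the three settings can be illustrated with a toy model in Python. This is a simplified sketch of the behavior described above, not Couchbase code; the class ToyView and its methods are hypothetical.

```python
class ToyView:
    """Toy model of an index that can lag behind the stored documents."""

    def __init__(self):
        self.documents = []  # documents that have been persisted
        self.index = []      # documents the index currently knows about

    def store(self, doc):
        self.documents.append(doc)

    def _update_index(self):
        self.index = list(self.documents)

    def query(self, stale):
        if stale == "FALSE":
            # Update first, then answer: fresh results, slower response.
            self._update_index()
            return list(self.index)
        if stale == "OK":
            # Answer from the existing index and leave it unchanged.
            return list(self.index)
        if stale == "UPDATE_AFTER":
            # Answer from the existing index, then refresh it for later queries.
            result = list(self.index)
            self._update_index()
            return result
        raise ValueError(f"unknown stale setting: {stale}")


view = ToyView()
view.store("session-1")
first = view.query("UPDATE_AFTER")   # index not yet updated when answering
second = view.query("OK")            # sees the update triggered by the first query
view.store("session-2")
third = view.query("OK")             # session-2 is not indexed yet
fourth = view.query("FALSE")         # index is updated before answering
```

In this model, UPDATE_AFTER returns quickly but lags one query behind, which matches its role as a compromise between FALSE and OK.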

For more information about queries and indexes, you can refer to Couchbase's own documentation on indexes.

Timeout

By default, two timeout threads per workflow periodically check the Couchbase aggregation storage for timed-out sessions. You can control how often this check is performed by setting mz.cb.agg.timeoutwait.sec. The default value is 10 seconds.

You can also increase the number of threads that perform this check by setting the property mz.cb.agg.timeout_no_of_thread. Setting a higher value than the default may speed up the detection of timeouts. However, the number of CPUs and the time that it takes for Couchbase to index accessed documents (session data) are limiting factors.

Hint!

You can use the MIM parameter Session Timeout Latency as an indicator of the timeout handling performance.

The sessions that are fetched from the Couchbase query are shuffled randomly in temporary buffers, one for each workflow. This is done to minimize the probability that multiple workflows attempt to time out the same sessions simultaneously. You can control the size of these buffers by setting the property mz.cb.agg.randombuffer. The default value is 1000 sessions.
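The idea behind the random buffers can be sketched in Python as follows. How the product actually fills its buffers is not described here, so the function fill_random_buffer and the session keys are hypothetical, and the sketch only shows the principle: each workflow processes the same candidates in a different order.

```python
import random
from typing import List, Optional, Sequence


def fill_random_buffer(session_keys: Sequence[str],
                       buffer_size: int = 1000,
                       rng: Optional[random.Random] = None) -> List[str]:
    """Take at most buffer_size keys and return them in random order."""
    rng = rng or random.Random()
    buffer = list(session_keys[:buffer_size])
    rng.shuffle(buffer)
    return buffer


# Two workflows receive the same timed-out sessions from the query ...
timed_out = [f"session-{i}" for i in range(5)]
# ... but each shuffles its own buffer, so they are unlikely to start
# with the same session at the same moment.
workflow_a = fill_random_buffer(timed_out, rng=random.Random(1))
workflow_b = fill_random_buffer(timed_out, rng=random.Random(2))
```

A larger buffer spreads the workflows over more sessions, at the cost of holding more keys in memory per workflow.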

Use the Operation Timeout (ms) setting in the Connectivity tab of the assigned Couchbase profile to control the timeout of Couchbase "CRUD" operations, that is, create, read, update, and delete. Setting a lower value than the default 1000 ms may have a positive impact on throughput performance. However, if the value is set too low, which is indicated by a large number of operation timeout errors in the EC logs, a lower throughput can be expected.

Queries operate over a different protocol than CRUD operations and have a separate timeout property named view.timeout in the Advanced tab of the Couchbase profile. The default value is set to 75000 (ms). It is generally not recommended to decrease this value. However, if you frequently receive the error Failed to iterate through timeout sessions in the EC logs, increasing this value may have a positive impact on throughput performance.

Session Storage Format

The aggregation sessions are stored in JSON format. However, some of the data within the JSON strings can be stored in binary format instead of plain text (default). You can change the stored format by setting the property mz.cb.agg.json_serializer.format. The valid values are:

  • MZ-BIN - The session data is serialized into JSON strings with binary content.
  • JSON - The session data is serialized into JSON strings with plain text content.

Example - Binary Format

{
  "drType": "MZ-BIN",
  "drFormatVersion": 2,
  "data": "Af+cAAAAR2NvbS5tZWRpYXRpb256b25lLnVsdHJhLmluLmFnZ3JlZ2F0aW9uX2NvbW1vbl9zZXNz\naW9uX3Nlc3Npb25fMTk1MzI3MTEwAAABTC19yJgAAAAAAAAAAAEAAAAGAQAAAAExAQAAAAExAQAA\nAAExAAAABgEAAAAHY29uc3VtZQ==",
  "SessionTimeout": 1426692360344,
  "initialized": true
}

Example - Plain text format

{
  "drType": "JSON",
  "drFormatVersion": 2,
  "data": {
    "Type": "udr",
    "StorableId": "aggregation_common.session.session",
    "TypeName": "aggregation_common.session.session",
    "Version": 1,
    "Content": {
      "CRCValue": 0,
      "v_bnum": "1",
      "SessionID": null,
      "initialized": true,
      "SessionTimeout": 1426691920461,
      "v_response": "consume",
      "v_anum": "1",
      "v_total_duration": 5
    }
  },
  "SessionTimeout": 1426691920461,
  "initialized": true
}

The Aggregation agent can read stored session data in both formats, regardless of the selected value.

In order to obtain the best possible performance in the Aggregation agent, you should use the binary format.
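Both formats carry the same top-level fields, so a stored document can be inspected without decoding the session data itself. The following Python sketch reads those fields from a document shaped like the examples above. It assumes that SessionTimeout is a timestamp in milliseconds since the Unix epoch, which the example values suggest but which is not stated explicitly; the function summarize_session is hypothetical.

```python
import json
from datetime import datetime, timezone


def summarize_session(document_text: str) -> dict:
    """Return the format, state and timeout of a stored session document."""
    doc = json.loads(document_text)
    # Assumption: SessionTimeout is expressed in epoch milliseconds.
    timeout_ms = doc["SessionTimeout"]
    return {
        "format": doc["drType"],  # "MZ-BIN" or "JSON"
        "initialized": doc["initialized"],
        "timeout_utc": datetime.fromtimestamp(timeout_ms / 1000, tz=timezone.utc),
    }


# Top-level fields taken from the plain text example; "data" is left empty
# here because the summary does not look inside it.
example = (
    '{"drType": "JSON", "drFormatVersion": 2, "data": {}, '
    '"SessionTimeout": 1426691920461, "initialized": true}'
)
summary = summarize_session(example)
```

Because only the top-level fields are read, the same function works for documents stored with MZ-BIN.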

Hint!

You can use an Aggregation agent with Force Read Only selected to read the stored sessions. You can then encode the binary content to JSON with the APL function jsonEncodeUdr. This is useful for debugging, or when you want to view the sessions in an external system. For more information about jsonEncodeUdr, see the /wiki/spaces/MZD70/pages/5145301.

Replication and Persistence

You can use the properties mz.cb.awaitPersistenceTo and mz.cb.awaitReplicationTo in the Advanced tab of the selected Couchbase profile to minimize the risk of data loss. However, setting a higher value than the default 0 will reduce the throughput performance.

Automated Index Updates

Note!

This section only applies if you are using views instead of N1QL queries.

In order to obtain the best possible performance in the Aggregation agent, you should disable automatic index updates in Couchbase.

From a terminal window, update the index settings using the curl tool:

curl -u <Couchbase administrator user>:<password> http://<IP address or hostname>:8091/settings/viewUpdateDaemon -d updateMinChanges=0

You may specify the IP address or hostname of any available node in the Couchbase cluster. If the updates are successful, the changes will be applied to all nodes.
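If you prefer to script this change, the same request can be built with the Python standard library. This is a minimal sketch under the assumption that the endpoint accepts a form-encoded POST with basic authentication, as the curl command implies; the host name, user, and password below are placeholders, and the function build_view_update_request is hypothetical.

```python
import base64
import urllib.parse
import urllib.request


def build_view_update_request(host: str, user: str, password: str,
                              update_min_changes: int = 0) -> urllib.request.Request:
    """Build (but do not send) the POST request that changes updateMinChanges."""
    url = f"http://{host}:8091/settings/viewUpdateDaemon"
    body = urllib.parse.urlencode({"updateMinChanges": update_min_changes}).encode("ascii")
    request = urllib.request.Request(url, data=body, method="POST")
    # Basic authentication, equivalent to curl's -u <user>:<password>.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request.add_header("Authorization", f"Basic {token}")
    return request


# Placeholder values; replace them with a real node and real credentials.
request = build_view_update_request("cb-node-1.example.com", "Administrator", "secret")
# To send the request: urllib.request.urlopen(request)
```

Building the request separately from sending it makes it possible to review the URL and body before anything is changed in the cluster.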

For more information about automated index updates, you can refer to Couchbase's own documentation on views.