You will find the following databases under this tuning guide:

Table of Contents

minLevel	1
maxLevel	2
outline	false
style	disc
type	list
printable	false

Tuning on PostgreSQL

This section describes how to tune the PostgreSQL database for use with Data Veracity.

...

The following command will show how a string field can be indexed.

Index for String fields in PostgreSQL

Code Block
CREATE INDEX dvStrField1 ON dvtestudr ((udr_json ->> 'stringField1'));

For fields located within multiple sublevels of UDR, the following command will create a unique index for a field that is located in within the subUDRs.

Index for nested fields

Code Block
CREATE INDEX dvStrField1 ON dvtestudr ((udr_json -> 'sub1Field' -> 'sub2Field' ->> 'stringField1'));

Following from the example as shown above, to index the UDR field intField1, the search performance could be improved by creating an index in the PostgreSQL database for this field. The following will be an SQL command to create a unique index meant for an integer field in the JSON record.

Index for Integer fields in PostgreSQL

Code Block
CREATE INDEX dvIntField1 ON dvtestudr (cast(udr_json ->> 'intField1' AS int));

When it comes to timestamps, a function is the recommended suggestion to make the data type immutable when it is casted for an index. The example will show a sample PostgreSQL function and the command for the index.

Index for timestamps

Code Block

CREATE OR REPLACE FUNCTION dvcasttime(text)
RETURNS timestamp AS
$$SELECT to_timestamp($1, 'YYYY-MM-DD HH24:MI:SS')$$ -- change the timestamp format to the format being used in the field
LANGUAGE sql IMMUTABLE;

CREATE INDEX dvdDateField1 ON dvtestudr (dvcasttime(udr_json ->> 'dateField1'));

...

Data Veracity uses Oracle 19c's JSON capabilities for storing data. This provides the capability to improve the performance of the SQL queries from Data Veracity, by applying the Oracle's database features for JSON fields indexing. The following are suggestions to how you may improve performance for Data Veracity, it is highly recommended that you consult your DBA before proceeding with any of these suggestions or if they are necessary in your solution.

...

Example of a table created in Data Veracity, viewed using sqlplus

...

The Table above shows a table, created using the SQL generation in the Data Veracity profile. The UDR_JSON field is where the record for every UDR is kept. The UDR is kept in a JSON format and Oracle 19c has a way to allow users to create an index of a field located in the JSON format itself.

...

Example UDR with 3 sub UDR levels

...

Take for example, when the user frequently searches for the UDR field intField1, the search performance could be improved by creating an index in the Oracle database for this field. The following will be an SQL command to create a unique index meant for a field in the JSON record.

Index for intField1

Code Block
CREATE UNIQUE INDEX dvIntField1_idx ON dvtestudr (json_value(udr_json, '$.intField1' RETURNING NUMBER ERROR ON ERROR));

If fields located within multiple sublevels of UDR are frequently queried by the user, an index could be created to improve the performance for searching this field. The following SQL command will create a unique index for a field that is located in within the subUDRs.

Index for sub1Field.sub2Field.intField

Code Block
CREATE UNIQUE INDEX dvIntSubField2_idx ON dvtestudr (json_value(udr_json, '$.sub1Field.sub2Field.intField' RETURNING NUMBER ERROR ON ERROR));

...

This section describes how to tune the SAP HANA database for use with Data Veracity.

General Usage

For SAP HANA, memory sizing is especially important due to the data being stored in-memory to avoid the performance penalty of disk I/O. We recommend running the HANA_Configuration_Minichecks script as described in the SAP Knowledge Base Article 3019194 if there is any performance issue and consult with the DBA to implement the suggestions.

Additionally, as the number of records in the table and the size of the UDRs varies, it is highly recommended to add query filters using the common fields (e.g. ID, ERROR_CODE, INSERT_TIME, STATE, UPDATE_TIME, WORKFLOW_NAME) to narrow down the search scope for better performance.

Memory Allocation

In the event that the number of records are large enough to cause SAP HANA the database to run out of memory, you may refer to the SAP Knowledge Base Article 3202692 to adjust the “statement_memory_limit” accordingly, so that the maximum memory allocation per statement can be increased to an amount which is suitable to the workload. Please consult with your DBA before making the changes.

If memory limit errors still occur, consider scaling up the memory of the server hosting the SAP HANA database to allow for a higher limit adjustments.

Columnar vs Row-based Tables

The SQL generated by the Data Veracity Profile for SAP HANA creates columnar tables, as the performance is generally better compared to row-based. The rule of thumb is that if the majority of table access is through a large number of tables with only a few selected attributes, then columnar storage is preferred. Row-based storage is preferable if most table access involves selecting a few records with all attributes selected.

Batch Size

In cases where there is a significant network latency between the server and the SAP HANA database, it is recommended to configure the property “mz.dv.jdbcBatchSize” to limit the fetch size while repairing the records using UI.

...

Versions Compared

Old Version 6

New Version Current

Key

Tuning on PostgreSQL

Index for String fields in PostgreSQL

Index for nested fields

Index for Integer fields in PostgreSQL

Index for timestamps

Index for intField1

Index for sub1Field.sub2Field.intField

General Usage

Memory Allocation

Columnar vs Row-based Tables

Batch Size

Page Comparison

Versions Compared

Old Version 6

New Version Current

Key

Tuning on PostgreSQL

Index for String fields in PostgreSQL

Index for nested fields

Index for Integer fields in PostgreSQL

Index for timestamps

Index for intField1

Index for sub1Field.sub2Field.intField

General Usage

Memory Allocation

Columnar vs Row-based Tables

Batch Size