Tuning on PostgreSQL

This section describes how to tune the PostgreSQL database for use with Data Veracity.

Much like Oracle, Data Veracity allows the user to improve the performance of PostgreSQL queries. The following will be a suggestion on how to improve the performance. It is always important to consult your DBA before proceeding with any of these suggestions or if they are necessary in your solution.

PostgreSQL has certain specific syntaxes that are used for certain data types, the following examples will show how to index in PostgreSQL for string data types, integer data types and timestamps.

The following command will show how a string field can be indexed.

Index for String fields in PostgreSQL

CREATE INDEX dvStrField1 ON dvtestudr ((udr_json ->> 'stringField1'));

For fields located within multiple sublevels of UDR, the following command will create a unique index for a field that is located in within the subUDRs.

Index for nested fields

CREATE INDEX dvStrField1 ON dvtestudr ((udr_json -> 'sub1Field' -> 'sub2Field' ->> 'stringField1'));

Following from the example as shown above, to index the UDR field intField1, the search performance could be improved by creating an index in the PostgreSQL database for this field. The following will be an SQL command to create a unique index meant for an integer field in the JSON record.

Index for Integer fields in PostgreSQL

CREATE INDEX dvIntField1 ON dvtestudr (cast(udr_json ->> 'intField1' AS int));

When it comes to timestamps, a function is the recommended suggestion to make the data type immutable when it is casted for an index. The example will show a sample PostgreSQL function and the command for the index.

Index for timestamps

CREATE OR REPLACE FUNCTION dvcasttime(text)
RETURNS timestamp AS
$$SELECT to_timestamp($1, 'YYYY-MM-DD HH24:MI:SS')$$ -- change the timestamp format to the format being used in the field
LANGUAGE sql IMMUTABLE;

CREATE INDEX dvdDateField1 ON dvtestudr (dvcasttime(udr_json ->> 'dateField1'));

Tuning on Oracle

This section describes how to tune the Oracle Database for use with Data Veracity.

Oracle Installation

Be sure to consult your DBA to ensure that all system configuration tasks for the Oracle installation are completed before getting started on Data Veracity.

Data Veracity uses Oracle 19c's JSON capabilities for storing data. This provides the capability to improve the performance of the SQL queries from Data Veracity, by applying the Oracle's database features for JSON fields indexing. The following are suggestions to how you may improve performance for Data Veracity, it is highly recommended that you consult your DBA before proceeding with any of these suggestions or if they are necessary in your solution.

Example of a table created in Data Veracity, viewed using sqlplus

The Table above shows a table, created using the SQL generation in the Data Veracity profile. The UDR_JSON field is where the record for every UDR is kept. The UDR is kept in a JSON format and Oracle 19c has a way to allow users to create an index of a field located in the JSON format itself.

Example UDR with 3 sub UDR levels

For example, when the user frequently searches for the UDR field intField1, the search performance could be improved by creating an index in the Oracle database for this field. The following will be an SQL command to create a unique index meant for a field in the JSON record.

Index for intField1

CREATE UNIQUE INDEX dvIntField1_idx
ON dvtestudr (json_value(udr_json, 
'$.intField1' 
RETURNING NUMBER ERROR ON ERROR));

If fields located within multiple sublevels of UDR are frequently queried by the user, an index could be created to improve the performance for searching this field. The following SQL command will create a unique index for a field that is located in within the subUDRs.

Index for sub1Field.sub2Field.intField

CREATE UNIQUE INDEX dvIntSubField2_idx
ON dvtestudr (json_value(udr_json, 
'$.sub1Field.sub2Field.intField' 
RETURNING NUMBER ERROR ON ERROR));

Tuning on SAP HANA

This section describes how to tune the SAP HANA database for use with Data Veracity.

General Usage

For SAP HANA, memory sizing is especially important due to the data being stored in-memory to avoid the performance penalty of disk I/O. We recommend running the HANA_Configuration_Minichecks script as described in the SAP Knowledge Base Article 3019194 if there is any performance issue and consult with the DBA to implement the suggestions.

Additionally, as the number of records in the table and the size of the UDRs varies, it is highly recommended to add query filters using the common fields (e.g. ID, ERROR_CODE, INSERT_TIME, STATE, UPDATE_TIME, WORKFLOW_NAME) to narrow down the search scope for better performance.

Memory Allocation

In the event that the number of records are large enough to cause the database to run out of memory, you may refer to the SAP Knowledge Base Article 3202692 to adjust the statement_memory_limit accordingly, so that the maximum memory allocation per statement can be increased to an amount which is suitable to the workload. Please consult with your DBA before making the changes.

If memory limit errors still occur, consider scaling up the memory of the server hosting the SAP HANA database to allow for a higher limit adjustments.

Columnar vs Row-based Tables

The SQL generated by the Data Veracity Profile for SAP HANA creates columnar tables, as the performance is generally better compared to row-based. The rule of thumb is that if the majority of table access is through a large number of tables with only a few selected attributes, then columnar storage is preferred. Row-based storage is preferable if most table access involves selecting a few records with all attributes selected.

Batch Size

In cases where there is a significant network latency between the server and the SAP HANA database, it is recommended to configure the property mz.dv.jdbcBatchSize to limit the fetch size while repairing the records using UI.

The default value is 100, meaning 100 rows will be fetched from the database simultaneously, which might not be optimal for large result sets or when significant network latency is observed.

As a larger value will result in a higher memory consumption, we recommend increasing it gradually while monitoring the platform memory usage, to prevent the Platform from running out of memory.

The property can be configured using the topo command, for example:

Configuring mz.dv.jdbcBatchSize

mzsh topo set topo://container:<platform-container-name>/pico:platform/val:config.properties.mz.dv.jdbcBatchSize 5000

Data Veracity Performance Tuning(4.2)