...
The following command will show how a string field can be indexed.
Index for String fields in PostgreSQL
Code Block |
---|
CREATE INDEX dvStrField1 ON dvtestudr ((udr_json ->> 'stringField1')); |
For fields located within multiple sublevels of UDR, the following command will create a unique index for a field that is located in within the subUDRs.
Index for nested fields
Code Block |
---|
CREATE INDEX dvStrField1 ON dvtestudr ((udr_json -> 'sub1Field' -> 'sub2Field' ->> 'stringField1')); |
Following from the example as shown above, to index the UDR field intField1, the search performance could be improved by creating an index in the PostgreSQL database for this field. The following will be an SQL command to create a unique index meant for an integer field in the JSON record.
Index for Integer fields in PostgreSQL
Code Block |
---|
CREATE INDEX dvIntField1 ON dvtestudr (cast(udr_json ->> 'intField1' AS int)); |
When it comes to timestamps, a function is the recommended suggestion to make the data type immutable when it is casted for an index. The example will show a sample PostgreSQL function and the command for the index.
Index for timestamps
Code Block |
---|
CREATE OR REPLACE FUNCTION dvcasttime(text) RETURNS timestamp AS $$SELECT to_timestamp($1, 'YYYY-MM-DD HH24:MI:SS')$$ -- change the timestamp format to the format being used in the field LANGUAGE sql IMMUTABLE; CREATE INDEX dvdDateField1 ON dvtestudr (dvcasttime(udr_json ->> 'dateField1')); |
...
The Table above shows a table, created using the SQL generation in the Data Veracity profile. The UDR_JSON field is where the record for every UDR is kept. The UDR is kept in a JSON format and Oracle 19c has a way to allow users to create an index of a field located in the JSON format itself.
...
Example UDR with 3 sub UDR levels
...
Take for example, when the user frequently searches for the UDR field intField1, the search performance could be improved by creating an index in the Oracle database for this field. The following will be an SQL command to create a unique index meant for a field in the JSON record.
Index for intField1
Code Block |
---|
CREATE UNIQUE INDEX dvIntField1_idx ON dvtestudr (json_value(udr_json, '$.intField1' RETURNING NUMBER ERROR ON ERROR)); |
If fields located within multiple sublevels of UDR are frequently queried by the user, an index could be created to improve the performance for searching this field. The following SQL command will create a unique index for a field that is located in within the subUDRs.
Index for sub1Field.sub2Field.intField
Code Block |
---|
CREATE UNIQUE INDEX dvIntSubField2_idx ON dvtestudr (json_value(udr_json, '$.sub1Field.sub2Field.intField' RETURNING NUMBER ERROR ON ERROR)); |
...
This section describes how to tune the SAP HANA database for use with Data Veracity.
General Usage
For SAP HANA, memory sizing is especially important due to the data being stored in-memory to avoid the performance penalty of disk I/O. We recommend running the HANA_Configuration_Minichecks script as described in the SAP Knowledge Base Article 3019194 if there is any performance issue and consult with the DBA to implement the suggestions.
Additionally, as the number of records in the table and the size of the UDRs varies, it is highly recommended to add query filters using the common fields (e.g. ID, ERROR_CODE, INSERT_TIME, STATE, UPDATE_TIME, WORKFLOW_NAME) to narrow down the search scope for better performance.
Memory Allocation
In the event that the number of records are large enough to cause SAP HANA the database to run out of memory, you may refer to the SAP Knowledge Base Article 3202692 to adjust the “statement_memory_limit
” accordingly, so that the maximum memory allocation per statement can be increased to an amount which is suitable to the workload. Please consult with your DBA before making the changes.
If memory limit errors still occur, consider scaling up the memory of the server hosting the SAP HANA database to allow for a higher limit adjustments.
Columnar vs Row-based Tables
The SQL generated by the Data Veracity Profile for SAP HANA creates columnar tables, as the performance is generally better compared to row-based. The rule of thumb is that if the majority of table access is through a large number of tables with only a few selected attributes, then columnar storage is preferred. Row-based storage is preferable if most table access involves selecting a few records with all attributes selected.
Batch Size
In cases where there is a significant network latency between the server and the SAP HANA database, it is recommended to configure the property “mz.dv.jdbcBatchSize
” to limit the fetch size while repairing the records using UI.
...