Validation JSON file types

Introduction

This Function is designed to validate data by processing the received data according to the designated rules and conventions in Usage Engine Cloud Edition. Regarding the JSON file types and objects, they are to be checked according to the official JSON Schema Validation definitions, approved by the IETF Trust. Each JSON schema object is to be independently examined by the built-in engine for validity.

Interoperability Considerations

For the purpose of maintaining interoperability and according to the official JSON Schema regulations, several considerations are accounted for during validation. All of the listed conditions are implemented by following the official regulations. No exceptions or special cases are made unless they are explicitly specified in the official Usage Engine Cloud Edition documentation.

A list of the prescribed considerations is the following:

Validation of String Instances – The null character (\u0000) is valid within a JSON string.
Validation of Numeric Instances – The JSON schema does not place any bounds on numbers, which can have arbitrary precision. Such instances can be arbitrarily large and/or can contain long decimal parts.
Regular Expressions – Keywords that use regular expressions, or that constrain a value to match such an expression, should conform to the core JSON Schema specifications.
Meta-Schemas – The latest JSON Schema dialect meta-schema release specifications are supported.

It is important to note that at all times JSON schema data should be entered in accordance with the official specifications.
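
The string and numeric considerations above can be illustrated with a short sketch. It uses Python's standard json module purely for illustration; it is not the engine built into Usage Engine Cloud Edition, and the helper name parse_preserving_precision is hypothetical.

```python
import json
from decimal import Decimal


def parse_preserving_precision(text: str):
    """Parse JSON text, keeping long decimal parts intact.

    Python integers already have arbitrary size; parse_float=Decimal
    stops fractional numbers from being rounded to binary floats.
    """
    return json.loads(text, parse_float=Decimal)


document = parse_preserving_precision(
    '{"name": "a\\u0000b",'
    ' "big": 123456789012345678901234567890,'
    ' "precise": 0.1000000000000000000000000001}'
)

# The NUL character is an ordinary character inside the parsed string.
print(len(document["name"]))
# Neither number has been truncated or rounded by the parser.
print(document["big"], document["precise"])
```

Whether a downstream consumer can also handle such values is a separate question, which is why the considerations are listed explicitly.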

Validation keywords for all instance types

Validation keywords for all instance types are governed by the following rules for each JSON file type object:

type – The entered value must be a string or an array. If an array is entered, its elements must be strings and must be unique. String values must be one of the six primitive types (“null”, “boolean”, “object”, “array”, “number”, or “string”) or “integer”, which matches any number with a zero fractional part.
enum – The entered value of this keyword must be an array with at least one element. All listed elements should be unique and can be of any type (including “null”). Validation succeeds when the instance is equal to one of the elements of the array belonging to the enum keyword.
const – The entered value of this keyword can be of any type, including “null”. Validation succeeds when the instance is equal to the value of the const keyword. Using this keyword is functionally equivalent to using an enum with a single element.
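
As a minimal sketch of the type and enum rules, the following Python code checks values that have already been parsed by the standard json module. The function names (json_type_matches, check_type, json_equal, check_enum) are hypothetical and are not part of Usage Engine Cloud Edition.

```python
from typing import Any, List, Union


def json_type_matches(instance: Any, type_name: str) -> bool:
    """Return True if a parsed JSON value has the named JSON Schema type."""
    if type_name == "null":
        return instance is None
    if type_name == "boolean":
        return isinstance(instance, bool)
    if type_name == "object":
        return isinstance(instance, dict)
    if type_name == "array":
        return isinstance(instance, list)
    if type_name == "string":
        return isinstance(instance, str)
    # bool is a subclass of int in Python, so exclude it from numbers.
    is_number = isinstance(instance, (int, float)) and not isinstance(instance, bool)
    if type_name == "number":
        return is_number
    if type_name == "integer":
        # Any number with a zero fractional part counts, so 1.0 qualifies.
        return is_number and (isinstance(instance, int) or instance.is_integer())
    raise ValueError(f"unknown type name: {type_name!r}")


def check_type(instance: Any, type_value: Union[str, List[str]]) -> bool:
    """The keyword value is a single type name or an array of type names."""
    names = [type_value] if isinstance(type_value, str) else type_value
    return any(json_type_matches(instance, name) for name in names)


def json_equal(a: Any, b: Any) -> bool:
    """JSON equality: 1 equals 1.0, but true never equals 1."""
    if isinstance(a, bool) or isinstance(b, bool):
        return isinstance(a, bool) and isinstance(b, bool) and a == b
    if isinstance(a, list) and isinstance(b, list):
        return len(a) == len(b) and all(json_equal(x, y) for x, y in zip(a, b))
    if isinstance(a, dict) and isinstance(b, dict):
        return a.keys() == b.keys() and all(json_equal(a[k], b[k]) for k in a)
    return a == b


def check_enum(instance: Any, allowed: list) -> bool:
    """The instance must equal at least one element of the enum array."""
    return any(json_equal(instance, candidate) for candidate in allowed)
```

The explicit boolean handling is a design choice forced by Python, where True == 1 evaluates to true; JSON treats the two as different values.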

Specific const rules are defined for the following JSON types:

tuple – Tuple refers to a set of objects which are ordered together and immutable. They cannot be changed and operate in the same manner as “string” data types. Validation is performed for the whole array of items placed inside the tuple instance.
oneOf – oneOf validation is used to compare data between items exactly as shown in the current schema configuration.
nestedObject – Nested objects are complex structures that include a “parent” JSON object that has “child” properties attached to it. Validation is performed on the whole iteration.
reference – A reference keyword is a valid instance that points to the location indicated by the referenced value.

Validation keywords for Numeric Instance Types

This section lists the validation framework for the JSON numeric instance types:

multipleOf – The entered value must be a number greater than 0. Validation succeeds when dividing the instance by the value of this keyword results in an integer.
maximum – The entered value must be a number, representing an inclusive upper limit for the given numeric instance. Validation succeeds when the instance is less than or equal to the “maximum” value.
exclusiveMaximum – The entered value must be a number, representing an exclusive upper limit for the given numeric instance. Validation succeeds when the instance is less than the “exclusiveMaximum” value. It will not validate if it is equal to the value.
minimum – The entered value must be a number, representing an inclusive lower limit for the given numeric instance. Validation succeeds if the instance is greater than or equal to the “minimum” value.
exclusiveMinimum – The entered value must be a number, representing an exclusive lower limit for the given numeric instance. Validation succeeds when the instance is greater than the “exclusiveMinimum” value. It will not validate if it is equal to the value.
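
The numeric rules can be sketched as follows. This is an illustrative Python helper (check_numeric is a hypothetical name), not the service's implementation. It converts values to Decimal, an assumption made here so that a check such as 0.3 against a multipleOf of 0.1 is not distorted by binary floating-point rounding.

```python
from decimal import Decimal


def check_numeric(instance, schema: dict) -> bool:
    """Apply the numeric keywords found in `schema` to `instance`."""
    value = Decimal(str(instance))

    def limit(keyword: str) -> Decimal:
        # Convert through str so that 0.1 becomes Decimal("0.1").
        return Decimal(str(schema[keyword]))

    # multipleOf: dividing the instance by the keyword value leaves no remainder.
    if "multipleOf" in schema and value % limit("multipleOf") != 0:
        return False
    # maximum and minimum are inclusive limits.
    if "maximum" in schema and value > limit("maximum"):
        return False
    if "minimum" in schema and value < limit("minimum"):
        return False
    # The exclusive variants reject an instance equal to the limit.
    if "exclusiveMaximum" in schema and value >= limit("exclusiveMaximum"):
        return False
    if "exclusiveMinimum" in schema and value <= limit("exclusiveMinimum"):
        return False
    return True
```

For example, check_numeric(10, {"maximum": 10}) passes while check_numeric(10, {"exclusiveMaximum": 10}) does not. The sketch relies on the default Decimal context, so extremely large quotients would need a higher precision setting.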

Validation keywords for String Instance Types

This section lists the validation framework for the JSON string instance types:

maxLength – The entered value must be a non-negative integer. Validation succeeds when the length of the string instance is less than or equal to the designated keyword value.
minLength – The entered value must be a non-negative integer. Validation succeeds when the length of the string instance is greater than or equal to the designated keyword value. When this keyword is omitted, the Usage Engine Cloud Edition service will treat it as having a value of “0”.
pattern – The entered value must be a string that is a valid regular expression. Validation succeeds if the expression matches the instance.
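
A minimal sketch of the string rules is shown below. The helper name check_string is hypothetical. Note the assumption that Python's re module stands in for the regular expression dialect used by the service; the two dialects differ in some details, so this is an approximation only.

```python
import re


def check_string(instance: str, schema: dict) -> bool:
    """Apply maxLength, minLength, and pattern to a string instance."""
    # len() counts Unicode code points, not encoded bytes.
    length = len(instance)
    if length > schema.get("maxLength", float("inf")):
        return False
    # An omitted minLength behaves like a value of 0.
    if length < schema.get("minLength", 0):
        return False
    # re.search is used because the pattern is not implicitly anchored:
    # it may match anywhere in the string unless it contains ^ or $.
    if "pattern" in schema and re.search(schema["pattern"], instance) is None:
        return False
    return True
```

With this sketch, the pattern "[0-9]+" accepts "id-42" because the digits match somewhere in the string, while "^[0-9]+$" rejects it.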

Validation keywords for Array Instance Types

This section lists the validation framework for the JSON array instance types:

maxItems – The entered value must be a non-negative integer. Validation succeeds when the array size is less than or equal to the value of the input keyword.
minItems – The entered value must be a non-negative integer. Validation succeeds when the array size is greater than or equal to the value of the input keyword. When this keyword is omitted, the Usage Engine Cloud Edition service will treat it as having a value of “0”.
uniqueItems – The entered value must be a boolean. If the value is “false”, validation always succeeds. If the value is “true”, validation succeeds only when all elements of the array instance are unique. When this keyword is omitted, the Usage Engine Cloud Edition service will treat it as having a value of “false”.
maxContains – The entered value must be a non-negative integer. The keyword will have no effect if the “contains” keyword is not present within the same schema object. Validation depends on the annotation result of the adjacent “contains” keyword and can pass in two ways. It passes if the annotation result is an array with a length that is less than or equal to the “maxContains” value. It also passes if the annotation result is a “true” boolean value and the array instance has a length that is less than or equal to the “maxContains” value.
minContains – The entered value must be a non-negative integer. The keyword will have no effect if the “contains” keyword is not present within the same schema object. Validation depends on the annotation result of the adjacent “contains” keyword and can pass in two ways. It passes if the annotation result is an array with a length that is greater than or equal to the “minContains” value. It also passes if the annotation result is a “true” boolean value and the array instance has a length that is greater than or equal to the “minContains” value. When this keyword is omitted, the Usage Engine Cloud Edition service will treat it as having the same behavior as a value of “1”. Validation allows a value of “0”, but that is only useful for setting up a range of occurrences from “0” going up to the value of “maxContains”.
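
The array rules can be sketched in Python as follows. The names check_array and _canon are hypothetical. As a simplifying assumption, the “contains” subschema is represented by a plain predicate function rather than a full schema, and the number of matching elements stands in for the annotation result.

```python
from typing import Any, Callable, Optional


def _canon(value: Any) -> Any:
    """Build a hashable key so that 1 and 1.0 collide but 1 and true do not."""
    if isinstance(value, bool):
        return ("boolean", value)
    if isinstance(value, (int, float)):
        return ("number", value)
    if isinstance(value, list):
        return ("array", tuple(_canon(item) for item in value))
    if isinstance(value, dict):
        return ("object", tuple(sorted((k, _canon(v)) for k, v in value.items())))
    return ("other", value)


def check_array(
    instance: list,
    schema: dict,
    contains: Optional[Callable[[Any], bool]] = None,
) -> bool:
    """Apply the array keywords found in `schema` to `instance`."""
    size = len(instance)
    if size > schema.get("maxItems", float("inf")):
        return False
    if size < schema.get("minItems", 0):
        return False
    # uniqueItems only has an effect when it is true.
    if schema.get("uniqueItems", False):
        if len({_canon(item) for item in instance}) != size:
            return False
    # minContains and maxContains are ignored without "contains".
    if contains is not None:
        matches = sum(1 for item in instance if contains(item))
        # An omitted minContains behaves like a value of 1.
        if matches < schema.get("minContains", 1):
            return False
        if matches > schema.get("maxContains", float("inf")):
            return False
    return True
```

Passing {"minContains": 0} together with a predicate shows the range behaviour described above: an array with no matching elements is then accepted.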

Validation keywords for Object Instance Types

This section lists the validation framework for the JSON object instance types:

maxProperties – The entered value must be a non-negative integer. Validation succeeds for this object instance when the number of its properties is less than or equal to the value of this keyword.
minProperties – The entered value must be a non-negative integer. Validation succeeds for this object instance when the number of its properties is greater than or equal to the value of the keyword. When this keyword is omitted, it will be regarded as having a value of “0”.
required – The entered value must be an array. Its elements (if any) must be unique strings. Validation succeeds if every item in the array is the name of a property in the instance. When this keyword is omitted, the keyword will be regarded as having the same behaviour as an empty array.
dependentRequired – The entered value must be an object. The values of its properties (if any) must be arrays, whose elements (if any) must be unique strings. This keyword specifies properties that are required only when another specific property is present in the instance. Validation succeeds if, for each name that appears in both the instance and this keyword's value, every item in the corresponding array is also the name of a property in the instance. When this keyword is omitted, the keyword will be regarded as having the same behaviour as an empty object.
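
A short sketch of the object rules follows. The helper name check_object and the property names in the usage note (credit_card, billing_address) are hypothetical and chosen only for illustration.

```python
def check_object(instance: dict, schema: dict) -> bool:
    """Apply the object keywords found in `schema` to `instance`."""
    count = len(instance)
    if count > schema.get("maxProperties", float("inf")):
        return False
    if count < schema.get("minProperties", 0):
        return False
    # required: every listed name must be a property of the instance.
    # An omitted keyword behaves like an empty array.
    if any(name not in instance for name in schema.get("required", [])):
        return False
    # dependentRequired: the listed names are required only when the
    # triggering property is itself present in the instance.
    for trigger, needed in schema.get("dependentRequired", {}).items():
        if trigger in instance and any(name not in instance for name in needed):
            return False
    return True
```

For instance, with {"dependentRequired": {"credit_card": ["billing_address"]}}, an object without credit_card passes, while an object that has credit_card but no billing_address fails.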

Validation keywords for Semantic content with “format”

Utilization of certain values cannot be done by solely relying on structural validation. For this reason, the “format” annotation keyword has been defined for JSON schema. It is intended to convey semantic information for a fixed subset of values. The values that correspond to the “format” semantic content are called format attributes.

These format attributes must be of the string data type. Each format attribute generally validates only a given set of instance types. If the type of the instance is not part of this set, validation for that format attribute should succeed. A format attribute can be specified to apply to any instance type defined by the current JSON schema data model.

As such there are two specific vocabularies that have been developed and implemented in the service:

Format-Annotation Vocabulary – The value of this format must be collected as an annotation if the used implementation supports such a collection. This is done to ensure the validation is done on an application level if the schema validation for individual objects is not available. Implementations can still treat “format” as an assertion in certain situations and attempt to validate the values. In such cases, these implementations are to document their level of support for the validation.
Format-Assertion Vocabulary – When this vocabulary is declared with a “true” value, implementations must provide full validation support for all formats. If an implementation cannot provide this level of validation, processing of the schema will not occur.

Custom format attributes may be supported in the listed implementations. It is recommended to define the additional keywords in the custom vocabulary.

Validation for Defined Formats

Attributes apply to the string instances that correspond to defined formats. They are standardized according to international agreements and technical specifications. When implemented they should uphold the following rules:

date-time – This string instance is valid if it is a correct representation according to the “date-time” production specification.
date – This string instance is valid if it is a correct representation according to the “full-date” production specification.
time – This string instance is valid if it is a correct representation according to the “full-time” production specification.
duration – This string instance is valid if it is a correct representation according to the “duration” production specification.
email – This string instance is valid if it is a correct representation of an Internet email address. Strings valid against the “email” attribute are also valid against “idn-email”.
idn-email – This string instance is valid if it is a correct representation of an “extended mailbox” attribute.
hostname – This string instance is valid if it is a correct representation of an Internet hostname. Strings that are valid against “hostname” are also valid against the “idn-hostname” attribute.
idn-hostname – This string instance is valid if it is a correct representation of an internationalized hostname.
ipv4 – This string instance is valid if it is a correct representation of an IP address as defined by the “dotted-quad” ABNF syntax.
ipv6 – This string instance is valid if it is a correct representation of an IP address conforming to IPv6 specifications.
uri – This string instance is valid if it is a correct URI (Uniform Resource Identifier) according to the relevant RFC specifications. To indicate UUIDs as URNs, use the “uri” format, with a “pattern” regular expression of "^urn:uuid:" to indicate the URI scheme and URN namespace.
uri-reference – This string instance is valid if it is a correct reference, corresponding to either a URI or a relative-reference to it, according to the relevant RFC specifications.
iri – This string instance is valid if it is a correct representation of an IRI (Internationalized Resource Identifier), according to the relevant RFC specifications.
iri-reference – This string instance is valid if it is a correct representation of an IRI reference or a relative-reference, according to the relevant RFC specifications.
uuid – This string instance is valid if it is a correct representation of a UUID (Universally Unique Identifier), according to specifications. This format supports only “plain” UUIDs.
uri-template – This string instance is valid if it is a correct URI Template, according to the relevant RFC specifications.
json-pointer – This string instance is valid if it is a correct JSON string representation of a JSON pointer, according to the relevant RFC specifications.
relative-json-pointer – This string instance is valid if it is a correct relative JSON pointer, according to the relevant RFC specifications.
regex – This string instance is valid if it is a correct regular expression, according to the ECMA-262 regular expression dialect specifications.
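
A few of the defined formats can be approximated with Python's standard library, as sketched below. The function check_format and its lookup table are hypothetical, only four formats are covered, and the checks are approximations rather than the exact productions named above. Non-string instances and unknown format names are accepted, following the rule that a format attribute succeeds for instance types it does not apply to.

```python
import ipaddress
import re
from datetime import date

_DATE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")
_UUID = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)


def _is_date(value: str) -> bool:
    # The regular expression fixes the YYYY-MM-DD layout; fromisoformat
    # then rejects impossible calendar dates such as 2023-02-29.
    if not _DATE.fullmatch(value):
        return False
    try:
        date.fromisoformat(value)
    except ValueError:
        return False
    return True


def _is_ipv4(value: str) -> bool:
    try:
        ipaddress.IPv4Address(value)
    except ValueError:
        return False
    return True


def _is_ipv6(value: str) -> bool:
    # Zone identifiers such as %eth0 are rejected in this sketch.
    if "%" in value:
        return False
    try:
        ipaddress.IPv6Address(value)
    except ValueError:
        return False
    return True


def _is_uuid(value: str) -> bool:
    # Only the plain hyphenated form is accepted, not the urn:uuid: form.
    return _UUID.fullmatch(value) is not None


_CHECKS = {"date": _is_date, "ipv4": _is_ipv4, "ipv6": _is_ipv6, "uuid": _is_uuid}


def check_format(instance, format_name: str) -> bool:
    """Return False only when a known format applies and the string fails it."""
    if not isinstance(instance, str):
        return True
    check = _CHECKS.get(format_name)
    return True if check is None else check(instance)
```

Keeping the checks in a lookup table makes it straightforward to add further formats, or custom format attributes, without changing check_format itself.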

Validation for string-encoded data contents

This section lists the validation requirements for contents that consist of string-encoded data. There are specific implementation rules that need to be followed to perform a successful validation. For security purposes, implementations must not automatically decode, parse, and/or validate the string contents. Each string-encoded document is to be evaluated according to the set standards on a conditional basis. All keywords in this section apply only to string data types and have no effect on other data types. Validation is performed for the following data contents:

contentEncoding – This string instance is valid if it conforms to one of several standards. Accepted values include the base16, base32, and base64 encodings that are listed in RFC 4648. Encodings that are used in MIME, according to RFC 2045, are also accepted. All encoding results consist of 7-bit ASCII characters. If “contentMediaType” is present but “contentEncoding” is absent, this indicates that no transformation is required to represent the content in a UTF-8 string.
contentMediaType – This string instance is valid if it corresponds to the media type of the relevant contents. If “contentEncoding” is present, then this property will refer to the decoded string. The string that is entered must be a valid media type, as defined by the relevant RFC specifications.
contentSchema – When “contentMediaType” is present, this keyword contains a schema that describes the structure of the string contents. This keyword may be used with any media type that is capable of being mapped to the valid JSON schema model. If “contentMediaType” is not present, the whole property value will be ignored.
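
The sketch below shows one way an application could decode string-encoded content as an explicit, opt-in step rather than automatically. The function name decode_content is hypothetical, and as a simplifying assumption it supports only the “base64” encoding and the “application/json” media type.

```python
import base64
import json


def decode_content(instance: str, schema: dict):
    """Decode and parse a string on explicit request.

    Raises ValueError when the string does not match the declared
    contentEncoding or contentMediaType.
    """
    encoding = schema.get("contentEncoding")
    media_type = schema.get("contentMediaType")

    if encoding is None:
        # No encoding declared: the content is the UTF-8 string itself.
        raw = instance.encode("utf-8")
    elif encoding == "base64":
        # validate=True rejects characters outside the base64 alphabet.
        # binascii.Error is a subclass of ValueError.
        raw = base64.b64decode(instance, validate=True)
    else:
        raise ValueError(f"unsupported contentEncoding: {encoding!r}")

    if media_type == "application/json":
        # json.JSONDecodeError is also a subclass of ValueError.
        return json.loads(raw)
    return raw
```

Because the caller has to invoke decode_content deliberately, untrusted strings are never decoded or parsed as a side effect of ordinary validation.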

Meta-Data Annotations Considerations

The keywords found in this section are commonly used for documentation and user interface display purposes. They are not designed to be used in comprehensive feature sets.

“title” and “description” – Both of these values must be of the string data type. They are used to decorate a user interface about the data found in them. It is recommended to keep the “title” short, whereas the “description” should carry the explanation for the purpose of the instance.
“default” – No restrictions are placed on this keyword value. Duplicated entries should be removed if such are found. The “default” keyword is used to supply a default JSON value to be used with a given schema.
“deprecated” – This keyword value must be of the boolean data type. If multiple occurrences are applicable to a single instance location, applications will consider the location to be deprecated if any occurrence specifies a “true” value. If the boolean value is “true”, then the service should refrain from using the given property. A root schema that contains the “deprecated” keyword with a “true” boolean status will indicate that the entire resource may be removed in the future. This keyword will apply to each instance location to which the schema object applies. If this keyword is omitted, the service will consider it as having the same behavior as a value of “false”.
“readOnly” and “writeOnly” – The values that are input for these keywords must be of the boolean data type. If there are multiple occurrences that apply to a single instance location, the result is treated as “true” if any occurrence specifies a “true” value. If the “readOnly” boolean value is set to “true”, then the value is managed exclusively by the owning authority. Any external attempts to modify value properties will be ignored or rejected. Instances that are marked with “writeOnly” for the whole document may produce an error upon retrieval. This keyword can be used to mark sections like password input fields. When these keywords are omitted, they are treated as having a value of “false”.
“examples” – The value of this keyword must be an array; no restrictions are placed on the values inside of it. If there are multiple occurrences of this keyword, then the implementation is to provide a flat array of all listed values. This keyword is used to provide sample JSON values that are associated with a particular schema. It is recommended that the input values are valid according to the defined schema model. Users can use the “default” value as an additional example.

Security Considerations

When validation of JSON schema is performed, the users need to carefully consider possible security-related implications and possible issues. There are some important best practices that are regulated by the applicable standards. This is especially true for Unicode characters, where a security standard is in place.

Considerations of note are the following:

Regular Expressions – JSON schema validation allows the use of regular expressions. Some regular expression implementations allow the embedding of arbitrary code, which is outside the scope of the JSON schema. This must not be permitted as it can lead to vulnerability exploitation. Attackers can also mount denial-of-service attacks with expressions that are crafted to be extremely expensive to evaluate.
Content Validation Risks – Implementations that process “contentEncoding” and/or “contentMediaType” are at risk of evaluating instance string data in an unsafe way based on misleading data. Potential security issues can be mitigated by performing such processes only when the relationship between the schema and the instance is established.
Media Type Processing – The different media types that are being encoded or processed all have their own security considerations, subject to their own specifications. It is recommended that proper handling is performed.
Duplicate Names – JSON processing of duplicate names can result in inconsistent behaviour. In this regard improper processing can result in a covert channel. This can be used in penetration testing for a possible intrusion path.
JSON Name/Value Pairs Ordering – It is possible to encode hidden meaning in the order of the name/value pairs in JSON objects. This is because there is no concrete definition of how this is done according to the official specifications. To minimize the possibility of security exploitation, standardization of the order should be considered. When this is done, hashing and the use of digital signatures can be considered.
JSON Numbering Considerations – JSON exploitation can be done when there is improper handling of the numbering conventions. By specification, there are no restrictions on the number of digits or the precision. Application behavior can be unpredictable when processing some types of numbers. There is a possibility that an application can crash and potentially lead to exploitation when large numbers of digits are handled improperly.
Undefined Unicode Characters Use – Defined Unicode characters should be used throughout the JSON schema. The use of undefined Unicode characters in escape notation form can lead to unpredictable processing. In some cases, this can result in application crashes which can lead to vulnerability exploitation.
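
Two of the considerations above, duplicate names and name/value pair ordering, can be sketched with Python's standard json module. The function names are hypothetical, and the sorted-key serialization shown here is a simple illustrative convention, not a standardized canonical form.

```python
import hashlib
import json


def reject_duplicate_names(pairs):
    """object_pairs_hook that fails instead of silently keeping the last value."""
    result = {}
    for name, value in pairs:
        if name in result:
            raise ValueError(f"duplicate name: {name!r}")
        result[name] = value
    return result


def load_strict(text: str):
    """Parse JSON text, raising ValueError on any duplicate object name."""
    return json.loads(text, object_pairs_hook=reject_duplicate_names)


def canonical_digest(value) -> str:
    """Hash a value after serializing it with a fixed name order."""
    canonical = json.dumps(value, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

By default, json.loads keeps only the last value for a repeated name, which is the kind of inconsistency that load_strict surfaces as an error. Fixing the name order before hashing means that two objects with the same content produce the same digest regardless of how their pairs were originally ordered.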

For additional information see the following documents:

JSON Schema Core Specifications

UTF #36: Unicode Security Considerations

JSON Schema Dialect Meta-Schema Specifications