18.5 An XML Format Example

The example in this section describes how to decode and encode XML data.

The example includes the following:

  • Workflow and Ultra configurations (ultra_xml_example.zip):
    • CSV_TO_XML_WF - Collects a CSV formatted file and converts it to XML. Optional fields that are not included in the CSV are populated in the workflow.
    • XML_TO_CSV_WF - Collects an XML formatted file and converts it to CSV. Optional fields in the XML are discarded.
    • ULTRA_CSV - CSV decoder and encoder.
    • ULTRA_XML - XML decoder and encoder.
  • Input data in CSV format (INFILE01.txt).
     

Follow these steps to run the example:

  1. Download the example files to /opt/ultraXML.
  2. Create the subdirectory indata in /opt/ultraXML.
  3. Copy INFILE01.txt to /opt/ultraXML/indata.
  4. Open the System Importer and select /opt/ultraXML/ultra_xml_example.zip. Import all configurations.
  5. Start the workflow CSV_TO_XML_WF. This creates an XML file, based on the CSV input, in the directory /opt/ultraXML/out.
  6. Copy the generated XML file to the directory /opt/ultraXML/indata.
  7. Start the workflow XML_TO_CSV_WF. A CSV file that is identical to the downloaded input file will be created in /opt/ultraXML/out.
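Steps 1 to 3 and step 6 are plain file operations. The Python sketch below scripts them, assuming the two downloaded files sit in a local folder passed as download_dir and that you have write access to /opt. The helper names prepare_example and stage_generated_xml are hypothetical, and the *.xml pattern is an assumption about how CSV_TO_XML_WF names its output. Importing the configurations and starting the workflows (steps 4, 5, and 7) is done in the product and is not covered.

```python
import shutil
from pathlib import Path

# Default location used in the steps above.
BASE = Path("/opt/ultraXML")


def prepare_example(download_dir: Path, base: Path = BASE) -> Path:
    """Hypothetical helper for steps 1 to 3."""
    # Step 1: put the downloaded files in the base directory.
    base.mkdir(parents=True, exist_ok=True)
    for name in ("ultra_xml_example.zip", "INFILE01.txt"):
        shutil.copy2(download_dir / name, base / name)
    # Step 2: create the indata subdirectory.
    indata = base / "indata"
    indata.mkdir(exist_ok=True)
    # Step 3: copy the CSV input file into indata.
    shutil.copy2(base / "INFILE01.txt", indata / "INFILE01.txt")
    return indata


def stage_generated_xml(base: Path = BASE, pattern: str = "*.xml") -> list[Path]:
    """Hypothetical helper for step 6: copy the XML output of CSV_TO_XML_WF
    from base/out to base/indata. The *.xml pattern is an assumption about
    how the output files are named."""
    staged = []
    for xml_file in sorted((base / "out").glob(pattern)):
        target = base / "indata" / xml_file.name
        shutil.copy2(xml_file, target)
        staged.append(target)
    return staged
```

Run prepare_example before step 4 and stage_generated_xml between steps 5 and 7.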

Below is a description of the Ultra configuration that is used for XML encoding and decoding.

Example - ULTRA_XML

To decode or encode XML data, a format definition written in XML Schema syntax is included in an Ultra xml_schema block.

xml_schema {
    <?xml version="1.0" encoding="ISO-8859-1"?>
    <schema xmlns="http://www.w3.org/2001/XMLSchema">
        <element name="TRANSACTION_LOG">
            <complexType>
                <sequence>
                    <element ref="TRANSACTION" maxOccurs="unbounded"/>
                </sequence>
            </complexType>
        </element>
        <element name="TRANSACTION">
            <complexType>
                <sequence>
                    <element name="USER" type="string" minOccurs="1" maxOccurs="1"/>
                    <element name="IP" type="string" minOccurs="1" maxOccurs="1"/>
                    <element name="ITEM" type="string" minOccurs="1" maxOccurs="1"/>
                    <element name="VALUE" type="long" minOccurs="1" maxOccurs="1"/>
                    <element name="TIMESTAMP" type="dateTime" minOccurs="1" maxOccurs="1"/>
                    <element name="CURRENCY" type="string" minOccurs="1" maxOccurs="1"/>
                    <element name="MISC" type="string" minOccurs="0" maxOccurs="unbounded"/>
                </sequence>
                <attribute name="TXID" type="string" use="required"/>
            </complexType>
        </element>
    </schema>
};
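To see what kind of document this schema describes, the Python sketch below builds a TRANSACTION_LOG with the standard xml.etree.ElementTree module. The function name build_transaction_log and any values passed to it are hypothetical; the code only mirrors the schema: TXID is a required attribute of TRANSACTION, the six mandatory child elements appear once each in the declared order, and MISC may appear zero or more times. It does not validate against the schema.

```python
import xml.etree.ElementTree as ET


def build_transaction_log(transactions: list[dict]) -> bytes:
    """Build a TRANSACTION_LOG document shaped like the xml_schema above."""
    root = ET.Element("TRANSACTION_LOG")
    for tx in transactions:
        # TXID is declared as a required attribute, not a child element.
        el = ET.SubElement(root, "TRANSACTION", TXID=tx["TXID"])
        # The <sequence> fixes the order of the mandatory elements.
        for name in ("USER", "IP", "ITEM", "VALUE", "TIMESTAMP", "CURRENCY"):
            ET.SubElement(el, name).text = str(tx[name])
        # MISC has minOccurs="0" and maxOccurs="unbounded".
        for misc in tx.get("MISC", []):
            ET.SubElement(el, "MISC").text = misc
    # Non-UTF-8 encodings make ElementTree emit an XML declaration.
    return ET.tostring(root, encoding="ISO-8859-1")
```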

Collected XML UDRs are often followed by one or more whitespace characters, and when mapping, this whitespace is identified as a temporary record. Although trailing whitespace characters are valid in XML input, it is recommended that you remove them when decoding the data. To remove excess whitespace between the XML UDRs, the external format WhiteSpace is used with an in_map in the decoder. For further information, see 11. In-maps and 12. Decoders.

external WhiteSpace : identified_by(value == 0x20 || value == 0xA || value == 0xD) {
    int value: static_size(1);
};
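Each WhiteSpace record is a single byte (static_size(1)) that is identified when its value is a space (0x20), a line feed (0xA), or a carriage return (0xD). The Python sketch below models that rule; the names are hypothetical and this is not how Ultra evaluates formats internally. Note that tab (0x9), which XML also treats as whitespace, is not matched by this definition, so a tab between UDRs would not be consumed unless value == 0x9 is added to the condition.

```python
# Byte values matched by identified_by() in the WhiteSpace format above.
WHITESPACE_BYTES = frozenset({0x20, 0x0A, 0x0D})  # space, line feed, carriage return


def is_whitespace_record(byte: int) -> bool:
    """Mirror of the identified_by condition for a one-byte record."""
    return byte in WHITESPACE_BYTES


def skip_whitespace(data: bytes, pos: int = 0) -> int:
    """Return the index of the first byte at or after pos that WhiteSpace
    does not match, i.e. where the next XML UDR would start."""
    while pos < len(data) and is_whitespace_record(data[pos]):
        pos += 1
    return pos
```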

The internal and external formats could be mapped automatically, but in this example they are mapped explicitly to demonstrate how XML types are interpreted.

internal TransactionLog {
    list<Transaction> Transactions;
};
 
internal Transaction {
    string TxId;
    string User;
    string IP;
    string Item;
    long Value;
    date Timestamp;
    string Currency;
    list<string> Misc;
};
 
in_map inTransactionLog: external(TRANSACTION_LOG), internal(TransactionLog) {          
    i:Transactions and e:TRANSACTION using in_map inTransaction;
};
  
in_map inTransaction: external(TRANSACTION), internal(Transaction) { 
    i:TxId and e:TXID;
    i:User and e:USER;
    i:IP and e:IP;
    i:Item and e:ITEM;
    i:Value and e:VALUE;
    i:Timestamp and e:TIMESTAMP;
    i:Currency and e:CURRENCY;
    i:Misc and e:MISC;
};       
           
out_map outTransactionLog: external(TRANSACTION_LOG), internal(TransactionLog) {
    i:Transactions and e:TRANSACTION using out_map outTransaction;
};
 
out_map outTransaction: external(TRANSACTION), internal(Transaction) {
    i:TxId and e:TXID;
    i:User and e:USER;
    i:IP and e:IP;
    i:Item and e:ITEM;
    i:Value and e:VALUE;
    i:Timestamp and e:TIMESTAMP;
    i:Currency and e:CURRENCY;
    i:Misc and e:MISC;
}; 
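As an analogy in a general-purpose language, the Python sketch below expresses roughly what inTransaction, outTransaction, and inTransactionLog do, with a dataclass standing in for the internal Transaction format and xml.etree.ElementTree handling the XML side. It is not Ultra's implementation: the snake_case field names are hypothetical, and the type conversions (int() for long, datetime.fromisoformat() for dateTime) are assumptions about how the types correspond. Before Python 3.11, fromisoformat() rejects a trailing "Z" time zone designator.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Transaction:
    """Python stand-in for the internal Transaction format."""
    tx_id: str
    user: str
    ip: str
    item: str
    value: int           # schema type long
    timestamp: datetime  # schema type dateTime, internal type date
    currency: str
    misc: list[str] = field(default_factory=list)


def in_transaction(el: ET.Element) -> Transaction:
    """Rough equivalent of in_map inTransaction: XML element to record."""
    return Transaction(
        tx_id=el.get("TXID"),  # attribute, not child element
        user=el.findtext("USER"),
        ip=el.findtext("IP"),
        item=el.findtext("ITEM"),
        value=int(el.findtext("VALUE")),
        # Assumes no trailing "Z" unless running Python 3.11 or later.
        timestamp=datetime.fromisoformat(el.findtext("TIMESTAMP")),
        currency=el.findtext("CURRENCY"),
        misc=[m.text or "" for m in el.findall("MISC")],
    )


def out_transaction(tx: Transaction) -> ET.Element:
    """Rough equivalent of out_map outTransaction: record to XML element."""
    el = ET.Element("TRANSACTION", TXID=tx.tx_id)
    for name, text in (("USER", tx.user), ("IP", tx.ip), ("ITEM", tx.item),
                       ("VALUE", str(tx.value)),
                       ("TIMESTAMP", tx.timestamp.isoformat()),
                       ("CURRENCY", tx.currency)):
        ET.SubElement(el, name).text = text
    for m in tx.misc:
        ET.SubElement(el, "MISC").text = m
    return el


def in_transaction_log(root: ET.Element) -> list[Transaction]:
    """Rough equivalent of in_map inTransactionLog: one record per TRANSACTION."""
    return [in_transaction(el) for el in root.findall("TRANSACTION")]
```

Decoding a record and encoding it again yields an equivalent element, which is the round trip the two workflows rely on.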

To get rid of the whitespace, create an in_map for the external format WhiteSpace using the discard_output option. For further information, see 11. In-maps.

in_map WS_map: external(WhiteSpace), discard_output {automatic;}; 

To remove as much whitespace as possible from the processed data, WS_map is listed first in the decoder.

encoder encTransactionLog: out_map(outTransactionLog);
decoder decTransactionLog: in_map(WS_map), in_map(inTransactionLog);
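The effect of this decoder declaration can be modelled in a few lines of Python. The sketch assumes, as the text above suggests, that the decoder tries its in_maps in the listed order and uses the first one that matches. The names ws_map, transaction_log_map, and decode are hypothetical, and the naive end-tag search in transaction_log_map merely stands in for real XML decoding. Whitespace between documents is consumed by ws_map and, like discard_output, produces no record.

```python
from typing import Callable, Optional

# A matcher tries to identify a record at data[pos:] and returns
# (bytes_consumed, record, keep), or None if it does not match.
Matcher = Callable[[bytes, int], Optional[tuple[int, object, bool]]]


def ws_map(data: bytes, pos: int):
    """Model of WS_map: one byte of WhiteSpace, output discarded."""
    if data[pos] in (0x20, 0x0A, 0x0D):
        return 1, None, False  # consume the byte but emit nothing
    return None


def transaction_log_map(data: bytes, pos: int):
    """Hypothetical stand-in for inTransactionLog: consume one complete
    <TRANSACTION_LOG>...</TRANSACTION_LOG> document (naive end-tag search)."""
    if not data.startswith(b"<", pos):
        return None
    end_tag = b"</TRANSACTION_LOG>"
    end = data.find(end_tag, pos)
    if end < 0:
        return None
    end += len(end_tag)
    return end - pos, data[pos:end], True


def decode(data: bytes, in_maps: list[Matcher]) -> list:
    """Try the in_maps in the listed order at each position; the first
    match wins, and only records marked keep are returned."""
    records, pos = [], 0
    while pos < len(data):
        for m in in_maps:
            hit = m(data, pos)
            if hit is not None:
                consumed, record, keep = hit
                if keep:
                    records.append(record)
                pos += consumed
                break
        else:
            raise ValueError(f"no in_map matches at offset {pos}")
    return records
```

Calling decode(stream, [ws_map, transaction_log_map]) corresponds to the in_map order in decTransactionLog.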