...
A simple Ultra Format needs to be created for the incoming UDRs. For more information about the Ultra Format Editor and the UFDL syntax, refer to the Ultra Format Management User's Guide.
Create an Ultra Format as defined below:

external BOOK_HEADER : identified_by(strREContains(HEADER, "title,name,organization,copyrightYear")), terminated_by(0xA)
{
    ascii HEADER : terminated_by(0xA);
};

external BookRecord
{
    ascii title : terminated_by(",");
    ascii authorName : terminated_by(",");
    ascii organization : terminated_by(",");
    ascii copyrightYearString : terminated_by(",");
    ascii numberOfPages : terminated_by(0xA);
};

internal BookRecord
{
    string title;
    string authorName;
    string organization;
    string copyrightYearString;
    int numberOfPages;
    // enriched
    date copyrightYear;
};

// decoder
in_map BOOK_HEADER_InMap : external(BOOK_HEADER), target_internal(BOOK_HEADER), discard_output { automatic; };
in_map BookRecord_InMap : external(BookRecord), internal(BookRecord) { automatic; };
decoder BOOK_HEADER_Decode : in_map(BOOK_HEADER_InMap);
decoder BookRecord_Decode : in_map(BookRecord_InMap);
decoder DECODER { decoder BOOK_HEADER_Decode; decoder BookRecord_Decode *; };

// encoder
out_map BookRecord_OutMap : external(BookRecord), internal(BookRecord) { automatic; };
encoder ENCODER : out_map(BookRecord_OutMap);
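To make the decoding rules above easier to follow, here is a minimal Python sketch of roughly what the DECODER does with a CSV input: it matches the header line once, discards it, and then splits every remaining line into the five BookRecord fields. This is only an illustration and not how the platform implements decoding. The `decode` function, the `BookRecord` dataclass, and the sample data are all hypothetical; the actual input file is not shown in this guide.

```python
import re
from dataclasses import dataclass
from typing import Iterator

# Same pattern that BOOK_HEADER passes to identified_by(strREContains(...)).
HEADER_PATTERN = re.compile(r"title,name,organization,copyrightYear")


@dataclass
class BookRecord:
    # Field names and types mirror the internal BookRecord format
    # (the enriched copyrightYear date field is left out of this sketch).
    title: str
    authorName: str
    organization: str
    copyrightYearString: str
    numberOfPages: int


def decode(text: str) -> Iterator[BookRecord]:
    """Yield one BookRecord per data line, dropping a leading header line."""
    # Records are terminated by a line feed (0xA); skip the empty tail.
    lines = [line for line in text.split("\n") if line]
    # The header is matched once at the start and discarded (discard_output).
    if lines and HEADER_PATTERN.search(lines[0]):
        lines = lines[1:]
    for line in lines:
        # Fields are comma-terminated, so values must not contain commas.
        title, author, organization, year, pages = line.split(",")
        yield BookRecord(title, author, organization, year, int(pages))


# Hypothetical sample input, invented for illustration only.
SAMPLE = (
    "title,name,organization,copyrightYear,pages\n"
    "Example Book,Jane Doe,Example Press,2019,312\n"
    "Another Book,John Roe,Example Press,2021,128\n"
)

records = list(decode(SAMPLE))
```

As in the Ultra Format, a field value that itself contains a comma would be split incorrectly, because every field except the last is terminated by a comma.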
...
When you run the Workflow, it reads the CSV file from the input directory and writes a corresponding Parquet file to the configured output directory.