This is a step-by-step guide to creating and running a stream that collects data from and forwards output to Amazon S3 buckets.
The company Acme EV Charging provides a frictionless electric vehicle charging service.
When their customers use the service to charge their cars, the volume, measured in kWh, is logged. The customers are then billed for the total volume on a monthly basis.
The logged charging sessions are stored as CSV files in the relevant Amazon S3 bucket.
Fields in the CSV format:
type: A string that contains the value Start, Partial, or Complete. This field is used to indicate that the logged session spans multiple files.
date: A string that contains the date when the usage was logged.
kWhCharged: A string that contains the logged amount (kWh) for a partial or a complete session.
userTechnicalId: A unique string that identifies the customer that is bound to the session.
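For illustration only, a session file with these fields might look like the following (the values and single-file layout are hypothetical; real files may carry additional fields such as chargingPlace, which is used later in this tutorial):

type,date,kWhCharged,userTechnicalId
Start,2023-01-05,,usr-0001
Partial,2023-01-05,3.2,usr-0001
Complete,2023-01-05,7.5,usr-0001
Complete,2023-01-06,12.0,usr-0042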
The stream that you will create in this tutorial performs the following tasks:
Collect and decode the CSV files that are available in the S3 bucket
Route the records marked Complete to the Data Aggregator function. The ones marked Start and Partial are simply written to a log file.
Aggregate records based on chargingPlace (charging location), kWhCharged (logged energy consumption), and date (month)
Forward the records to the billing system, emulated by storing the output in an Amazon S3 bucket
Step-by-Step Guide
Follow the instructions in the numbered steps below.
1. Preparations
Download and extract the sample data file (.zip).
Create an S3 bucket in Amazon Web Services (AWS).
Make a note of the bucket name, access key id, and secret access key.
For further information about creating and accessing S3 buckets, see the Amazon Web Services (AWS) documentation.
Create folders named in and out in the S3 bucket. Upload the extracted files (.csv) to the in folder.
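If you prefer to script this preparation, the following is a minimal boto3 sketch, assuming you have AWS credentials configured locally; the bucket name, region, and file name are placeholders:

# Minimal preparation sketch using boto3. The bucket name, region, and
# file names below are placeholders; adjust them to your own setup.
import boto3

s3 = boto3.client("s3", region_name="eu-west-1")

# Create the bucket (outside us-east-1, the region must be stated explicitly).
s3.create_bucket(
    Bucket="acme-ev-charging",
    CreateBucketConfiguration={"LocationConstraint": "eu-west-1"},
)

# S3 has no real folders; zero-byte keys ending in "/" act as folder markers.
s3.put_object(Bucket="acme-ev-charging", Key="in/")
s3.put_object(Bucket="acme-ev-charging", Key="out/")

# Upload the extracted sample file(s) to the in folder.
s3.upload_file("sessions.csv", "acme-ev-charging", "in/sessions.csv")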
2. Create a Stream
Create or open an existing solution.
When you open your solution, click Create Stream.
Give your stream a name and click Create.
You are then taken to the Stream Editor where you can begin to build your stream.
To start adding functions to your stream, go to step 3.
3. Add Amazon S3 Collector
From the functions library on the left, in the Collectors list, drag and drop the AWS S3 function onto the canvas.
The Amazon S3 collector appears on the canvas with a red circle icon, which means that it requires configuration before it can run properly.
Double-click the function to open and configure it.
Enter your Access key and Secret key.
In Bucket, enter the name of the S3 bucket that you want to collect data from. In Folder, enter the folder path for the file that you want to collect. If the path to the folder is not specified, the root folder of the S3 bucket is selected by default.
Select Specific file and enter the full name of the file (with extension) to be collected. If you want to match file names with a regular expression instead, select the Use Regex check box and specify the File name pattern. Select the desired file format. Since the CSV file contains headers, select Include table header. The default delimiter is a comma (,).
In After Collection, select Do nothing.
To see a preview of the data to be collected, click Preview. A preview of your data is displayed to the left.
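If you use the Use Regex option, you can sanity-check your File name pattern locally before running the stream. This assumes the collector interprets the pattern as a standard regular expression; the pattern and file names below are examples only:

# Quick local check of a hypothetical File name pattern against sample names.
import re

pattern = re.compile(r"sessions_.*\.csv")
for name in ["sessions_2023-01.csv", "sessions_2023-02.csv", "readme.txt"]:
    print(name, "->", "match" if pattern.fullmatch(name) else "no match")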
To add a Route function, go to step 4.
4. Add a Route Function
Configure this function to organize the input records into the format expected by the Data Aggregator function, which processes the payload data.
From the functions library on the left, in the Processors list, drag and drop the Route function onto the canvas. Once placed, link it to the Amazon S3 collector.
Double-click the function to open and configure it.
To set up the appropriate Route function conditions, complete the following preliminary configuration:
Click the edit icon to rename "Condition #1" to "Complete".
From the Compare with drop-down menu, select "type" to enable processing for the given data.
Set the Expression option to "Equal to" and Value to "Complete".
Set Handling of unmatched records to "Create new output".
Add a Log function to output the default data (which is not used further in this tutorial). Click the title of the Log function and rename it to "Start and Partial Data".
Click Preview to see the data flow.
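To make the routing logic concrete, the condition you just configured behaves like the following Python sketch (a plain illustration, not product code; the file name is a placeholder):

# Records whose type equals "Complete" continue to the Data Aggregator
# branch; Start and Partial records go to the "Start and Partial Data" log.
import csv

complete, logged = [], []
with open("sessions.csv", newline="") as f:
    for record in csv.DictReader(f):
        (complete if record["type"] == "Complete" else logged).append(record)

print(f"{len(complete)} complete sessions, {len(logged)} records to log")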
To add the Data Aggregator function, go to step 5.
5. Add the Data Aggregator Function
Configure the Data Aggregator function to collect records of the charged energy (kWh) for each date. Use the options to specify the charging place information, so that the output table also lists where the charging took place.
Add a Data Aggregator function and connect it to the Router output previously named "Complete".
Configure the output options, including the Flush settings, to produce the required output.
Click Preview to display the Data Aggregator output.
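Conceptually, the aggregation sums kWhCharged per charging place and month. The following Python sketch illustrates the idea, assuming the records carry a chargingPlace field and an ISO-formatted date:

# Sum the charged energy per (charging place, month) pair.
import csv
from collections import defaultdict

totals = defaultdict(float)
with open("complete_sessions.csv", newline="") as f:
    for record in csv.DictReader(f):
        month = record["date"][:7]  # "YYYY-MM" from an ISO date string
        totals[(record["chargingPlace"], month)] += float(record["kWhCharged"])

for (place, month), kwh in sorted(totals.items()):
    print(place, month, round(kwh, 2))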
To add an Amazon S3 forwarder, go to step 6.
6. Add an Amazon S3 Forwarder
The Amazon S3 forwarder function stores the output from the Data Aggregator function in an S3 bucket.
Add an Amazon S3 Forwarder function and link it to the Data Aggregator function.
Double-click the function to open and configure it.
Enter your Access key and Secret key.
Enter the name of the bucket and the folder where you want to store the file(s).
Select Collector filename if you want to keep the same filename as your input file(s), or select Custom filename to provide a new filename for the output file(s).
By default, the Append timestamp checkbox is selected. If you clear the checkbox, an existing file at the destination can be overwritten by the output file.
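Conceptually, Append timestamp avoids overwriting by making each output key unique. A boto3 sketch of the idea (the product's exact timestamp format is not documented here, so the format below is an assumption):

# Build a unique output key by appending a UTC timestamp to the file name.
from datetime import datetime, timezone

import boto3

stamp = datetime.now(timezone.utc).strftime("%Y%m%d%H%M%S")  # assumed format
key = f"out/billing_{stamp}.csv"

s3 = boto3.client("s3")
s3.upload_file("billing.csv", "acme-ev-charging", key)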
To run your stream, go to step 7.
7. Run your Stream
To run your stream, click Start.
Alternatively, you can run your stream from the Streams list. Select your stream and click Start.
The Status indicates when the stream has run to completion.
To see the logs, after your stream has run to completion, click View Log.
Download the result from the out folder in the AWS S3 bucket.
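If you want to fetch the result programmatically rather than through the AWS console, a boto3 sketch (using the bucket name placeholder from earlier in this tutorial):

# List the objects under out/ and download any CSV results.
import boto3

s3 = boto3.client("s3")
response = s3.list_objects_v2(Bucket="acme-ev-charging", Prefix="out/")
for obj in response.get("Contents", []):
    key = obj["Key"]
    if key.endswith(".csv"):
        s3.download_file("acme-ev-charging", key, key.split("/")[-1])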
After importing the result into Excel, you can inspect the aggregated output.
You have now successfully collected data from an S3 bucket, aggregated it, and stored the result back in an S3 bucket!