The input data in this example use case consists of sales numbers in CSV format. This dataset is, from here on, referred to as "sales". The data is collected in real time from the regions "APAC", "AMERICAS", and "EMEA". We want to calculate the total, average, and number of sales per minute. These numbers will be our KPIs, broken down per country and region.
Example - Input data
| timestamp | region | country | amount |
| --- | --- | --- | --- |
| 2017-03-08T13:53:52.123 | EMEA | Sweden | 123.50 |
| 2017-03-08T13:53:56.123 | APAC | India | 12.12 |
| 2017-03-08T13:53:59.123 | AMERICAS | US | 425.23 |
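Expressed as CSV, the same three records could look like this (the field order follows the table above; whether the feed includes a header row depends on your setup):

```
timestamp,region,country,amount
2017-03-08T13:53:52.123,EMEA,Sweden,123.50
2017-03-08T13:53:56.123,APAC,India,12.12
2017-03-08T13:53:59.123,AMERICAS,US,425.23
```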
Note!
As a prerequisite, the scripts must be prepared according to 4.2 Preparing and Creating Scripts for KPI Management.
Step-by-Step Instructions
- Configure the service model.
The service model describes your data, which KPIs to generate, and how to calculate them. A JSON representation is used to describe the model, which includes the following top-level objects:
- `dimension`
- `tree`
- `metric`
- `kpi`
- `threshold` (optional)
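As a rough orientation, the complete model is a single JSON document with one entry per top-level object. The skeleton below only illustrates the overall shape; the exact schema is described in the KPI Management documentation:

```json
{
  "dimension": { },
  "tree": { },
  "metric": { },
  "kpi": { },
  "threshold": { }
}
```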
Start with the `dimension` and `tree` objects. The dimensions describe the fields of your data that are used for grouping, and the `tree` describes the relation between them. The identifying fields in the input data are `region` and `country`. A region has one or more countries. The data type is `sales`. In the `dimension` object we specify each of our identifying fields as separate objects, with the data type and field in the body.
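A minimal sketch of the `dimension` and `tree` objects for this use case could look as follows. The property names and nesting are illustrative only; what is grounded in this example is the two identifying fields, the `sales` data type, and the rule that a region contains one or more countries:

```json
{
  "dimension": {
    "region":  { "dataType": "sales", "field": "region" },
    "country": { "dataType": "sales", "field": "country" }
  },
  "tree": {
    "region": [ "country" ]
  }
}
```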
Define the metrics using the `amount` field in the input data:
- `totalSales` - For total sales, sum up the amount for each record by using the `sum` function on the expression `expr`, which contains the `amount` field.
- `avgSales` - For average sales, use the `avg` function instead of `sum`.
- `numSales` - To count the number of records, use the conditional function `isSet` in the expression. This function evaluates to 1 if there is a value in `amount` or 0 if there is no value. Use the `sum` function to sum up the 1s and 0s.
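Using the same illustrative notation, the three metrics could be sketched like this. The metric names, the `sum`, `avg`, and `isSet` functions, and the `amount` field come from the description above, while the surrounding property names are placeholders:

```json
{
  "metric": {
    "totalSales": { "fun": "sum", "expr": "amount" },
    "avgSales":   { "fun": "avg", "expr": "amount" },
    "numSales":   { "fun": "sum", "expr": "isSet(amount)" }
  }
}
```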
Define the KPIs. The expected output is the total sales, average sales, and number of sales per region and country in 60 second periods.
Use the property `node` to describe where in the topology the KPI should be calculated and `windowSize` to set the period length. Use the names of the metrics defined above in the `expr` property.
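Continuing the sketch, the KPI definitions tie the metrics to the tree through the `node`, `windowSize`, and `expr` properties. The KPI names and the node notation below are placeholders; the 60 second window and the metric references follow from the requirements above:

```json
{
  "kpi": {
    "totalSalesPerMinute": { "node": "region.country", "windowSize": 60, "expr": "totalSales" },
    "avgSalesPerMinute":   { "node": "region.country", "windowSize": 60, "expr": "avgSales" },
    "numSalesPerMinute":   { "node": "region.country", "windowSize": 60, "expr": "numSales" }
  }
}
```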
Combine all the objects above for a complete representation of the model.
Open the Desktop and paste the service model into a KPI profile. Save the profile with the name `SalesModel` in the folder `kpisales`.
- Configure the Kafka and Zookeeper services.
KPI Management reads and writes its data to and from Kafka. In order for this to work you need to configure services for both Kafka and Zookeeper. The scripts for creating these are described in 4.3.2 Spark, kafka and zookeeper.
The Kafka service depends on the Zookeeper service configured above, so you need to ensure that Zookeeper is started first.
The Spark service is required to manage the Spark cluster, which is used for KPI calculations.
The KPI model service is needed to handle the service model that you created in step 1. This service will expose an API to store and update the model. Spark will also use this service to retrieve the model.
- Configure the KPI model service. The service model will be exposed at http://localhost:8095/api/v1/model?config=kpisales.SalesModel.
- Configure the Spark service.
The Spark application configuration is named `spark-kpi-app1`. A subset of the properties in this particular configuration have default values and are not set below. These include the references to service instances and the names of the Kafka topics that will be used for input and output. For further information about properties related to the Spark service, see 4.3.2 Spark, kafka and zookeeper.
The Spark slave node will have one worker that will be assigned four cores. The cores are split between the executors and the Spark driver, which means that we will have three executors running in parallel. The property `spark.default.parallelism` is set to match this value.
The property `kpi-model-config-name` needs to match the folder and configuration names of the KPI profile that was created in step 1, i.e. `kpisales.SalesModel`.
- Start the services.
- Create the Kafka topics that are required by the Spark service. Each of the Spark executors needs to read from a separate Kafka partition, so the topics need three partitions, i.e. the number of partitions for each topic must be identical to the value of the property `spark.default.parallelism` in the Spark application configuration.
- Create the real-time workflow. In this guide we will use Pulse agents to simulate sales data coming from three different sources, EMEA, AMERICAS, and APAC.
- Add three Pulse agents and an Analysis agent.
Workflow - Pulse Agents
Configure the Pulse agents as follows:
- AMERICAS will send 1000 TPS - Set Time Unit to MILLISECONDS and Interval to 1
- EMEA will send 500 TPS - Set Time Unit to MILLISECONDS and Interval to 2
- APAC will send 250 TPS - Set Time Unit to MILLISECONDS and Interval to 4
To be able to identify the data, set the data to the region name.
Pulse agent configuration
The Pulse agents only send a simple event containing the name of the region; the other data used in the KPI calculations is generated in the connected Analysis agent.
The APL code below creates the input to KPI Management.
- Create a Kafka profile for the Kafka Producer agent. This agent will write to the `kpi-input` topic.
Kafka profile configuration - kpi-input
- Add a KPI Cluster In agent.
Workflow - KPI Cluster In agent
KPI Cluster In agent configuration
Configure it to use the KPI profile that you created in step 1, and add the Kafka profile that the agent will use to write to the `kpi-input` topic. This topic will be read by the KPI Management Spark application. The Analysis agent is required by the Kafka Producer agent but is not used for any specific purpose in this example.
Workflow - Kafka agents
- Create a Kafka profile for the Kafka Collector agent. This agent will read from the `kpi-output` topic.
Kafka profile configuration - kpi-output
- Configure the Kafka Collector agent to use the profile.
Kafka Collector agent configuration
- Add a KPI Cluster Out agent. This agent will create the KPI output.
Workflow - KPI Cluster Out agent
Configure the agent to use the KPI profile that you created in step 1.
KPI Cluster Out agent configuration
- Add another Analysis agent for debugging of the KPIs.
Final workflow configuration
Add the APL code below to the Analysis agent.
- Submit the Spark application to the cluster.
Open the Spark UI at http://localhost:8080/. You should see that `spark-kpi-app1` is running.
Spark UI
Click on the application and then Streaming at the top of the UI. You will see that the Input Rate is 0 records per second.
Spark UI - Streaming (no data)
- Open the workflow configuration in the Workflow Monitor. Enable debugging and select events for the KPI Cluster Out agent and the Analysis agent that produces the debug output.
- Start the workflow.
Switch back to the Spark UI and refresh the page. The streaming statistics should indicate incoming data.
Spark UI - Streaming
The calculated KPIs will be displayed in the debug output in the Workflow Monitor.
Note!
It will take a minute before the output is displayed due to the configuration of the `windowSize` property in the service model.