Spark, Kafka, and Zookeeper

Kafka and Zookeeper are required for sending data to and from the Spark cluster.

Spark applications must be configured with a set of Kafka topics that are either shared between multiple applications or dedicated to specific applications. The assigned topics must be created before you submit an application to Spark. Before you can create the topics, you must start the Kafka and Zookeeper services.

See Preparing and Creating Scripts for KPI Management for information on how to start Spark, Kafka, and Zookeeper.

Three topics are used: one for transferring data to the Spark application, one for receiving calculated KPIs from Spark, and one for alarms. The default topic names are kpi-input, kpi-output, and kpi-alarm; you can change these names in the KPI Management Profile. Ensure that the number of partitions for each topic matches the number of Kafka brokers.
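The topic layout described above can be planned with a small helper before any topics are created. This is a minimal sketch, assuming you only need the topic names and partition counts; the function plan_kpi_topics, the TopicSpec type, and the role-based overrides mapping (standing in for names changed in the KPI Management Profile) are hypothetical and not part of the product.

```python
from dataclasses import dataclass

# Default topic names from the KPI Management documentation.
DEFAULT_TOPICS = {
    "input": "kpi-input",    # data sent to the Spark application
    "output": "kpi-output",  # calculated KPIs received from Spark
    "alarm": "kpi-alarm",    # alarms
}


@dataclass(frozen=True)
class TopicSpec:
    name: str
    partitions: int


def plan_kpi_topics(broker_count, overrides=None):
    """Return one TopicSpec per KPI topic, with partitions equal to the broker count.

    `overrides` is a hypothetical role -> topic name mapping that mirrors
    names changed in the KPI Management Profile.
    """
    if broker_count < 1:
        raise ValueError("at least one Kafka broker is required")
    names = dict(DEFAULT_TOPICS)
    names.update(overrides or {})
    # Partition count follows the broker count, as required above.
    return [TopicSpec(names[role], broker_count) for role in ("input", "output", "alarm")]
```

For example, plan_kpi_topics(3, {"alarm": "site-a-alarm"}) yields three specs with three partitions each, where only the alarm topic uses a custom name.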

Retention Settings

The default data retention period in Kafka is one day. You can change the length of this period to conserve disk space.

Set the following properties in the server.properties file, located in the config folder of your Kafka installation:

log.retention.bytes - Must be greater than the value of the property log.segment.bytes.

log.segment.bytes - Must exceed the size of the input/output segments sent to and from Kafka.

log.retention.hours - Must be at least three times the largest window size in the service model.
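The three rules above can be checked mechanically before restarting Kafka. The following is a minimal sketch, assuming a server.properties file with simple key=value lines and that you know the largest window size (in hours) and the expected size of the input/output segments (in bytes). The functions parse_properties and check_retention and the parameter io_segment_bytes are hypothetical names used for illustration.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # or ! comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def check_retention(props, largest_window_hours, io_segment_bytes):
    """Return a list of rule violations; an empty list means all three rules hold.

    io_segment_bytes is an assumed estimate of the input/output segment size.
    """
    problems = []
    retention_bytes = int(props["log.retention.bytes"])
    segment_bytes = int(props["log.segment.bytes"])
    retention_hours = float(props["log.retention.hours"])
    # Rule 1: retention size must be larger than a single segment.
    if retention_bytes <= segment_bytes:
        problems.append("log.retention.bytes must be greater than log.segment.bytes")
    # Rule 2: a segment must be able to hold the input/output data.
    if segment_bytes <= io_segment_bytes:
        problems.append("log.segment.bytes must exceed the input/output segment size")
    # Rule 3: keep data for at least three times the largest window.
    if retention_hours < 3 * largest_window_hours:
        problems.append("log.retention.hours must be at least 3x the largest window size")
    return problems
```

With log.retention.hours=24, a service model whose largest window is 12 hours would be flagged, since 24 is less than 3 x 12.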

Hint!

The settings above change the retention period for all topics in the Kafka cluster. You can also override the retention setting for individual topics when you create them. For further information, see Starting Clusters and Creating Topics.
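A per-topic override is set with a topic-level configuration at creation time. At the topic level, Kafka uses retention.ms rather than the broker-wide log.retention.hours, so an hours value must be converted to milliseconds. The sketch below only builds an argument list for the kafka-topics.sh script in Kafka's bin folder; the helper create_topic_command is hypothetical, and the connection flag is an assumption that depends on your Kafka version (--bootstrap-server on newer releases, --zookeeper on older ones).

```python
def create_topic_command(topic, partitions, replication_factor, retention_hours,
                         connect_flag="--bootstrap-server", connect_to="localhost:9092"):
    """Build a kafka-topics.sh argument list that overrides retention for one topic.

    connect_flag/connect_to are assumptions: use "--zookeeper" and a
    Zookeeper address (for example "localhost:2181") on older Kafka versions.
    """
    # Topic-level retention is expressed in milliseconds.
    retention_ms = int(retention_hours * 60 * 60 * 1000)
    return [
        "kafka-topics.sh", "--create",
        connect_flag, connect_to,
        "--topic", topic,
        "--partitions", str(partitions),
        "--replication-factor", str(replication_factor),
        "--config", f"retention.ms={retention_ms}",
    ]
```

For example, create_topic_command("kpi-input", 3, 1, 72) includes --config retention.ms=259200000, which keeps data in kpi-input for 72 hours regardless of the cluster-wide setting. The list can be passed to subprocess.run(..., check=True) once the script path and connection details match your installation.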

For further information about Kafka, see /wiki/spaces/MD82/pages/3785234 in the /wiki/spaces/MD82/pages/3768690.