Connecting Workflows
Kafka
The Apache Kafka framework uses a high-throughput, distributed, publish-subscribe messaging model. The framework is scalable and can be expanded elastically and transparently, without downtime. Data streams are partitioned and spread over a cluster of machines, which allows streams larger than a single machine can handle and supports clusters of coordinated consumers.
The real-time Kafka agents enable the system to act as a Kafka cluster. Kafka producer agents publish messages (containing UDRs), and these messages are subscribed to by any number of Kafka collector agents in the workflows that are to be connected. The agents can also be used to connect to external Kafka interfaces.
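In the product itself the Kafka agents are configured rather than coded, but the publish side of the model can be illustrated with the standard Apache Kafka Java client. The broker address, topic name, and UDR payload below are assumptions made for the sketch, not values taken from this documentation.

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class UdrPublisher {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker address
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        // acks=all waits for all in-sync replicas, which matters when data loss must be minimized
        props.put(ProducerConfig.ACKS_CONFIG, "all");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // "udr-topic" and the serialized payload are placeholders for the connected workflows' data
            producer.send(new ProducerRecord<>("udr-topic", "session-42", "{\"type\":\"sampleUDR\",\"bytes\":1024}"));
            producer.flush();
        }
    }
}
```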
Connecting real-time workflows with Kafka
All messages are persisted and the publishers are decoupled from the subscribers. This means that the forwarding agent keeps writing to the queue (i.e. the Kafka log) even if the receiving processing agents terminate. The amount of data that can be queued, also called the queue depth, is limited only by the available storage; given enough storage, the queue depth is effectively unlimited. It is highly recommended to use the Kafka agents to connect workflows when:
- There is a need to connect real-time workflows that are running on the same installation or on different installations.
- Minimization of data loss is prioritized.
- One-way communication is sufficient (one-to-one or one-to-many).
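The subscribe side can be sketched in the same way with the standard Kafka Java client. Because messages are persisted in the Kafka log, a consumer like this can stop and resume without the publishing side noticing; the broker address, group id, and topic are again assumptions.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class UdrSubscriber {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");  // assumed broker address
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "downstream-workflow");      // each connected workflow uses its own group
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("udr-topic"));
            while (true) {
                // Messages remain in the Kafka log, so the consumer can stop and resume without losing data
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("received UDR %s -> %s%n", record.key(), record.value());
                }
            }
        }
    }
}
```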
Workflow Bridge
A Workflow Bridge agent acts as a bridge for communication between real-time workflows, or between batch and real-time workflows, within the same system. There are several benefits of using Workflow Bridge:
- Using real-time processing capabilities while keeping transaction safety, when bridging between batch and real-time workflows.
- Scaling the processing of high-volume streaming data by distributing the processing load between Execution Contexts, when bridging between an upstream data collection workflow and one or more downstream real-time workflows. When sending data to multiple workflows, the UDRs can be sent in a load-balancing scenario, where specific data can be distributed to a specific workflow. Data can also be broadcast to all downstream workflows.
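As a rough illustration of these distribution options, the sketch below shows round-robin load balancing, key-based routing of specific data to a specific workflow, and broadcast. It is a conceptual sketch only; the class and method names are hypothetical and are not part of the Workflow Bridge API.

```java
import java.util.List;
import java.util.function.Consumer;

// Conceptual sketch: downstream workflows are modeled as plain consumers of UDR objects.
class BridgeDispatcher {
    private final List<Consumer<Object>> downstreamWorkflows;
    private int next = 0;

    BridgeDispatcher(List<Consumer<Object>> downstreamWorkflows) {
        this.downstreamWorkflows = downstreamWorkflows;
    }

    // Load balancing: each UDR goes to exactly one downstream workflow, round robin.
    void loadBalance(Object udr) {
        downstreamWorkflows.get(next).accept(udr);
        next = (next + 1) % downstreamWorkflows.size();
    }

    // Routing: UDRs with the same key (e.g. a subscriber id) always reach the same workflow.
    void routeByKey(Object udr, String key) {
        downstreamWorkflows.get(Math.floorMod(key.hashCode(), downstreamWorkflows.size())).accept(udr);
    }

    // Broadcast: every downstream workflow receives a copy of the UDR.
    void broadcast(Object udr) {
        downstreamWorkflows.forEach(workflow -> workflow.accept(udr));
    }
}
```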
Batch to real-time Workflow Bridge
Communication
The Workflow Bridge agents communicate with each other using a dedicated set of UDRs. Communication is done in-memory when the workflows execute within the same Execution Context; TCP or Aeron is used when the workflows run on different Execution Contexts. This provides efficient transfer of data.
To maintain transactional integrity across multiple workflows, workflow state changes are communicated over the workflow bridge. This enables downstream workflows to take action when an upstream workflow's state changes, and upstream workflows can maintain transaction integrity if downstream workflows fail to execute. For every transaction, a session context with arbitrary data can be kept in the real-time collection agent and used as a session object during the transaction.
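Conceptually, the per-transaction session context can be pictured as a map keyed by transaction, holding arbitrary data for the duration of that transaction. The sketch below uses hypothetical names; the actual session context is managed inside the Workflow Bridge real-time collection agent.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Conceptual sketch only: none of these names exist in the actual Workflow Bridge API.
class TransactionSessions {
    private final Map<Long, Map<String, Object>> sessions = new ConcurrentHashMap<>();

    // Called when an upstream transaction begins.
    void begin(long transactionId) {
        sessions.put(transactionId, new ConcurrentHashMap<>());
    }

    // Arbitrary data kept for the duration of the transaction, e.g. counters or batch metadata.
    void put(long transactionId, String key, Object value) {
        sessions.get(transactionId).put(key, value);
    }

    // On commit or cancel of the upstream transaction, the session object is dropped again.
    void end(long transactionId) {
        sessions.remove(transactionId);
    }
}
```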
To optimize performance, it is possible to collect and send data in bulk from the forwarding agent. The bulk is created by the Workflow Bridge forwarding agent once a configured number of UDRs has been reached or a configured time interval has passed; both thresholds are specified in the Workflow Bridge profile. When the Workflow Bridge real-time collection agent receives the bulk, it unpacks it and forwards the contents as separate UDRs.
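The bulking strategy, flushing on a UDR count or on a time interval, whichever comes first, can be sketched as follows. The class and parameter names are hypothetical; in practice the thresholds are set in the Workflow Bridge profile.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Conceptual sketch of count-or-interval bulking on the forwarding side.
class UdrBulker {
    private final int maxBulkSize;                 // "number of UDRs" threshold
    private final long maxIntervalMillis;          // time interval threshold
    private final Consumer<List<Object>> sendBulk; // transport to the collection agent
    private final List<Object> buffer = new ArrayList<>();
    private long lastFlush = System.currentTimeMillis();

    UdrBulker(int maxBulkSize, long maxIntervalMillis, Consumer<List<Object>> sendBulk) {
        this.maxBulkSize = maxBulkSize;
        this.maxIntervalMillis = maxIntervalMillis;
        this.sendBulk = sendBulk;
    }

    // Called for each outgoing UDR; flushes when either threshold is reached.
    synchronized void add(Object udr) {
        buffer.add(udr);
        if (buffer.size() >= maxBulkSize
                || System.currentTimeMillis() - lastFlush >= maxIntervalMillis) {
            flush();
        }
    }

    // Sends the accumulated bulk; the collecting side unpacks it into separate UDRs.
    private synchronized void flush() {
        if (!buffer.isEmpty()) {
            sendBulk.accept(new ArrayList<>(buffer));
            buffer.clear();
        }
        lastFlush = System.currentTimeMillis();
    }
}
```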