Streaming vs Messaging

From Imperative programming to Reactive programming

Let's look at the limitations of message queues and then see how streaming solves them.

A message queue allows a bunch of subscribers to pull a message, or a batch of messages, from the end of the queue. Queues usually allow for some level of transactionality when pulling a message off, to ensure that the desired action was executed before the message is removed.

Not all queueing systems have the same functionality, but once a message has been processed, it gets removed from the queue. If you think about it, this is very similar to imperative programming: something happened, and the originating system decided that a certain action should occur in a downstream system.
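This delivery model can be sketched with Python's standard-library `Queue` (a minimal illustration of the semantics, not a real broker API; the message name is invented):

```python
from queue import Queue

# Queue semantics in miniature: each message is delivered to exactly one
# consumer and is removed once it has been processed.
work_queue = Queue()
work_queue.put("resize-image-42")   # the message is effectively a command

def consume(q):
    """Pull one message, act on it, then acknowledge it."""
    message = q.get()               # delivered to this consumer only
    result = f"processed {message}"
    q.task_done()                   # ack: the message is now gone for good
    return result

print(consume(work_queue))   # → processed resize-image-42
print(work_queue.empty())    # → True: no other consumer can ever see it
```

Once `consume` has run, the message no longer exists anywhere in the system; a second consumer with different logic has nothing left to react to.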

Even though you can scale out with multiple consumers on the queue, they will all contain the same functionality, and this is done just to handle load and process messages in parallel. In other words, it doesn't allow you to kick off multiple independent actions based on the same event. All the processors of the queue messages will execute the same type of logic in the same domain. This means that the messages in the queue are actually commands, which suit imperative programming, rather than events, which suit reactive programming.

With streaming, on the other hand, we publish messages/events to topics, where they are persisted. They don't get removed from the topic when consumers receive them. This allows us to replay messages, but more importantly, it allows a multitude of consumers to process logic based on the same messages/events.

We can still scale out to get parallel processing in the same domain, but more importantly, we can add different types of consumers that execute different logic based on the same event. In other words, with Streaming, we can adopt a reactive pub/sub architecture.
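The difference can be sketched with a plain append-only list standing in for a topic (an illustration of log semantics, not the Kafka API; the event fields and handlers are invented):

```python
# Log-based pub/sub in miniature: events are appended to a topic and never
# removed on read, so independent consumers can each process every event
# with their own logic.
topic = []  # the append-only log

def publish(event):
    topic.append(event)

def consume_all(handler):
    """Each independent consumer reads the whole log with its own logic."""
    return [handler(e) for e in topic]

publish({"type": "order_placed", "order_id": 1, "amount": 99.0})
publish({"type": "order_placed", "order_id": 2, "amount": 15.0})

# Two unrelated reactions to the same events:
emails  = consume_all(lambda e: f"email receipt for order {e['order_id']}")
revenue = sum(e["amount"] for e in topic)   # an analytics consumer
```

The email sender and the revenue calculator never coordinate; each reacts to the same events independently, which is exactly the reactive pub/sub shape.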

Execution of different logic by different systems based on the same events is possible with Kafka because messages are retained and because of the concept of consumer groups. A consumer group identifies itself to Kafka when it asks for messages on a topic. Kafka records which messages (by offset) were delivered to which consumer group, so that it doesn't serve them up again. In reality it is a bit more complex than that, because a number of configuration options are available to control this behaviour, but we don't need to explore them fully to understand Kafka at a high level.
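The offset bookkeeping can be sketched like this (the names `poll` and `seek` echo Kafka's consumer API, but the single shared log and the per-group dictionary are invented for illustration; real Kafka tracks offsets per partition):

```python
# Per-consumer-group offsets in miniature: the broker remembers how far
# each group has read, so groups progress independently and can rewind.
log = ["e1", "e2", "e3", "e4"]
offsets = {}  # consumer group -> next offset to read

def poll(group, max_records=2):
    """Return the next batch for this group and advance its offset."""
    start = offsets.get(group, 0)
    batch = log[start:start + max_records]
    offsets[group] = start + len(batch)   # "commit" the new offset
    return batch

def seek(group, offset):
    """Rewind (or skip ahead) so the group replays from that offset."""
    offsets[group] = offset

print(poll("billing"))   # → ['e1', 'e2']
print(poll("billing"))   # → ['e3', 'e4']
print(poll("audit"))     # → ['e1', 'e2']  (an independent offset)
seek("billing", 0)       # rewind to replay from the beginning
print(poll("billing"))   # → ['e1', 'e2'] again
```

Note that `billing` replaying the log has no effect on `audit`: each group's position is tracked separately, which is what lets many systems react to the same retained events.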

Differences

The primary difference between queues and streams is the means of message delivery. This seemingly subtle difference completely changes the landscape of use cases for each of these services.

| Feature | Streaming | Messaging |
| --- | --- | --- |
| Purpose | Focuses on real-time data processing and analytics. | Designed to store and transmit messages between distributed components; used to decouple producing and consuming applications. |
| Features | 1. Low-latency data transfer. 2. Data analysis across multiple time windows. | 1. Guaranteed message delivery, ensuring no data is lost. 2. Message persistence. 3. Consumer acknowledgements. |
| Ideal scenarios | Where real-time analysis (analytics) and processing of data are crucial. | Where the primary concern is reliability of message delivery. |
| Example use cases | 1. Time-sensitive actions, such as real-time fraud detection or live dashboarding. 2. Processing large amounts of fast-moving data. | Load leveling and balancing between producers and consumers. |
| Delivery to consumer applications | There can be many consumers: the broker delivers the same message to every subscriber of the log, allowing multiple consumers to read the same data simultaneously and facilitating a publish-subscribe model. | While many consumers may be active, a queue delivers each message to a single consumer (typically whichever consumer is available first) before removing that message from the queue. |
| Availability of a message | A streaming broker uses a distributed log, so consumers can move backward and forward within it to re-process messages they've already received, on command. | Once a message is delivered, it's gone. To reprocess a message, you need a backup, such as a batch layer, so that you can put it back into the queue. |
| Data persistence | Data is often persisted for a specified period, allowing consumers to replay it if required. | Messages remain in the queue until they are consumed or until they expire based on some policy. |
| Volume & throughput | Built for high throughput and massive volumes of data flowing into the system continuously. | Well suited to fluctuating data rates, accommodating spikes in inflow without overloading consumers. |
| Scalability | Designed for vertical and horizontal scalability: brokers scale out over many servers, and more nodes can be added to handle larger data loads and more consumers. | Typically scaled horizontally by increasing the number of consumer instances or by partitioning messages. |
| Complexity & features | Often comes with a wider range of features, such as data windowing, event-time processing, and complex event processing, making it more intricate. | Generally straightforward, with the primary focus on ensuring message delivery without data loss. |
| Replication | Supports replication. | Traditional queues generally lack built-in replication. |
| Data ordering | Maintains data order within each partition. | Strict ordering within the queue, but only for a single consumer (see below). |
| Supported programming style | Reactive programming style. | Imperative programming style. |

Ordering on the messaging side deserves a closer look. Messages cannot overtake each other inside a queue, so a receiver can only get them out one by one, in the order they were put in. This guarantees strict ordering, but places a strong limitation on the receiver: there can only be one agent processing the messages. With more than one, there is no guarantee that the messages are processed in order, because each agent processes its message independently and can finish and start on the next message at any time. If there are two agents, A and B, and Agent A receives the first message while Agent B receives the second, Agent B could finish processing the second message and start on the third even before Agent A has finished the first. So although the messages are received from the queue strictly in the order they were put in, with multiple receiving agents there is no way to say they will be processed in that order.
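The ordering caveat with multiple receiving agents can be shown with a small deterministic simulation (the messages and their processing costs are invented; a heap stands in for "whichever worker is free next"):

```python
import heapq

# Messages leave the queue strictly in order, but with two workers the
# *completion* order depends on how long each message takes to process.
queue = [("m1", 5), ("m2", 1), ("m3", 1)]   # (message, processing cost)
workers_free_at = [0, 0]                    # two agents, each free at t=0

completions = []
for message, cost in queue:                 # strictly in-order dequeue
    start = heapq.heappop(workers_free_at)  # hand to the next free worker
    finish = start + cost
    heapq.heappush(workers_free_at, finish)
    completions.append((finish, message))

completions.sort()                          # order by completion time
print([m for _, m in completions])          # → ['m2', 'm3', 'm1']
```

Even though `m1` was dequeued first, it finishes last, so any downstream effect that depends on order is broken the moment a second worker is added.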

Tags

  1. Kafka
  2. Reactive Programming
  3. Streaming
  4. Spring Cloud Stream
  5. Stream Processing Systems
  6. Stream Processing with Apache Kafka
  7. Stream Processing with RabbitMQ
