Kafka - Idempotent Producers and Consumers
Idempotent Producer
Producer idempotence ensures that duplicates are not introduced due to unexpected retries.
https://www.conduktor.io/kafka/idempotent-kafka-producer/
What is an Idempotent Kafka producer? How do they help avoid duplication?
Problem with Retries: Retrying to send a failed message often includes a small risk that both messages were successfully written to the broker, leading to duplicates. This can happen as illustrated below.
- Kafka producer sends a message to Kafka
- The message was successfully written and replicated
- Network issues prevented the broker acknowledgment from reaching the producer
- The producer will treat the lack of acknowledgment as a temporary network issue and will retry sending the message (since it can’t know that it was received).
- In that case, the broker will end up having the same message twice.
How does it work internally?
When enable.idempotence
is set to true
, each producer gets assigned a Producer Id (PID) and the PID is included every time a producer sends messages to a broker. Additionally, each message gets a monotonically increasing sequence number (different from the offset - used only for protocol purposes). A separate sequence is maintained for each topic partition that a producer sends messages to. On the broker side, on a per partition basis, it keeps track of the largest PID-Sequence Number
combination that is successfully written. When a lower sequence number is received, it is discarded.
How should Kafka producer idempotence be enabled?
If you already use acks=all
then you should enable this feature. All you need to do to turn this feature on is use the producer configuration enable.idempotence=true
.
Overall, it is recommended to enable producer idempotence for all your Kafka Producers. If you are not using the Java SDK by default, ensure your client library has support for this feature!
If the application has a bug, and creates duplicate messages, will idempotence help?
No. Because, these messages have different PID-Sequence Number
combinations.
Idempotent Consumer
https://developer.confluent.io/patterns/event-processing/idempotent-reader/
Is there a configuration setting to enable this? No.
In that case, how will we solve the problem?
- Let the producer set some unique message/event id (UUID)
- Broker receives the messages
- Check if they are present in a table (a custom table that is created for this).
- If yes, it means that it is a duplicate. Simply acknowledge. Skip processing.
- If no, it means that it is a new message. Process the message, insert into db, and then ack.
What if the producer is not sending a unique message or event id?
Create them on your own in the Consumer using the metadata.
e.g. Topic + Partition + Offset, etc.