Kafka - replication

Replicate each partition of a topic across different brokers (nodes) for resiliency

  1. Spin up the cluster: https://github.com/explorer436/programming-playground/tree/main/docker%20compose%20files/kafka/03-kafka-cluster-setup
  2. On any one of the nodes in the cluster, create a topic with multiple partitions and a replication factor greater than 1 (here: 3 partitions, replication factor 3), then describe it.
    [explorer436@explorer436-p50-20eqs27p03 03-kafka-cluster-setup]$ docker ps
    CONTAINER ID   IMAGE              COMMAND          CREATED          STATUS          PORTS                                       NAMES
    f92d38bdbeac   vinsdocker/kafka   "sh runner.sh"   11 seconds ago   Up 10 seconds   0.0.0.0:8083->8083/tcp, :::8083->8083/tcp   kafka3
    eb2bbf25c128   vinsdocker/kafka   "sh runner.sh"   11 seconds ago   Up 10 seconds   0.0.0.0:8081->8081/tcp, :::8081->8081/tcp   kafka1
    281920b9d95a   vinsdocker/kafka   "sh runner.sh"   11 seconds ago   Up 10 seconds   0.0.0.0:8082->8082/tcp, :::8082->8082/tcp   kafka2
    [explorer436@explorer436-p50-20eqs27p03 03-kafka-cluster-setup]$ docker exec -it kafka1 bash
    root@eb2bbf25c128:/learning# kafka-topics.sh --bootstrap-server localhost:9092 --create --replication-factor 3 --partitions 3 --topic first_kafka_topic
    WARNING: Due to limitations in metric names, topics with a period ('.') or underscore ('_') could collide. To avoid issues it is best to use either, but not both.
    Created topic first_kafka_topic.
    root@eb2bbf25c128:/learning# kafka-topics.sh --bootstrap-server localhost:9092 --describe --topic first_kafka_topic
    Topic: first_kafka_topic	TopicId: 5dHIG2pSQACllePlxIbObg	PartitionCount: 3	ReplicationFactor: 3	Configs:
         Topic: first_kafka_topic	Partition: 0	Leader: 3	Replicas: 3,1,2	Isr: 3,1,2
         Topic: first_kafka_topic	Partition: 1	Leader: 1	Replicas: 1,2,3	Isr: 1,2,3
         Topic: first_kafka_topic	Partition: 2	Leader: 2	Replicas: 2,3,1	Isr: 2,3,1
    

What is Isr?

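Isr stands for in-sync replicas.

It is the set of replicas of a partition (the leader included) that are currently caught up with the leader. A follower that stops fetching, or lags behind for longer than the broker setting replica.lag.time.max.ms, is dropped from the Isr list and is added back once it catches up.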
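
To make the Replicas vs. Isr comparison concrete, here is a minimal sketch that summarises describe output per partition. The function name isr_summary is made up for this note, and the sample input is the describe output above with tabs replaced by spaces. It assumes the Isr list is never empty.

```shell
#!/bin/sh
# Summarise `kafka-topics.sh --describe` output read from stdin:
# one line per partition with replica count, ISR count and a status.
# isr_summary is a hypothetical helper name, not part of Kafka.
isr_summary() {
  awk '
    $3 == "Partition:" {
      part = ""; nrep = 0; nisr = 0
      # Look fields up by key so extra trailing fields do not matter.
      for (i = 1; i < NF; i++) {
        if ($i == "Partition:") part = $(i + 1)
        if ($i == "Replicas:")  nrep = split($(i + 1), r, ",")
        if ($i == "Isr:")       nisr = split($(i + 1), s, ",")
      }
      status = (nisr == nrep) ? "fully-replicated" : "under-replicated"
      printf "partition=%s replicas=%d isr=%d %s\n", part, nrep, nisr, status
    }'
}

# Sample: the describe output from this note.
isr_summary <<'EOF'
Topic: first_kafka_topic TopicId: 5dHIG2pSQACllePlxIbObg PartitionCount: 3 ReplicationFactor: 3 Configs:
    Topic: first_kafka_topic Partition: 0 Leader: 3 Replicas: 3,1,2 Isr: 3,1,2
    Topic: first_kafka_topic Partition: 1 Leader: 1 Replicas: 1,2,3 Isr: 1,2,3
    Topic: first_kafka_topic Partition: 2 Leader: 2 Replicas: 2,3,1 Isr: 2,3,1
EOF
```

Against a live cluster the same function could be fed directly, e.g. `kafka-topics.sh --bootstrap-server localhost:9092 --describe --topic first_kafka_topic | isr_summary`. kafka-topics.sh also has a built-in `--under-replicated-partitions` filter for `--describe`, which is probably the simpler choice when only the problem partitions matter.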

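Now stop one of the brokers, e.g. broker 2 (container kafka2). It is the leader for Partition 2 and a follower for Partitions 0 and 1. Run this from the host shell, not from inside the kafka1 container:

$ docker container stop kafka2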

And, describe the topic again:

root@eb2bbf25c128:/learning# kafka-topics.sh --bootstrap-server localhost:9092 --describe --topic first_kafka_topic
Topic: first_kafka_topic	TopicId: 5dHIG2pSQACllePlxIbObg	PartitionCount: 3	ReplicationFactor: 3	Configs:
        Topic: first_kafka_topic	Partition: 0	Leader: 3	Replicas: 3,1,2	Isr: 3,1
        Topic: first_kafka_topic	Partition: 1	Leader: 1	Replicas: 1,2,3	Isr: 3,1
        Topic: first_kafka_topic	Partition: 2	Leader: 3	Replicas: 2,3,1	Isr: 3,1

Notice that

  1. the leader for Partition 2 changes from broker 2 to broker 3
  2. broker 2 drops out of the Isr list of every partition; the Replicas assignment itself does not change
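
The same comparison can be done mechanically. A rough sketch, assuming the two describe outputs were saved to files (before.txt and after.txt are hypothetical names, as is the helper describe_diff):

```shell
#!/bin/sh
# Compare two saved `kafka-topics.sh --describe` outputs and report
# leader and ISR changes per partition.
# describe_diff is a hypothetical helper name, not part of Kafka.
describe_diff() {
  awk '
    $3 == "Partition:" {
      p = l = s = ""
      for (i = 1; i < NF; i++) {
        if ($i == "Partition:") p = $(i + 1)
        if ($i == "Leader:")    l = $(i + 1)
        if ($i == "Isr:")       s = $(i + 1)
      }
      # First file: remember the state. Second file: compare against it.
      if (NR == FNR) { leader[p] = l; isr[p] = s; next }
      if (leader[p] != l) printf "partition %s: leader %s -> %s\n", p, leader[p], l
      if (isr[p] != s)    printf "partition %s: isr %s -> %s\n", p, isr[p], s
    }' "$1" "$2"
}

# Sample data: the partition lines from the two describe outputs in this note.
tmp="$(mktemp -d)"
cat > "$tmp/before.txt" <<'EOF'
Topic: first_kafka_topic Partition: 0 Leader: 3 Replicas: 3,1,2 Isr: 3,1,2
Topic: first_kafka_topic Partition: 1 Leader: 1 Replicas: 1,2,3 Isr: 1,2,3
Topic: first_kafka_topic Partition: 2 Leader: 2 Replicas: 2,3,1 Isr: 2,3,1
EOF
cat > "$tmp/after.txt" <<'EOF'
Topic: first_kafka_topic Partition: 0 Leader: 3 Replicas: 3,1,2 Isr: 3,1
Topic: first_kafka_topic Partition: 1 Leader: 1 Replicas: 1,2,3 Isr: 3,1
Topic: first_kafka_topic Partition: 2 Leader: 3 Replicas: 2,3,1 Isr: 3,1
EOF
describe_diff "$tmp/before.txt" "$tmp/after.txt"
rm -r "$tmp"
```

For this sample it reports an Isr change on all three partitions and a leader change (2 -> 3) only on Partition 2. Replicas is deliberately not compared, since the assignment stays the same when a broker goes down.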

Start that broker again

$ docker start kafka2

Describe the topic again. Broker 2 is back in the Isr list of every partition, but Partition 2 is still led by broker 3:

root@eb2bbf25c128:/learning# kafka-topics.sh --bootstrap-server localhost:9092 --describe --topic first_kafka_topic
Topic: first_kafka_topic	TopicId: 5dHIG2pSQACllePlxIbObg	PartitionCount: 3	ReplicationFactor: 3	Configs:
        Topic: first_kafka_topic	Partition: 0	Leader: 3	Replicas: 3,1,2	Isr: 3,1,2
        Topic: first_kafka_topic	Partition: 1	Leader: 1	Replicas: 1,2,3	Isr: 3,1,2
        Topic: first_kafka_topic	Partition: 2	Leader: 3	Replicas: 2,3,1	Isr: 3,1,2
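
The first broker in the Replicas list is the preferred leader of a partition, so Partition 2 (Replicas: 2,3,1, Leader: 3) is currently not on its preferred leader. A small sketch to spot such partitions in describe output. non_preferred_leaders is a made-up helper name, and the sample input is the output above with tabs replaced by spaces.

```shell
#!/bin/sh
# List partitions whose current leader is not the first broker in "Replicas:".
# Reads `kafka-topics.sh --describe` output on stdin.
# non_preferred_leaders is a hypothetical helper name, not part of Kafka.
non_preferred_leaders() {
  awk '
    $3 == "Partition:" {
      part = leader = replicas = ""
      for (i = 1; i < NF; i++) {
        if ($i == "Partition:") part = $(i + 1)
        if ($i == "Leader:")    leader = $(i + 1)
        if ($i == "Replicas:")  replicas = $(i + 1)
      }
      split(replicas, r, ",")   # r[1] is the preferred leader
      if (leader != r[1])
        printf "partition %s: leader %s, preferred %s\n", part, leader, r[1]
    }'
}

# Sample: the describe output taken after kafka2 was started again.
non_preferred_leaders <<'EOF'
Topic: first_kafka_topic TopicId: 5dHIG2pSQACllePlxIbObg PartitionCount: 3 ReplicationFactor: 3 Configs:
    Topic: first_kafka_topic Partition: 0 Leader: 3 Replicas: 3,1,2 Isr: 3,1,2
    Topic: first_kafka_topic Partition: 1 Leader: 1 Replicas: 1,2,3 Isr: 3,1,2
    Topic: first_kafka_topic Partition: 2 Leader: 3 Replicas: 2,3,1 Isr: 3,1,2
EOF
# prints: partition 2: leader 3, preferred 2
```

Leadership normally moves back by itself after a while if the broker setting auto.leader.rebalance.enable is on (the default). It can also be requested with the kafka-leader-election.sh tool and `--election-type preferred`, assuming that script ships in this image; this was not tried in this note.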
