Kafka - Clusters, Controllers and Brokers
Kafka cluster
In real-life scenarios, a cluster will not consist of a single server. Many Kafka servers run together in a Kafka cluster. These servers are called Kafka brokers.
All the brokers in a cluster are like a family. All of them talk to each other. All of them know about each other.
Responsibilities that each server in a Kafka cluster can have
In server.properties, look for the process.roles setting. Its value decides which of the roles listed below the server takes on.
process.roles=broker,controller
- Broker
- Controller
- Broker + Controller
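As a minimal sketch of how the three combinations above map onto the process.roles value, the helper below reads properties-style text and reports the role of a node. The names parse_properties and node_role are hypothetical (they are not part of Kafka), and the sketch assumes the setting is present in the file.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def node_role(properties_text):
    """Classify a node from its process.roles value (hypothetical helper)."""
    raw = parse_properties(properties_text).get("process.roles", "")
    roles = {role.strip() for role in raw.split(",") if role.strip()}
    if roles == {"broker", "controller"}:
        return "Broker + Controller"
    if roles == {"broker"}:
        return "Broker"
    if roles == {"controller"}:
        return "Controller"
    return "unknown"


print(node_role("process.roles=broker,controller"))  # Broker + Controller
```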
Kafka Controller
- There will be a manager among all the Nodes in a kafka cluster.
- The manager assigns responsibilities to each of the brokers.
- This manager is called a Controller.
- (Think of Master-Workers Architecture and Leader Election Algorithm)
- In a small cluster, the same server can act as both a Controller and a Broker.
- In a large cluster with thousands of Nodes, it is difficult for one server to act as both a Controller and a Broker.
Kafka Broker
- These are responsible for interacting with the clients. They receive events from clients and send events to them.
- A single Kafka cluster is composed of multiple brokers.
- They are just servers, but in Kafka they are called brokers because they receive and send data.
- The brokers handle producers and consumers and keep data replicated in the cluster.
- Each broker is identified by its ID (an integer).
- e.g. Broker101, Broker102, Broker103
- Each broker contains certain topic partitions. This assignment is done by the Controller. e.g. If the cluster is handling order events, the Controller will pick some brokers from the cluster to handle these events.
- The data for order events is going to be distributed across multiple brokers.
- One of these brokers is going to be the primary Node (the leader) for data related to order events.
- But the Controller will also assign some other Nodes (the followers) to be back-ups for this primary Node for order events.
- After connecting to any broker (called a bootstrap broker), you will be connected to the entire cluster (Kafka clients have smart mechanics for that).
- A good number to get started is 3 brokers.
- Some big clusters have over 100 brokers.
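The partition placement described in this list (one primary broker per partition plus back-ups) can be illustrated with a toy round-robin assignment. This is a simplified sketch and not Kafka's actual placement algorithm. The function assign_partitions and the broker IDs are hypothetical.

```python
def assign_partitions(broker_ids, num_partitions, replication_factor):
    """Toy round-robin placement: the first replica of each partition is its leader."""
    if replication_factor > len(broker_ids):
        raise ValueError("replication factor cannot exceed the number of brokers")
    assignment = {}
    for partition in range(num_partitions):
        # Pick consecutive brokers, wrapping around the end of the list.
        replicas = [
            broker_ids[(partition + offset) % len(broker_ids)]
            for offset in range(replication_factor)
        ]
        assignment[partition] = {"leader": replicas[0], "followers": replicas[1:]}
    return assignment


# Hypothetical "orders" topic: 3 partitions, each stored on 2 of the 3 brokers.
orders = assign_partitions([101, 102, 103], num_partitions=3, replication_factor=2)
for partition, replicas in orders.items():
    print(partition, replicas)
```

With three brokers and two replicas per partition, every broker ends up leading one partition and backing up another, so losing a single broker leaves a copy of every partition available.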
Bootstrap server and Kafka broker discovery
- Let's say that we are working with a Kafka cluster with thousands of Nodes.
- For a client application instance that needs to work with a specific type of event, how will it know which Node in the cluster it needs to connect to?
- Every Kafka broker is also called a "bootstrap server".
- That means that the client application only needs to connect to one broker, and the Kafka client will know how to connect to the entire cluster (smart clients).
- When a client initiates a request to a broker, a connection is first established with that broker, and that broker returns a list of all the other brokers in the cluster.
- Each broker knows about all brokers, topics and partitions (cluster-metadata).
- But what if a client application is trying to establish a connection with a particular Node in the cluster and that Node dies? If the client application maintains a list of bootstrap servers instead of trying to connect to only one Node, this problem can be avoided.
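The fallback behaviour described above can be sketched with an in-memory stand-in for a cluster. Nothing here talks to a real broker: FakeCluster, fetch_metadata and discover_brokers are hypothetical names used only for illustration, and the addresses are made up. In a real client the list of candidates is supplied through the bootstrap.servers setting.

```python
class FakeCluster:
    """In-memory stand-in for a cluster. This is not a real Kafka client."""

    def __init__(self, brokers):
        # brokers maps an address to True (up) or False (down).
        self.brokers = dict(brokers)

    def fetch_metadata(self, address):
        """Return every live broker address, as a metadata response would."""
        if not self.brokers.get(address, False):
            raise ConnectionError(f"{address} is unreachable")
        return sorted(addr for addr, up in self.brokers.items() if up)


def discover_brokers(cluster, bootstrap_servers):
    """Try each bootstrap server in turn until one returns the broker list."""
    for address in bootstrap_servers:
        try:
            return cluster.fetch_metadata(address)
        except ConnectionError:
            continue  # This Node is down, so try the next one in the list.
    raise ConnectionError("no bootstrap server reachable")


cluster = FakeCluster({
    "broker101:9092": False,  # The first bootstrap server has died.
    "broker102:9092": True,
    "broker103:9092": True,
})
print(discover_brokers(cluster, ["broker101:9092", "broker102:9092"]))
# ['broker102:9092', 'broker103:9092']
```

Because the second entry in the list is still alive, the client learns about the rest of the cluster even though its first choice is down.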
cluster-metadata
- This is an internal topic (in KRaft mode it is named __cluster_metadata).
- For an example, look at “Anatomy of a topic” in Kafka - Topics, Partitions and Offsets
- The controller of the cluster publishes metadata about topics, partitions and nodes to this internal topic. e.g. Which broker is the leader for which partition? Which brokers are the followers? etc.
- All the brokers in the cluster can read this topic. That is how they will know all the information about everything that is going on in the cluster.
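One way to picture how reading this topic keeps every broker up to date is to treat it as a log of records that each broker replays in order. The sketch below is a toy model under that assumption. The record fields and the replay function are hypothetical and do not reflect Kafka's actual record format.

```python
def replay(metadata_log):
    """Fold metadata records, oldest first, into a (topic, partition) -> replicas view."""
    view = {}
    for record in metadata_log:
        key = (record["topic"], record["partition"])
        # A later record for the same partition overwrites the earlier one.
        view[key] = {
            "leader": record["leader"],
            "followers": list(record["followers"]),
        }
    return view


log = [
    {"topic": "orders", "partition": 0, "leader": 101, "followers": [102]},
    {"topic": "orders", "partition": 1, "leader": 102, "followers": [103]},
    # Later record: leadership of partition 0 moves to broker 102.
    {"topic": "orders", "partition": 0, "leader": 102, "followers": [101]},
]

view = replay(log)
print(view[("orders", 0)]["leader"])  # 102
```

Any broker that replays the same records in the same order arrives at the same view, which is why a shared log is enough to keep the whole cluster in agreement.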
Different types of communication in a kafka cluster
Several types of communication take place in a Kafka cluster:
- Controller nodes communicating among themselves (communication within the subnet)
- Brokers communicating among themselves - transferring data, etc. (communication within the subnet)
- External applications talking to brokers for producing/consuming messages.
A Kafka broker can listen on multiple ports for different purposes: one for Controller communication, one for Broker communication, and one for communication with external applications. These are called Listeners.
The brokers specify/expose which ports they are using for internal/external communication. Otherwise, the external applications wouldn’t know how to connect to brokers for producing/consuming messages.
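To make the idea of several Listeners concrete, the sketch below splits a value in the style of the listeners setting in server.properties into a name, host and port for each listener. The helper parse_listeners is hypothetical, and the listener names and port numbers in the example are illustrative assumptions rather than values from a real deployment.

```python
def parse_listeners(value):
    """Split a listeners-style value into {listener_name: (host, port)}."""
    listeners = {}
    for entry in value.split(","):
        name, _, address = entry.strip().partition("://")
        # An empty host (as in "://:9092") means no specific host was given.
        host, _, port = address.rpartition(":")
        listeners[name] = (host, int(port))
    return listeners


# Illustrative value: one listener each for brokers, controllers and external clients.
example = "PLAINTEXT://:9092,CONTROLLER://:9093,EXTERNAL://0.0.0.0:9094"
for name, (host, port) in parse_listeners(example).items():
    print(name, host or "(any)", port)
```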
Cluster Configuration and Set-up
auto.create.topics.enable
https://kafka.apache.org/documentation/#brokerconfigs_auto.create.topics.enable
Enable auto creation of topic on the server.
Type: boolean
Default: true
Valid Values:
Importance: high
Update Mode: read-only