Kafka

Key features

  1. Distributed event streaming platform
  2. Features
    1. High availability
    2. Horizontally scalable
    3. Ingest large volume of data
    4. High throughput
    5. Low latency
    6. Fault tolerance

Kafka as an application

Like a database, Kafka is a stateful application. It needs a directory structure, a place to store messages, and so on. This is where the key-value pairs in the property files come in: they tell Kafka where everything is supposed to live (inside the Docker container, if you are running Kafka in one).
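
As an illustrative sketch, a minimal server.properties might contain entries like the following (the property names are real Kafka broker settings; the values are made up):

```properties
# Unique id of this broker within the cluster
broker.id=0
# Directory where Kafka stores its log segments (the messages) on disk
log.dirs=/var/lib/kafka/data
# How clients reach this broker
listeners=PLAINTEXT://0.0.0.0:9092
# Where to find ZooKeeper (for pre-KRaft setups)
zookeeper.connect=zookeeper:2181
```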

Why Kafka?

Fast, resilient and scalable

With Apache Kafka, streaming data is organized into Kafka topics. Kafka offers the same high throughput and high performance as message queues, but with different functionality.

When to use Kafka?

Large amounts of streaming data that require scaling and high throughput

Kafka and scaling

Easy horizontal scaling thanks to built-in partitioning
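
A rough sketch of why partitioning enables horizontal scaling: records with the same key always hash to the same partition, so partitions (and their traffic) can be spread across brokers and consumers while per-key ordering is preserved. Kafka's real default partitioner uses murmur2 on the key; the hash below is only a stand-in.

```python
import hashlib

def choose_partition(key: bytes, num_partitions: int) -> int:
    # Stand-in for Kafka's default partitioner (murmur2(key) % partitions).
    # The property that matters is determinism: the same key always lands
    # in the same partition, which is what preserves per-key ordering.
    digest = hashlib.md5(key).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

# Same key -> same partition, every time:
p1 = choose_partition(b"order-42", 6)
p2 = choose_partition(b"order-42", 6)
assert p1 == p2 and 0 <= p1 < 6
```

Note that changing the partition count remaps keys, which is one reason partition counts are usually chosen up front.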

Replication

Kafka - replication

Fault tolerance

Kafka - fault tolerance

High level definitions

Kafka

Basically an event streaming platform. It enables users to collect, store, and process data to build real-time event-driven applications. It is written in Java and Scala, but you don’t have to know these to work with Kafka. There’s also a Python API.

  1. Open-source distributed event streaming platform
  2. Can be used for capturing any events in real-time and storing for later retrieval
  3. Can be used for capturing any events in real-time and processing the events in real-time
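
To make "capture events now, retrieve or process them later" concrete, here is a toy in-memory model of Kafka's append-only log. This is not the Kafka API, just a sketch of the storage idea: events get increasing offsets and can be replayed from any point.

```python
import json

class MiniLog:
    """Toy append-only log illustrating Kafka's core storage model:
    events are appended with increasing offsets and re-read later."""

    def __init__(self):
        self._records = []

    def append(self, event: dict) -> int:
        # Each appended event gets the next offset in the log.
        offset = len(self._records)
        self._records.append(json.dumps(event))
        return offset

    def read_from(self, offset: int):
        # A "consumer" can replay the log from any offset.
        for i in range(offset, len(self._records)):
            yield i, json.loads(self._records[i])

log = MiniLog()
log.append({"type": "page_view", "user": "alice"})
log.append({"type": "click", "user": "bob"})
events = list(log.read_from(0))
```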

Event

Event driven architecture - Event

Kafka streams

  1. A library for building streaming applications
  2. Input and output data are stored in Kafka
  3. Compute aggregations or join streams
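
Kafka Streams itself is a Java/Scala library, but the aggregation idea can be sketched in Python: consume keyed records from an input stream and maintain a running count per key, akin to a groupByKey().count() in the Streams DSL.

```python
from collections import defaultdict

def aggregate_counts(records):
    # Consume (key, value) records and keep a running count per key,
    # the way a Kafka Streams count() aggregation maintains a state store.
    counts = defaultdict(int)
    for key, _value in records:
        counts[key] += 1
    return dict(counts)

stream = [("user-1", "click"), ("user-2", "click"), ("user-1", "view")]
result = aggregate_counts(stream)  # {'user-1': 2, 'user-2': 1}
```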

Default producer and consumer behavior with leaders

  1. Kafka producers can only write to the leader broker for a partition.
  2. Kafka consumers by default will read from the leader broker for a partition.
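
A sketch of what this means in practice, using hypothetical cluster metadata (the broker names and layout are made up for illustration):

```python
# Hypothetical metadata: each partition has one leader and a replica list.
metadata = {
    0: {"leader": "broker-1", "replicas": ["broker-1", "broker-2", "broker-3"]},
    1: {"leader": "broker-2", "replicas": ["broker-2", "broker-3", "broker-1"]},
}

def broker_for_produce(partition):
    # Producers can only write to the partition's leader.
    return metadata[partition]["leader"]

def broker_for_consume(partition):
    # By default, consumers read from the leader as well.
    return metadata[partition]["leader"]

assert broker_for_produce(1) == broker_for_consume(1) == "broker-2"
```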

Kafka consumers - replica fetching (newer Kafka versions)

  1. Since Kafka 2.4, it is possible to configure consumers to read from the closest replica.
  2. This may help improve latency, and also decrease network costs if using the cloud.
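
This is KIP-392 ("fetch from followers"). A sketch of the configuration involved: the broker declares its rack and a rack-aware replica selector, and the consumer declares which rack it is in (the rack values are illustrative):

```properties
# Broker side (server.properties)
broker.rack=us-east-1a
replica.selector.class=org.apache.kafka.common.replication.RackAwareReplicaSelector

# Consumer side
client.rack=us-east-1a
```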

Kafka KRaft

  • In 2020, the Apache Kafka project started working to remove the ZooKeeper dependency (KIP-500)
  • ZooKeeper shows scaling issues when Kafka clusters have > 100,000 partitions.
  • By removing ZooKeeper, Apache Kafka can:
    • Scale to millions of partitions, and become easier to maintain and set up
    • Improve stability, and become easier to monitor, support, and administer
    • Use a single security model for the whole system
    • Start as a single process
    • Achieve faster controller shutdown and recovery times
  • Kafka 3.x implements the Raft protocol (KRaft) to replace ZooKeeper
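
In KRaft mode, the controller quorum is configured directly in the broker properties instead of pointing at ZooKeeper. An illustrative combined-mode configuration (property names are real KRaft settings; the values are made up):

```properties
# This node acts as both broker and controller (combined mode)
process.roles=broker,controller
node.id=1
# Voters in the Raft controller quorum, as id@host:port
controller.quorum.voters=1@localhost:9093
listeners=PLAINTEXT://:9092,CONTROLLER://:9093
controller.listener.names=CONTROLLER
```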

Helpful resources

  1. https://www.conduktor.io/kafka - This is very good.
    1. https://www.conduktor.io/kafka/kafka-sdk-list/
  2. https://medium.com/@TimvanBaarsen/apache-kafka-cli-commands-cheat-sheet-a6f06eac01b
  3. https://www.gentlydownthe.stream/ - A cute children’s book explaining Kafka.

Use cases

Kafka use cases: https://kafka.apache.org/powered-by

Questions

  1. What is the relationship between throughput and topics?
  2. Depending on the traffic, how many pods should we set up for Kafka? e.g. 100,000 requests
  3. Kafka consumer topics - what are they?
  4. If there are 10 consumer instances and if there are more messages coming in the topics than the consumer instances can process, what happens?
  5. What is the relationship between the number of partitions and the number of consumers?
  6. What if something goes wrong with the consumer? What will happen to the messages in the partitions?
  7. Let's say we have 10 consumer instances and a hundred messages arrive in the topic. Explain in detail what happens.
  8. How do you write consumers/producers without using spring-cloud-stream? They can be written functionally. How would you write them?
  9. RabbitMQ vs Kafka messaging/streams - what are the differences, and when would we pick one over the other?
  10. If you deliver a message to a Kafka topic, all subscribers of the topic will receive it. But suppose there is a message you want only one specific subscriber to pick up, while the rest do not. How would you implement that?
  11. Queues/topics vs Kafka - what is the difference? What advantages does Kafka have over traditional queues?
  12. What are kafka topics?
  13. What would be a good scenario to use partitions?
  14. How do you determine how many partitions to use?

TODO

Problems with kafka streams : https://dzone.com/articles/problem-with-kafka-streams-1?fromrel=true

Kafka Racing: Know the Circuit : https://dzone.com/articles/kafka-racing-know-the-circuit

  1. What are Kafka containers?
  2. Cloudant - Kafka streams or queues
  3. How do you publish a topic from a Lambda to a Kafka stream, and from a Kafka stream to a Lambda?

Tags

  1. Stream Processing with Apache Kafka
  2. kafka and zookeeper
  3. Event driven architecture - Event
  4. Kafka - CLI and GUI tools
  5. Kafka - Clusters, Controllers and Brokers
  6. Kafka - Consumers and Consumer Groups
  7. Kafka - How are brokers, topics and partitions related
  8. Kafka - Messages
  9. Kafka - PartitionReassignment
  10. Kafka - Producers
  11. Kafka - Serialization and Deserialization
  12. Kafka - Topics, Partitions and Offsets
  13. Kafka - Idempotent Producers and Consumers
  14. Understanding reactor-kafka
  15. Kafka - Integration testing
  16. Kafka - security
  17. Kafka - Transactions
  18. Kafka - Delivery semantics
  19. Kafka - Stream Bridge
  20. Kafka - Fan in and Fan out