Kafka
Key features
- Distributed event streaming platform
- Features
- High availability
- Horizontally scalable
- Ingest large volume of data
- High throughput
- Low latency
- Fault tolerance
Kafka application
Like a database, Kafka is a stateful application, so it needs a directory structure and a location to store messages. This is where the key-value pairs in the properties files come in: they tell Kafka where that data should go (inside the Docker container, if we are running Kafka in one).
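To make the role of those key-value pairs concrete, here is a minimal sketch that parses a properties fragment and reads the storage location from it. The keys (`log.dirs`, `listeners`, `num.partitions`) are real broker settings, but the values, the `SAMPLE_PROPERTIES` text and the `parse_properties` helper are made up for illustration.

```python
# Hypothetical properties fragment: the keys are real broker settings,
# the values are placeholders chosen for this example.
SAMPLE_PROPERTIES = """
# directory where the broker stores partition data on disk
log.dirs=/var/lib/kafka/data
listeners=PLAINTEXT://0.0.0.0:9092
num.partitions=3
"""


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


props = parse_properties(SAMPLE_PROPERTIES)
print("Messages would be stored under:", props["log.dirs"])
```

When Kafka runs in a container, the path in `log.dirs` is a path inside the container, so it usually needs a mounted volume behind it for the data to survive restarts.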
Why Kafka?
Fast, resilient and scalable
With Apache Kafka, streaming data is organized by Kafka topics. Kafka streams offer the same high throughput and high performance as message queues, but with different functionality.
When to use Kafka?
Large amounts of streaming data that require scaling and high throughput
Kafka and scaling
Easy horizontal scaling thanks to built-in partitioning
Replication
Fault tolerance
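A rough sketch of why partitioning makes horizontal scaling easy: each keyed message is mapped to one partition, so partitions can be spread across brokers and consumed in parallel. This is a simplified illustration only. Kafka's default partitioner hashes keys with murmur2; the CRC32 hash and the function names below are stand-ins chosen for this example.

```python
import zlib


def choose_partition(key: bytes, num_partitions: int) -> int:
    """Map a key to a partition index.

    Stand-in hash: CRC32 is used here for simplicity, not the murmur2
    hash that Kafka's default partitioner uses.
    """
    return zlib.crc32(key) % num_partitions


def assign(keys, num_partitions):
    """Group keys by the partition they would land in."""
    partitions = {p: [] for p in range(num_partitions)}
    for key in keys:
        partitions[choose_partition(key.encode(), num_partitions)].append(key)
    return partitions


layout = assign(["user-1", "user-2", "user-3", "user-1"], num_partitions=3)
for partition, keys in layout.items():
    print(partition, keys)
```

The same key always lands in the same partition, which is what preserves per-key ordering while the topic as a whole is spread out.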
High level definitions
Kafka
Basically an event streaming platform. It enables users to collect, store, and process data to build real-time event-driven applications. It is written in Java and Scala, but you don’t have to know these to work with Kafka. There’s also a Python API.
- Open-source distributed event streaming platform
- Can be used for capturing any events in real-time and storing for later retrieval
- Can be used for capturing any events in real-time and processing the events in real-time
Event
Event driven architecture - Event
Kafka streams
- A library to build streaming applications
- Input and output data are stored in Kafka
- Compute aggregations or join streams
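To illustrate what "compute aggregations" means for a stream, here is a toy running count per key in plain Python. It is not the Kafka Streams API (that is a Java library), and the `count_by_key` name and the sample events are invented for this sketch. The idea it shows is that an aggregation over a stream emits an updated result every time a new event arrives.

```python
from collections import defaultdict


def count_by_key(events):
    """Yield (key, running_count) after each incoming (key, value) event."""
    counts = defaultdict(int)
    for key, _value in events:
        counts[key] += 1
        yield key, counts[key]


# Hypothetical input events: (key, value) pairs as they might arrive on a topic.
events = [("user-1", "click"), ("user-2", "view"), ("user-1", "view")]
updates = list(count_by_key(events))
print(updates)
```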
Default producer and consumer behavior with leaders
- Kafka producers can only write to the leader broker for a partition.
- Kafka consumers by default will read from the leader broker for a partition.
Kafka Consumers Replica fetching (newer kafka versions)
- Since Kafka 2.4, it is possible to configure consumers to read from the closest replica.
- This may help improve latency, and also decrease network costs if using the cloud.
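The difference between the default behavior and closest-replica fetching can be sketched with a toy model. Everything below (the `Replica` class, the function names, the rack labels) is hypothetical and only models the decision. In a real cluster this is, to my understanding, driven by the consumer's `client.rack` setting together with a rack-aware `replica.selector.class` on the brokers, and only in-sync replicas are eligible, which this sketch ignores.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Replica:
    broker_id: int
    rack: str
    is_leader: bool


def replica_for_produce(replicas: List[Replica]) -> Replica:
    """Producers always write to the partition leader."""
    return next(r for r in replicas if r.is_leader)


def replica_for_fetch(replicas: List[Replica],
                      client_rack: Optional[str] = None) -> Replica:
    """Default: read from the leader. With a client rack set, prefer a
    replica in the same rack and fall back to the leader otherwise."""
    if client_rack is not None:
        for replica in replicas:
            if replica.rack == client_rack:
                return replica
    return replica_for_produce(replicas)


replicas = [
    Replica(broker_id=1, rack="zone-a", is_leader=True),
    Replica(broker_id=2, rack="zone-b", is_leader=False),
]
print(replica_for_fetch(replicas).broker_id)
print(replica_for_fetch(replicas, client_rack="zone-b").broker_id)
```

A consumer in `zone-b` reads from broker 2 instead of crossing zones to the leader, which is where the latency and network-cost savings come from.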
Kafka KRaft
- In 2020, the Apache Kafka project started work on removing its ZooKeeper dependency (KIP-500).
- ZooKeeper shows scaling issues when Kafka clusters have > 100,000 partitions.
- By removing ZooKeeper, Apache Kafka can
- Scale to millions of partitions, and become easier to maintain and set up
- Improve stability, and become easier to monitor, support and administer
- Use a single security model for the whole system
- Start with a single process
- Shut down and recover the controller faster
- Kafka 3.x now implements the Raft protocol (KRaft) in order to replace ZooKeeper
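One property of Raft worth keeping in mind when sizing the controller set is that it relies on a majority quorum. The small calculation below shows how many controller failures a quorum of a given size can tolerate. The function names are made up for this note, and it models only the majority arithmetic, not Kafka's actual controller implementation.

```python
def majority(num_controllers: int) -> int:
    """Smallest number of controllers that forms a majority."""
    return num_controllers // 2 + 1


def tolerated_failures(num_controllers: int) -> int:
    """Controllers that can fail while a majority is still available."""
    return num_controllers - majority(num_controllers)


for n in (1, 3, 4, 5):
    print(n, majority(n), tolerated_failures(n))
```

Going from 3 to 4 controllers does not add any failure tolerance, which is why odd quorum sizes are the usual choice.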
Helpful resources
- https://www.conduktor.io/kafka - This is very good.
- https://medium.com/@TimvanBaarsen/apache-kafka-cli-commands-cheat-sheet-a6f06eac01b
- https://www.gentlydownthe.stream/ - A cute children’s book explaining Kafka.
Use cases
Kafka use cases: https://kafka.apache.org/powered-by
Questions
- What is the relationship between throughput and topics?
- Depending upon the traffic, how many pods should we set up for kafka? e.g. 100,000 requests
- Kafka consumer topics - what are they?
- If there are 10 consumer instances and if there are more messages coming in the topics than the consumer instances can process, what happens?
- What is the relationship between the number of partitions and the number of consumers?
- What if something goes wrong with the consumer? What will happen to the messages in the partitions?
- Let's say we have 10 consumer instances and a hundred messages arrive on the topic. Explain in detail what happens.
- How to write consumers/producers without using spring-cloud-stream? They can be written functionally. How can you write them?
- RabbitMQ vs Kafka Messaging Streams - differences - when would we pick one over the other?
- If you deliver a message to a kafka topic, all the subscribers of the topic will receive that message. But there is a message that you want to deliver to a topic, and you want only one specific subscriber to pick up that message and the rest of the subscribers should not pick up that message. How will you implement that?
- Queues/Topics vs kafka - what is the difference? What advantages does kafka have over traditional queues?
- What are kafka topics?
- What would be a good scenario to use partitions?
- How do you determine how many partitions to use?
TODO
Problems with kafka streams : https://dzone.com/articles/problem-with-kafka-streams-1?fromrel=true
Kafka Racing: Know the Circuit : https://dzone.com/articles/kafka-racing-know-the-circuit
- What are kafka containers?
- Cloudant - kafka streams or queues
- How to publish topic from lambda to kafka stream and from kafka stream to lambda?
Tags
- Stream Processing with Apache Kafka
- kafka and zookeeper
- Event driven architecture - Event
- Kafka - CLI and GUI tools
- Kafka - Clusters, Controllers and Brokers
- Kafka - Consumers and Consumer Groups
- Kafka - How are brokers, topics and partitions related
- Kafka - Messages
- Kafka - PartitionReassignment
- Kafka - Producers
- Kafka - Serialization and Deserialization
- Kafka - Topics, Partitions and Offsets
- Kafka - Idempotent Producers and Consumers
- Understanding reactor-kafka
- Kafka - Integration testing
- Kafka - security
- Kafka - Transactions
- Kafka - Delivery semantics
- Kafka - Stream Bridge
- Kafka - Fan in and Fan out