What’s Kafka?

Kafka is a distributed streaming platform. Think of it as a high-throughput, fault-tolerant message queue on steroids. It’s designed for handling real-time data feeds.

Concepts

  1. Topic:

    A category or feed name to which records are published.

  2. Partition:

    A topic is divided into partitions, which are ordered, immutable sequences of records. Partitions enable parallelism and scalability.

  3. Producer:

    An application that publishes records to a Kafka topic.

  4. Consumer:

    An application that subscribes to one or more topics and processes the records.

  5. Broker:

    A Kafka server. Brokers store the data.

  6. Cluster:

    A group of brokers working together.

  7. Replica:

    Each partition can be replicated across multiple brokers for fault tolerance.

  8. Leader:

    One replica of a partition is designated as the leader, handling all read and write requests.

  9. Follower:

    Other replicas of a partition are followers, replicating data from the leader.

  10. Offset:

    A unique, sequential ID assigned to each record within a partition. Consumers track their position in a partition using offsets.

  11. Consumer Group:

    A group of consumers that work together to consume records from a topic. Each partition is assigned to at most one consumer within a group, so the partition count sets the ceiling on parallelism (see the producer/consumer sketch after this list).

  12. Retention Policy:

    Defines how long Kafka retains records before deleting them (see the topic-creation sketch after this list).

  13. ZooKeeper:

    Used for managing and coordinating the Kafka cluster (though newer versions are moving away from ZooKeeper in favor of KRaft, covered below).
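
To ground several of these concepts, here is a minimal sketch that uses the AdminClient from the official Java kafka-clients library to create a topic with multiple partitions, a replication factor, and a retention policy. The broker address, topic name, and retention value are illustrative assumptions, not defaults.

import java.util.Map;
import java.util.Properties;
import java.util.Set;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateTopicExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Placeholder address; point this at your own cluster.
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // 3 partitions for parallelism, replication factor 3 for fault tolerance.
            NewTopic topic = new NewTopic("user-events", 3, (short) 3)
                    // Retention policy: keep records for 7 days (604,800,000 ms).
                    .configs(Map.of("retention.ms", "604800000"));
            admin.createTopics(Set.of(topic)).all().get();
        }
    }
}

A companion sketch of the producer and consumer sides, assuming the same hypothetical topic; the group ID and string serializers are likewise assumptions:

import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.serialization.StringSerializer;

public class ProduceAndConsume {
    public static void main(String[] args) {
        // Producer: publishes records to a topic.
        Properties prodProps = new Properties();
        prodProps.put("bootstrap.servers", "localhost:9092");
        prodProps.put("key.serializer", StringSerializer.class.getName());
        prodProps.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(prodProps)) {
            // Records with the same key always land in the same partition, preserving order.
            producer.send(new ProducerRecord<>("user-events", "user-42", "clicked_checkout"));
        }

        // Consumer: subscribes to the topic as part of a consumer group.
        Properties consProps = new Properties();
        consProps.put("bootstrap.servers", "localhost:9092");
        consProps.put("group.id", "analytics"); // consumers sharing this ID split the partitions
        consProps.put("key.deserializer", StringDeserializer.class.getName());
        consProps.put("value.deserializer", StringDeserializer.class.getName());
        consProps.put("auto.offset.reset", "earliest");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consProps)) {
            consumer.subscribe(List.of("user-events"));
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(5));
            for (ConsumerRecord<String, String> record : records) {
                // The offset is the record's position within its partition.
                System.out.printf("partition=%d offset=%d value=%s%n",
                        record.partition(), record.offset(), record.value());
            }
        }
    }
}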

Common Use Cases

  1. Real-time data pipelines:

    Ingesting and processing streams of data from various sources.

  2. Log aggregation:

    Collecting logs from multiple servers into a central location.

  3. Stream processing:

    Building real-time applications that analyze and react to data streams (a Kafka Streams sketch follows this list).

  4. Event sourcing:

    Storing a sequence of events that represent changes to an application’s state.

  5. Messaging:

    Reliable, high-throughput messaging between applications.

  6. Activity tracking:

    Track user activity on a website or app in real time.

  7. Commit log:

    Used as a commit log for distributed databases.
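
As a taste of the stream-processing use case, here is a minimal Kafka Streams sketch that maintains a running count of page views per user. The topic names and application ID are assumptions for illustration.

import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Produced;

public class PageViewCounter {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "page-view-counter");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        // Read raw view events keyed by user ID, count them per key,
        // and write the running counts to an output topic.
        KStream<String, String> views = builder.stream("page-views");
        views.groupByKey()
             .count()
             .toStream()
             .to("page-view-counts", Produced.with(Serdes.String(), Serdes.Long()));

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}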

Kafka’s Role in System Design

  1. Decoupling:

    Kafka decouples producers and consumers, allowing them to evolve independently.

  2. Scalability:

    Kafka can handle massive amounts of data and scale horizontally by adding more brokers.

  3. Reliability:

    Replication and fault tolerance ensure that data is not lost.

  4. Buffering:

    Kafka acts as a buffer between producers and consumers, smoothing out spikes in traffic.

  5. Data integration:

    Kafka can integrate data from various sources into a single platform.

About ZooKeeper

The key development is the move away from ZooKeeper with the introduction of KRaft.

What is KRaft (Kafka Raft)?

KRaft is a consensus protocol that lets Kafka manage its metadata internally, eliminating the need for an external ZooKeeper cluster and driving the shift away from ZooKeeper.

It essentially integrates metadata management directly into Kafka itself.
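
As a concrete illustration, a single node running in combined broker-and-controller mode can be configured with a handful of properties; the IDs, ports, and paths below are placeholders, not recommendations:

# server.properties for a combined-mode KRaft node (illustrative values)
process.roles=broker,controller
node.id=1
controller.quorum.voters=1@localhost:9093
listeners=PLAINTEXT://:9092,CONTROLLER://:9093
controller.listener.names=CONTROLLER
log.dirs=/tmp/kraft-combined-logs

Before the first start, the storage directory is formatted with a cluster ID:

# Generate a cluster ID, then format the log directories with it
bin/kafka-storage.sh random-uuid
bin/kafka-storage.sh format -t <uuid-from-previous-command> -c server.properties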

Why the shift?

  • Simplified Operations:

    Managing ZooKeeper adds complexity to Kafka deployments. Removing it streamlines operations.

  • Improved Scalability:

    ZooKeeper can become a bottleneck in very large Kafka clusters. KRaft aims to improve scalability.

  • Unified Architecture:

    A self-contained Kafka system is easier to understand and manage.

The timeline

The Kafka community has been progressively working towards making KRaft production-ready.

Kafka versions 3.x have seen increasing KRaft maturity, and future major releases, such as Kafka 4.0, are expected to remove the dependency on ZooKeeper entirely.

Key benefits

  • Simplified deployments.
  • Enhanced scalability.
  • Improved resilience.

In essence:

Kafka’s future is focused on becoming a more self-sufficient and easier-to-manage distributed system. KRaft is a major step in that direction.

Diagrams

Architecture diagram

flowchart TD
    classDef producer fill:#92D050,color:#000,stroke:#92D050
    classDef broker fill:#0072C6,color:#fff,stroke:#0072C6
    classDef consumer fill:#B4A0FF,color:#000,stroke:#B4A0FF
    classDef zk fill:#FFC000,color:#000,stroke:#FFC000

    subgraph Producers["Producers"]
        P1[Producer 1]:::producer
        P2[Producer 2]:::producer
    end

    subgraph Brokers["Kafka Cluster"]
        B1["Broker 1<br/>Leader"]:::broker
        B2["Broker 2<br/>Follower"]:::broker
        B3["Broker 3<br/>Follower"]:::broker

        subgraph Partitions["Topic Partitions"]
            TP1[P0]:::broker
            TP2[P1]:::broker
            TP3[P2]:::broker
        end
    end

    subgraph Consumers["Consumer Groups"]
        CG1[Group 1]:::consumer
        CG2[Group 2]:::consumer
    end

    ZK[ZooKeeper]:::zk

    P1 & P2 --> B1 & B2 & B3
    B1 & B2 & B3 --> CG1 & CG2
    ZK -.-> B1 & B2 & B3
    ZK -.-> CG1 & CG2

    %% Legend
    subgraph Legend["Legend"]
        L1[Producer]:::producer
        L2[Broker]:::broker
        L3[Consumer]:::consumer
        L4[ZooKeeper]:::zk
    end

Explanation of the architecture diagram

  • Solid lines represent direct data flow between producers, brokers, and consumers
  • Dotted lines show ZooKeeper’s coordination role (managing cluster state and consumer groups)
  • Each broker can host multiple partitions (shown as P0, P1, P2)
  • Consumer Groups allow multiple applications to consume the same topics independently

Dataflow diagram

graph LR
    A[Producer] -->|Publish| B(Topic);
    B --> C{Partition};
    C --> D[Partition 1];
    C --> E[Partition 2];
    C --> F[Partition N];
    D --> G(Broker 1);
    E --> H(Broker 2);
    F --> I(Broker N);
    G --> J[Leader Replica];
    H --> K[Follower Replica];
    I --> L[Leader Replica];
    J --> M{Offset};
    K --> M;
    L --> M;
    N[Consumer Group] --> O[Consumer 1];
    N --> P[Consumer 2];
    N --> Q[Consumer N];
    O --> D;
    P --> E;
    Q --> F;
    R[ZooKeeper] -- Manage --> G;
    R -- Manage --> H;
    R -- Manage --> I;
    S[Retention Policy] --> B;
    T[Data Source] --> A;
    M --> O;
    M --> P;
    M --> Q;
    subgraph Kafka Cluster
    G;H;I;J;K;L;
    end
    subgraph Topic and Partitions
    B;C;D;E;F;
    end
    subgraph Consumer Group
    N;O;P;Q;
    end


    

Explanation of the dataflow diagram

  • Producer:

    Publishes messages to a specific Topic.

  • Topic:

    Is divided into multiple Partitions.

  • Partitions:

    Are distributed across multiple Brokers.

  • Brokers:

    Each Broker can have multiple Partitions, and each Partition has a Leader Replica and Follower Replicas.

  • Offset:

    Each message within a Partition is assigned a unique Offset.

  • Consumer Group:

    Consists of multiple Consumers.

  • Consumers:

    Subscribe to a Topic and read messages from Partitions, tracking their position using Offsets.

  • ZooKeeper:

    Manages the Kafka Cluster, coordinating Brokers and Leader election.

  • Retention Policy:

    Determines how long messages are stored in the Topic.

  • Data Source:

    Provides the data that the Producer sends to Kafka.

Connections

  • The diagram illustrates the flow of data from Producers to Consumers through Topics and Partitions.
  • It shows how Brokers and Replicas ensure fault tolerance.
  • It highlights the role of Offsets in tracking message consumption (see the offset-commit sketch after this list).
  • It shows the relationship between a consumer group and its consumers.
  • It shows that ZooKeeper manages the brokers.
  • It shows that the retention policy applies to the topic.
  • It shows that the data source feeds the producer.
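
Because offsets drive consumption tracking, many applications disable auto-commit and commit offsets only after records are fully processed. A sketch of that pattern, reusing the hypothetical topic and group from earlier:

import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class ManualCommitConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "analytics");
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());
        // Disable auto-commit so offsets advance only after processing succeeds.
        props.put("enable.auto.commit", "false");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("user-events"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    // Placeholder for real processing logic.
                    System.out.printf("offset=%d value=%s%n", record.offset(), record.value());
                }
                // Commit the offsets returned by the last poll, making progress durable.
                consumer.commitSync();
            }
        }
    }
}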

Replication mechanism and leader election diagram


sequenceDiagram
    participant P as Producer
    participant L as Leader Broker
    participant F1 as Follower Broker 1
    participant F2 as Follower Broker 2
    participant ZK as ZooKeeper
    participant B3 as Broker 3
    
    Note over P,ZK: Normal Operation
    P->>+L: Send Message
    L->>L: Write to Log
    L->>-P: Acknowledge
    
    par Replication
        L->>F1: Replicate Message
        F1->>F1: Write to Log
        L->>F2: Replicate Message
        F2->>F2: Write to Log
    end
    
    Note over L,ZK: Leader Failure Detected
    ZK->>F1: Elect as New Leader
    P->>+F1: Send New Message
    F1->>F1: Write to Log
    F1->>-P: Acknowledge
    
    par Recovery
        F1->>B3: Replicate Message
        B3->>B3: Write to Log
    end

    

Explanation of the replication diagram

  • Parallel lines show simultaneous replication to multiple followers
  • When the leader fails, ZooKeeper elects a new leader from available followers
  • The system maintains consistency even during leader transitions (a durability-oriented producer sketch follows)
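
How much durability the replication shown above actually buys depends on producer settings. Here is a sketch of a durability-oriented producer configuration; the values are illustrative, not universal recommendations:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class DurableProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        // acks=all: the leader waits for the in-sync replicas before acknowledging,
        // so an acknowledged record survives a leader failure.
        props.put(ProducerConfig.ACKS_CONFIG, "all");
        // Idempotence prevents duplicate records when the producer retries after a failover.
        props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("user-events", "user-42", "order_placed"));
        }
    }
}

Pairing acks=all with the topic-level min.insync.replicas setting controls how many replicas must confirm a write before it is acknowledged.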