KafkaGuaranteesLab

KafkaGuaranteesLab

A Spring Boot application demonstrating the limitations of Kafka guarantees. It layers producer/consumer configuration with Resilience4j circuit breakers and retries to show how the guarantee is maintained end-to-end under failure. Kafka itself is configured for exactly once delivery using an idempotency key. The demo shows how this can greatly reduce the number of duplicate messages, but cannot eliminate them entirely without risking data loss.

The Problem

Kafka provides strong delivery and ordering guarantees for applications that follow the API contract. Under most conditions, every message is processed exactly once and in the same order by both producer and consumer. However, developers need to be aware of the boundaries and plan for failures. This demo shows how to handle common failure modes.

A Quick Peek Under the Covers

To understand where things can go awry, it’s helpful to understand what happens when things go right. There are three common delivery guarantees for messages:

Obviously, you can’t get exactly-once by using the easy approaches to at-most-once and at-least-once at the same time. What you can do instead is use retries as needed to ensure at least once delivery, then filter duplicates. Duplicates can happen when the sender retries while the receiver is down, or when the receiver is slow to process messages. That’s what Kafka does when enable.idempotence=true is set. Each producer session is assigned a unique producer ID, and each message a sequence number; the broker uses these to filter out duplicates. transactional.id extends this to cover producer restarts and atomic consume+produce pairs.

Common Failure Modes

  1. Producer acknowledgement failure: when an application sends a message, but the broker does not confirm receipt.
  2. Offset commit before processing: auto-commit advances the offset before process() finishes, so a crash between the commit and the work silently drops the message.
  3. Application-level failure without fallback: an exception thrown inside the listener that is not caught by a bounded retry with a dead-letter exit will either retry indefinitely or be silently dropped when retries are exhausted, with no record of the failure.

Each layer has a different remedy, and this demo shows all three.

Limits of the Idempotent Producer

The idempotent producer eliminates duplicates from Kafka’s own internal retry loop, but only within a single producer session. That is a narrower guarantee than it sounds.

Producer-side gaps

Consumer-side gaps (outside the idempotent producer’s scope entirely)

What transactional.id adds

transactional.id gives the producer a stable identity that survives restarts. The broker increments a producer epoch on each reconnect, fencing any zombie instances still running with the old epoch. Combined with the transactional API, it lets you atomically commit a consume+produce pair as a single unit. Without it, the idempotent producer only eliminates duplicates from Kafka’s own internal retry loop. Everything above that layer still requires application-level deduplication.

What This Demo Shows

Requirements

Running the Demo

Start a local Kafka broker in KRaft mode (no ZooKeeper):

./kafka-local.sh start

Build and run:

./gradlew bootRun

Send a language preference event:

curl -X POST http://localhost:8080/language-preferences \
  -H 'Content-Type: application/json' \
  -d '{"customerId": "abc123", "preferredLanguage": "fr-CA"}'
# HTTP 202 Accepted

Check circuit breaker and retry state:

curl http://localhost:8080/actuator/circuitbreakers
curl http://localhost:8080/actuator/retries

Stop the broker when done:

./kafka-local.sh stop

Architecture

Delivery guarantee layers

At-least-once delivery is enforced at two independent layers. The Kafka broker protocol handles durability on the producer side. Manual offset management handles redelivery on the consumer side. Resilience4j adds application-level retry and circuit breaking on top of both.

flowchart TD subgraph Producer["Producer side"] CTRL["POST /language-preferences\nLanguagePreferenceController"] PROD["LanguagePreferenceProducer\n@Retry + @CircuitBreaker"] KT["KafkaTemplate.send()\nacks=all · idempotent · unlimited retries"] DLS["Dead-letter store\n(circuit breaker fallback)"] CTRL -->|"202 Accepted\n(fire and forget)"| PROD PROD --> KT PROD -->|"circuit open"| DLS end subgraph Kafka["Kafka"] TOPIC["language-preferences"] DLT["language-preferences.DLT"] end subgraph Consumer["Consumer side"] CONS["LanguagePreferenceConsumer.onMessage()\nAckMode.MANUAL_IMMEDIATE"] PROC["process()\n@Retry + @CircuitBreaker"] ACK["ack.acknowledge()\ncommit offset"] EH["DefaultErrorHandler\nFixedBackOff 1 s · 2 attempts"] CONS --> PROC PROC --> ACK CONS -->|"exception → rethrow\n(no ack)"| EH EH -->|"retries exhausted"| DLT end KT --> TOPIC TOPIC --> CONS

Sequence: successful delivery

sequenceDiagram participant Client participant Ctrl as LanguagePreferenceController participant Prod as LanguagePreferenceProducer participant K as Kafka participant Cons as LanguagePreferenceConsumer participant Proc as process() Client->>Ctrl: POST /language-preferences Ctrl->>Prod: publish(event) Ctrl-->>Client: 202 Accepted Prod->>K: send(key=customerId, acks=all) K-->>Prod: broker ack K->>Cons: onMessage(record) Cons->>Proc: process(event) Proc-->>Cons: success Cons->>K: ack.acknowledge() Note over K: offset committed

Sequence: consumer failure and DLT routing

sequenceDiagram participant K as Kafka participant Cons as LanguagePreferenceConsumer participant Proc as process() participant EH as DefaultErrorHandler participant DLT as language-preferences.DLT K->>Cons: onMessage(record) [attempt 1] Cons->>Proc: process(event) Proc-->>Cons: throws exception Cons->>EH: rethrow (no ack) Note over EH: FixedBackOff — wait 1 s, retry EH->>Cons: onMessage(record) [attempt 2] Cons->>Proc: process(event) Proc-->>Cons: throws again Cons->>EH: rethrow (no ack) Note over EH: retries exhausted EH->>DLT: route to dead-letter topic

Configuration reference

Kafka producer

Setting Value Purpose
acks all All in-sync replicas must acknowledge before the send completes
enable.idempotence true Prevents duplicate records from broker-level retries
retries Integer.MAX_VALUE Delegates retry decisions to the application and circuit breaker
max.in.flight.requests.per.connection 5 Maximum allowed with idempotence enabled

Kafka consumer

Setting Value Purpose
enable.auto.commit false Offset committed manually after successful processing only
auto.offset.reset earliest No committed offset on first start — consume from the beginning
AckMode MANUAL_IMMEDIATE Commits immediately when ack.acknowledge() is called

Resilience4j

Both the producer and consumer have independent instances configured in application.yml.

Circuit breaker defaults

Instance Failure threshold Slow call threshold Slow duration Wait in open
languagePreferenceProducer 50% 80% 2 s 30 s
languagePreferenceConsumer 50% 30 s

Both instances use a count-based sliding window of 10 calls with 3 calls permitted in half-open state.

Retry defaults

Both instances: 3 attempts, 1 s fixed wait, retries on any Exception.

DefaultErrorHandler

FixedBackOff(1 s, 2 attempts) — up to 2 delivery retries before the message is routed to the dead-letter topic. This operates at the Kafka listener container layer, independently of the Resilience4j retry inside process().

Layout

treeView-beta "kafka-local.sh" "src" "resources" "application.yml" "java" "..." "KafkaGuaranteesLabApplication.java" "config" "KafkaConfig.java" "consumer" "LanguagePreferenceConsumer.java" "producer" "LanguagePreferenceProducer.java" "LanguagePreferenceController.java" "model" "LanguagePreference.java"

Technologies

Component Version
Java 25 (toolchain; runs on 17+)
Gradle 9.5.1
Spring Boot 4.0.6
Resilience4j 2.4.0
Spring Kafka (via Spring Boot)
Micrometer/Prometheus (via Spring Boot)
JaCoCo 0.8.14