
It is extremely important to know whether your consumers can keep up with the producers in your Apache Kafka® cluster. If they cannot, the lag between the last produced message and the last consumed message will keep growing until the issue is resolved. Once the lag grows too large, it will impact your business processes and may even lead to lost messages, for example when messages are deleted by the topic's retention policy before they are consumed. In gradient fox you can see metrics showing the consumer lag in seconds and in messages. You can also create alerts to notify you when the consumer lag rises above a critical threshold.
The lag can be measured in two different ways. Both methods are explained in detail using a simplified example depicted in the picture below.
The diagram shows a topic with three partitions, P1 to P3. Each message is drawn as a rectangle with its timestamp inside, and its offset is shown in the smaller square in the top-left corner of the rectangle. The current consumer offset is shown in purple and the last message of each partition is shown in green.
The lag in messages can simply be calculated by subtracting the current consumer offset from the offset of the last message for each partition. By looking at the picture, we get:
In other words, the total lag in messages for this consumer group in this topic is 9.
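The per-partition subtraction and the total can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function name `lag_in_messages` is hypothetical, and the offsets are made-up values chosen so that the total comes out to 9. They are not read from the diagram.

```python
def lag_in_messages(last_offsets, consumer_offsets):
    """Return the lag per partition: last message offset minus consumer offset."""
    return {
        partition: last_offsets[partition] - consumer_offsets[partition]
        for partition in last_offsets
    }


# Hypothetical offsets for partitions P1 to P3 (assumed values, not from the diagram).
last_offsets = {"P1": 10, "P2": 7, "P3": 12}
consumer_offsets = {"P1": 6, "P2": 5, "P3": 9}

per_partition = lag_in_messages(last_offsets, consumer_offsets)

# The lag of the consumer group in the topic is the sum over all partitions.
total_lag = sum(per_partition.values())

print(per_partition)
print(total_lag)
```

With these assumed offsets the per-partition lags are 4, 2 and 3 messages, which add up to a total of 9.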
In a similar fashion, the lag in seconds can be calculated by subtracting the timestamp of the current consumer message from the timestamp of the last message for each partition. By studying the timestamps in the picture, we get: