Metrics

The gradient fox system automatically monitors your Apache Kafka® clusters and collects several metrics that you can view and analyze in your web browser. These metrics are persisted so that you can study long-term trends and analyze performance over time. Metrics are currently collected at the cluster, topic, and consumer group levels. Each level is described in detail in the sections below.

For each metric in the system, you can select the time window that you want to inspect using the provided drop-down. You can select from a set of pre-defined time intervals starting from 30 minutes all the way up to 365 days.
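To make the idea of a time window concrete, here is a minimal Python sketch that keeps only the samples falling inside a chosen window. The `WINDOWS` table and the `select_window` function are hypothetical names used for illustration; only the 30-minute and 365-day endpoints come from the text above, and the system's own drop-down may offer other intervals in between.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical window labels. Only the two endpoints of the range are
# stated in the documentation; intermediate choices are not listed here.
WINDOWS = {
    "30 minutes": timedelta(minutes=30),
    "365 days": timedelta(days=365),
}

def select_window(samples, window, now):
    """Keep the (timestamp, value) samples inside the window ending at `now`."""
    start = now - WINDOWS[window]
    return [(ts, value) for ts, value in samples if start <= ts <= now]

now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
samples = [
    (now - timedelta(days=2), 120.0),      # outside the 30-minute window
    (now - timedelta(minutes=20), 95.0),   # inside
    (now - timedelta(minutes=5), 101.0),   # inside
]

recent = select_window(samples, "30 minutes", now)
print(len(recent))  # 2
```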

Cluster Metrics

All cluster-level metrics can be found under the Metrics tab of each cluster. Navigate to the cluster you want to see metrics for and click the Metrics tab.

The consumption graph tells you how many messages per second are consumed from this particular Apache Kafka® cluster. For example, assume you have two consumer groups in this cluster. If consumer group one consumes X messages per second and consumer group two consumes Y messages per second, then at the cluster level the consumption is simply X + Y messages per second.

The production graph tells you how many messages per second are produced into this Apache Kafka® cluster. In other words, it shows how many messages all producers add per second to all topics across the entire cluster. Using a similar example, assume you have two producers in this cluster. If producer one produces X messages per second and producer two produces Y messages per second, then at the cluster level the production is simply X + Y messages per second.
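The X + Y examples above can be sketched in a few lines of Python. The function name, the group and producer names, and the rates are all made up for illustration; this only shows the summation, not how the system actually collects the underlying numbers.

```python
def cluster_rate(rates_by_source):
    """Sum per-source rates (messages per second) into one cluster-level rate."""
    return sum(rates_by_source.values())

# Hypothetical per-consumer-group and per-producer rates, in messages/second.
consumption_by_group = {"group-one": 150.0, "group-two": 50.0}
production_by_producer = {"producer-one": 120.0, "producer-two": 80.0}

cluster_consumption = cluster_rate(consumption_by_group)
cluster_production = cluster_rate(production_by_producer)

print(cluster_consumption)  # 200.0
print(cluster_production)   # 200.0
```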

For more details on how the cluster production numbers are calculated, see the explanation here.

Topic Metrics

You can find all topic-level metrics under the Metrics tab of each topic. Just navigate to your desired topic and click the Metrics tab.

The topic consumption metrics are very similar to the consumption metrics at the cluster level, except that the measurements cover one topic instead of the entire cluster. In other words, the graph tells you how many messages per second are consumed from this particular topic. For example, assume you have two consumer groups consuming from this topic. If consumer group one consumes X messages per second and consumer group two consumes Y messages per second, then for this topic the consumption is X + Y messages per second.

Topic production metrics closely resemble cluster production metrics, with the primary distinction being that they focus on a single topic rather than the entire Apache Kafka® cluster. The production graph provides insight into the rate at which messages are produced for a given topic, measured in messages per second. Specifically, it represents the total number of messages added per second across all partitions of the topic by all producers. For example, if two producers are publishing messages to the topic—one at a rate of X messages per second and the other at Y messages per second—the total topic production rate would be X + Y messages per second.
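As a rough illustration of how the topic-level and cluster-level production figures relate, the Python sketch below groups hypothetical per-producer rates by topic and then adds the topic totals together. The record layout, topic names, and rates are assumptions made for this example only.

```python
from collections import defaultdict

def rates_by_topic(records):
    """Add up the rate (messages/second) of every producer writing to each topic."""
    totals = defaultdict(float)
    for topic, _producer, rate in records:
        totals[topic] += rate
    return dict(totals)

# Hypothetical (topic, producer, messages per second) measurements.
records = [
    ("orders", "producer-one", 40.0),
    ("orders", "producer-two", 60.0),
    ("payments", "producer-one", 25.0),
]

topic_production = rates_by_topic(records)
# Summing every topic's rate gives the cluster-level production rate.
all_topics_production = sum(topic_production.values())

print(topic_production)       # {'orders': 100.0, 'payments': 25.0}
print(all_topics_production)  # 125.0
```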

For more details on how the topic production numbers are calculated, see the article here.

Consumer Group Metrics

All consumer group metrics can be found under the Metrics tab of each consumer group. Just navigate to your desired consumer group and click the Metrics tab.

The lag(messages) graph tells you the size of the lag for this consumer group, measured as the number of messages. For example, assume the consumer group consumes from two topics. If the consumer group's lag for topic one is X messages and its lag for topic two is Y messages, then the total lag is simply X + Y messages.
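The following Python sketch illustrates the summation using the common Kafka definition of lag, namely a partition's log end offset minus the group's committed offset. That definition is an assumption here and may not match exactly how this system derives its number; the function names, topics, and offsets are hypothetical.

```python
def partition_lag(log_end_offset, committed_offset):
    """Messages written to the partition but not yet committed by the group."""
    return max(log_end_offset - committed_offset, 0)

def group_lag_by_topic(offsets):
    """offsets maps (topic, partition) -> (log_end_offset, committed_offset)."""
    lags = {}
    for (topic, _partition), (end, committed) in offsets.items():
        lags[topic] = lags.get(topic, 0) + partition_lag(end, committed)
    return lags

# Hypothetical offsets for one consumer group reading two topics.
offsets = {
    ("orders", 0): (1000, 990),
    ("orders", 1): (500, 470),
    ("payments", 0): (200, 195),
}

lag_by_topic = group_lag_by_topic(offsets)
total_lag = sum(lag_by_topic.values())

print(lag_by_topic)  # {'orders': 40, 'payments': 5}
print(total_lag)     # 45
```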

You can also view the lag for a specific topic, or show the lags separately per topic, by making the corresponding selection in the Topic-dropdown on the left-hand side.

To see how the lag(messages) number for a consumer group is calculated, see the post here.

If you want to see the lag for individual partitions, select a particular topic from the Topic-dropdown. A new Partition-dropdown then appears next to the Topic-dropdown. From this Partition-dropdown you can select the individual partition whose lag you want to see for the consumer group.

Alternatively, you can select All(Sum), which shows the sum of all partition lags for the selected topic, or All(Separately), which plots the lag for each individual partition in the topic. The latter case is depicted in the screenshot below. The legend at the bottom shows the color used for each partition. You can toggle the visibility of each partition by clicking on its colored box in the legend.
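The difference between the two selections can be sketched in Python as follows. The `lag_series` function and the sample data are hypothetical and only illustrate the idea: All(Sum) collapses the partitions into a single series, while All(Separately) keeps one series per partition.

```python
def lag_series(partition_series, selection):
    """Return the series to plot for a Partition-dropdown selection.

    partition_series maps a partition id to its lag samples over time
    (all lists have the same length). selection is "All(Sum)",
    "All(Separately)", or a single partition id.
    """
    if selection == "All(Sum)":
        # Add the partitions together at each point in time.
        return {"sum": [sum(vals) for vals in zip(*partition_series.values())]}
    if selection == "All(Separately)":
        return {p: list(vals) for p, vals in partition_series.items()}
    return {selection: list(partition_series[selection])}

# Hypothetical lag samples for a topic with three partitions.
series = {0: [5, 7, 4], 1: [1, 0, 2], 2: [3, 3, 3]}

summed = lag_series(series, "All(Sum)")
separate = lag_series(series, "All(Separately)")

print(summed)         # {'sum': [9, 10, 9]}
print(len(separate))  # 3
```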

The lag (seconds) graph shows the size of the lag for this consumer group, measured as the number of seconds. For example, assume the consumer group consumes from two topics. If the consumer group's lag for the first topic is X seconds and its lag for the second topic is Y seconds, then the total lag is simply X + Y seconds.

You can also view the lag in seconds for a specific topic, or show the lags separately per topic, by making the corresponding selection in the Topic-dropdown on the left-hand side.

To see how the lag(seconds) number for a consumer group is calculated, see the post here.

If you want to see the lag in seconds for individual partitions, select a specific topic from the Topic-dropdown on the left. A Partition-dropdown then appears next to the Topic-dropdown. From this Partition-dropdown you can select the individual partition whose lag in seconds you want to see for the consumer group.

Alternatively, you can select All(Sum), which shows the sum of all partition lags for the selected topic, or All(Separately), which plots the lag for each individual partition in the chosen topic. The latter case is displayed in the screenshot below. The legend at the bottom shows the color used for each partition. You can toggle the visibility of each partition by clicking on the colored box next to it in the legend.