Smaller is Better: Why You Should Avoid Large, Multi-Target Exporters in Prometheus

August 17, 2024 by Julius Volz

When monitoring many different services or devices with Prometheus, you may be tempted to build a single large exporter that aggregates metrics from all services on one machine, or even from an entire cluster or other large group of monitoring targets. That way, Prometheus only needs to scrape a single endpoint instead of many individual ones. However, this approach comes with a number of downsides and throws away important benefits of Prometheus' pull-based monitoring model and service discovery integration. Let's have a look at these problems and why single-target exporters generally give you a better monitoring architecture.

Using a single exporter for multiple targets

The advice in this article applies to any exporter that aggregates metrics from different services into a single Prometheus scrape. But as an example scenario, imagine that you have a mix of services running on a single machine, and you want to monitor them with Prometheus. One way of doing this would be to use a single exporter that collects metrics from all services and exposes them on a single Prometheus HTTP endpoint:

[Figure: Single exporter for multiple targets]

This way, Prometheus only needs to scrape a single target, and you don't need to configure Prometheus to know about all the individual service processes running on the machine. That may sound convenient at first, but having a single big exporter that aggregates metrics for many different services has several downsides. Let's have a look at them.

Downsides of multi-target exporters

The downsides of a single large exporter are:

It becomes an operational and scaling bottleneck and a single point of failure

When a single exporter is responsible for aggregating and exposing metrics for all services on a machine (or cluster, or organization), every one of those services depends on it. If the exporter process crashes or becomes overloaded by a service creating too many metrics, you lose metrics for all of your services at once. Teams or services that want to update the exporter can also get into conflicts with each other, and configuration changes to the exporter may need to be coordinated across many services. That in turn makes it harder for teams to iterate independently on their monitoring setups.

The Prometheus server can also scrape many small metrics endpoints much more efficiently than a single large one, as it can scrape them in parallel. A larger single scrape is also more likely to hit scrape timeouts or other limitations that can cause the entire scrape to fail.
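
To make the timeout risk concrete, here is a minimal Python sketch. It assumes a hypothetical aggregating exporter that collects from its backends one after another, so that its response time is roughly the sum of the individual collection times. The service names and durations are made up for illustration; the 10-second timeout matches Prometheus' default scrape timeout.

```python
SCRAPE_TIMEOUT_SECONDS = 10.0  # Prometheus' default scrape timeout

# Hypothetical per-service collection times in seconds (illustration only).
collection_times = {
    "mysql": 1.5,
    "nginx": 0.25,
    "redis": 0.25,
    "slow_batch_job": 9.0,
}


def combined_scrape_ok(times, timeout):
    """One big scrape: succeeds only if the total time fits into the timeout."""
    return sum(times.values()) <= timeout


def individual_scrapes_ok(times, timeout):
    """Per-target scrapes: each one succeeds or fails on its own."""
    return {name: duration <= timeout for name, duration in times.items()}


print(combined_scrape_ok(collection_times, SCRAPE_TIMEOUT_SECONDS))
print(individual_scrapes_ok(collection_times, SCRAPE_TIMEOUT_SECONDS))
```

Under these assumptions, the one slow service pushes the combined scrape over the timeout and takes everyone's metrics down with it, while each individual scrape (including the slow one) still fits into its own timeout.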

Scraping metrics selectively is harder

Having a single large exporter makes it harder to scrape metrics selectively. For example, you may only want to scrape metrics for a subset of services that your team is responsible for (like the MySQL server metrics in the image above), while another team might use a different Prometheus server to pull a different set of metrics for their services. With a single exporter, you would always have to scrape all metrics from the exporter's endpoint and then rely on cumbersome metric relabeling rules (metric_relabel_configs) to filter out the metrics you don't care about. This is inefficient, because those rules are only applied after the entire payload has already been transferred and parsed.
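
The cost of filtering after the fact can be illustrated with a small Python sketch. This is not how Prometheus implements relabeling (it matches regular expressions against labels rather than name prefixes); the payload and the keep_matching helper are hypothetical and only show that every sample has to be received and parsed before it can be dropped.

```python
# Hypothetical combined payload from an exporter that serves several services.
COMBINED_PAYLOAD = """\
# HELP mysql_up Whether the MySQL server is up.
mysql_up 1
mysql_global_status_threads_connected 12
nginx_connections_active 7
redis_connected_clients 3
"""


def keep_matching(payload, prefix):
    """Parse every sample line, then keep only those starting with prefix."""
    parsed = 0
    kept = []
    for line in payload.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment/metadata lines
        parsed += 1
        if line.startswith(prefix):
            kept.append(line)
    return parsed, kept


parsed, kept = keep_matching(COMBINED_PAYLOAD, "mysql_")
print(f"parsed {parsed} samples to keep {len(kept)}")
```

With individual exporters, the team that only cares about MySQL would never transfer or parse the other services' samples in the first place.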

Throwing away Prometheus' target health monitoring capabilities

When Prometheus scrapes a target, it records an up metric with the labels of the target, setting the value to 1 if the scrape succeeded or 0 if it failed for any reason:

[Figure: Prometheus' target health monitoring]

This up metric can tell you whether the target is reachable and at least healthy enough to expose working Prometheus metrics. That makes it a great way to create basic target health alerts for everything you expect to exist in your infrastructure. However, if you squeeze all of your service metrics through a single large exporter scrape, you will only get a single up metric for the whole exporter, and you will lose all information about the individual service processes, including whether any of them are down or even completely absent. If you care about this information, you will now need to get it from a different source, which won't be as integrated with Prometheus' service discovery and the resulting target labels.
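
As a rough illustration of the idea (not Prometheus' actual implementation), the following Python sketch records an up value per target. The target addresses and the fake_fetch function are hypothetical stand-ins for real exporters and for an HTTP request to their metrics endpoints.

```python
def scrape(target, fetch):
    """Return (up, payload): up is 1 if the fetch succeeded, 0 on any error."""
    try:
        return 1, fetch(target)
    except Exception:
        return 0, None


def fake_fetch(target):
    # Hypothetical stand-in for an HTTP GET of the target's metrics endpoint.
    if target == "nginx:9113":
        raise ConnectionError("connection refused")
    return "some_metric 1\n"


targets = ["mysql:9104", "nginx:9113", "redis:9121"]

# One up value per target, so a single broken service is visible on its own.
up = {target: scrape(target, fake_fetch)[0] for target in targets}
down = [target for target, value in up.items() if value == 0]

print(up)
print(down)
```

If all three services sat behind one aggregating exporter, the targets list would have a single entry, and a failing nginx would not show up in any up value.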

Fundamentally, a monitoring system should know what things (processes, services, devices) should exist and then be able to check whether they are healthy and reporting in. Using a single large exporter makes this task significantly more difficult to achieve.

Throwing away Prometheus' service discovery and target label metadata capabilities

When using a single large exporter, you give up Prometheus' ability to attach rich metadata labels (originating from service discovery) to all metrics of an individual target. This metadata will now need to be contained in the metrics of the exporter itself and the processes that are pushing metrics to it, replicating functionality that Prometheus already provides in a more integrated way (for example, by providing a labeled up metric as described above).
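
Conceptually, this labeling step can be sketched in Python as follows. The attach_target_labels helper, the sample data, and the label values are hypothetical; the exported_ prefix for conflicting labels mirrors what Prometheus does by default (with honor_labels set to false), but this is a simplified sketch rather than the real implementation.

```python
def attach_target_labels(samples, target_labels):
    """Merge a target's labels into every sample scraped from that target.

    On a name conflict the target label wins and the scraped label is kept
    under an "exported_" prefix.
    """
    labeled = []
    for name, labels, value in samples:
        merged = dict(target_labels)
        for key, label_value in labels.items():
            if key in target_labels:
                merged["exported_" + key] = label_value
            else:
                merged[key] = label_value
        labeled.append((name, merged, value))
    return labeled


# Hypothetical labels that service discovery produced for one target.
target_labels = {"job": "api", "instance": "10.0.0.5:8080", "env": "prod"}

# Hypothetical samples scraped from that target: (name, labels, value).
samples = [
    ("http_requests_total", {"code": "200"}, 42.0),
    ("queue_length", {"instance": "worker-1"}, 3.0),
]

for sample in attach_target_labels(samples, target_labels):
    print(sample)
```

Because the labels are attached per target, this only tells you where a metric came from if each service is its own target.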

The Prometheus way: Fine-grained exporters for each target

Instead of using a single large exporter that aggregates metrics from many different services, you will usually want to run a separate exporter for each process or device that you are monitoring:

[Figure: Individual exporters for each target]

In this fine-grained monitoring setup, Prometheus can now scrape each monitored target individually, giving you all the benefits of Prometheus' pull-based monitoring model:

  • No bottlenecks, operational independence: Each service team can run and maintain their own exporters, and they can iterate independently of other teams. If one exporter crashes or malfunctions, only the metrics for that service are affected, not all of them.
  • Selective scraping: You can now easily collect just the metrics that your team cares about, by selecting which targets to discover and scrape.
  • Integrated health monitoring: Prometheus can now monitor the health of each target individually, giving you a per-process up metric that tells you whether the service process is reachable and healthy enough to expose metrics. That in turn allows you to set up basic health alerts for all service processes that you expect to exist.
  • Service discovery and target metadata: Prometheus can use service discovery to find all exporters and attach metadata labels to the metrics originating from each of them that tell you where they came from.

Notable exceptions

There are two prominent exporters in the Prometheus ecosystem that can expose metrics for multiple targets using a single exporter process:

  • The Blackbox Exporter, which allows you to synthetically probe targets from the outside.
  • The SNMP Exporter, which allows you to fetch SNMP metrics from network devices.

However, these exporters are special: They do not group metrics from a set of targets into a single large scrape, but instead only produce metrics for a single backend target for each scrape. To achieve this, the Prometheus server sends a target HTTP parameter along with each scrape request to tell the exporter which target to produce metrics for. The target parameter will usually still be based on information from Prometheus' service discovery, allowing you to make full use of fine-grained scraping, per-target health monitoring, and metadata labeling. So while these exporters can be run as a single process for many backend targets, they still appear as many individual targets and scrapes from Prometheus' point of view, are stateless, and can be horizontally scaled at will. If you want to learn more about this model, take a look at our Probing Services - Blackbox Exporter training.
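
To illustrate what such a per-target scrape request looks like, here is a small Python sketch that builds Blackbox Exporter probe URLs. The /probe path and the target and module parameters are the ones the Blackbox Exporter uses; the exporter address and the backend URLs are hypothetical, and the build_probe_url helper exists only for this example.

```python
from urllib.parse import urlencode


def build_probe_url(exporter_address, target, module):
    """Build the URL for probing one backend target via the exporter."""
    query = urlencode({"target": target, "module": module})
    return f"http://{exporter_address}/probe?{query}"


# Hypothetical backends; in practice these come from service discovery.
backends = ["https://example.com", "https://example.org"]

# One URL, and therefore one scrape and one up metric, per backend target.
for backend in backends:
    print(build_probe_url("blackbox-exporter:9115", backend, "http_2xx"))
```

In a real setup you would not build these URLs yourself. Relabeling rules in the scrape configuration copy each discovered target address into the target parameter and point the scrape at the exporter's address instead.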

Conclusion

While it may seem convenient to aggregate metrics from many services into a single exporter, doing so throws away many of the benefits that Prometheus' pull-based monitoring model gives you. Instead, it's usually better to run a separate exporter for each process or device that you are monitoring. This way, you can avoid operational bottlenecks, scaling issues, and single points of failure, and you can take full advantage of Prometheus' target health monitoring, service discovery, and target label metadata capabilities.

If you want to understand more about Prometheus exporters and how to build them, take a look at our Understanding and Building Exporters training.


Tags: prometheus, exporters, multi-target