OpenCensus is the first open-source, vendor-agnostic solution to offer intelligent (tail-based) sampling. In this post, I would like to demonstrate why intelligent sampling is so powerful and how you can configure it in OpenCensus. Read on to learn more!
What is Intelligent Sampling?
Intelligent Sampling, also known as tail-based sampling, is a technique used in distributed tracing to limit the collection of traces to those that are more relevant. Tracing data is notoriously verbose given that it tracks every request within an application. For high-volume applications this results in a high volume of traces. Some of these traces, such as those that contain errors, are important, while others, such as the many requests that return HTTP status code 200 with very similar latency, are not.
To address the verbosity of tracing data, head-based sampling was introduced. With head-based sampling, the sampling decision is made when the request begins. While this is easy to implement, it has significant limitations. The biggest is that information about downstream spans (latency, errors, and so on) is unknown because those calls have not been made yet. This means head-based sampling is good at reducing verbosity but bad at ensuring relevant traces are kept.
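To make that limitation concrete, here is a minimal Python sketch of a head-based sampler. The function name `head_sample` and the 10% rate are my own illustrative choices, not part of OpenCensus. Notice that the only input is the trace ID, because nothing else is known when the request begins.

```python
import hashlib

def head_sample(trace_id: str, rate: float) -> bool:
    """Decide at the start of a request whether to keep the trace.

    Hypothetical helper: hashes the trace ID into [0, 1) and keeps the
    trace when the value falls below `rate`. Latency and errors are not
    inputs because they have not happened yet.
    """
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    # Use the top 53 bits so the division below is exact in a float.
    bucket = (int.from_bytes(digest[:8], "big") >> 11) / 2**53
    return bucket < rate

# The same trace ID always gets the same answer, so every service that
# sees the ID reaches the same decision without coordination.
decision = head_sample("4bf92f3577b34da6a3ce929d0e0e4736", rate=0.10)
print("keep trace:", decision)
```

A slow or failing request has the same chance of being dropped as any other, which is exactly the weakness described above.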
An alternative approach is tail-based sampling. With tail-based sampling, the sampling decision is made only after the entire trace has been collected. As a result, it is possible to capture relevant traces while minimally or never sampling less relevant ones. While tail-based sampling is desirable, it introduces quite a bit of complexity. For example, all spans for a given trace need to be processed by the same system, and the trace cannot be ingested by a backend until the entire trace is collected and a sampling decision is made (it is very hard to know when a trace is complete).
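As a rough illustration of the idea, the Python sketch below groups spans by trace ID and only then decides which traces to keep. The `Span` type, the `keep_trace` rule, and the 500 ms threshold are assumptions made for this example. It also sidesteps the hard part by assuming every span has already arrived.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Span:
    trace_id: str
    name: str
    duration_ms: float
    error: bool = False

def keep_trace(spans: List[Span], slow_ms: float = 500.0) -> bool:
    """Illustrative rule: keep a trace if any span failed or was slow."""
    return any(s.error or s.duration_ms >= slow_ms for s in spans)

def tail_sample(spans: List[Span]) -> Dict[str, List[Span]]:
    """Group spans by trace ID, then decide once per complete trace."""
    traces: Dict[str, List[Span]] = defaultdict(list)
    for span in spans:
        traces[span.trace_id].append(span)
    return {tid: group for tid, group in traces.items() if keep_trace(group)}

spans = [
    Span("a", "GET /cart", 35.0),
    Span("a", "db.query", 12.0),
    Span("b", "GET /checkout", 80.0),
    Span("b", "payment.charge", 40.0, error=True),
    Span("c", "GET /search", 900.0),
]
kept = tail_sample(spans)
print(sorted(kept))  # ['b', 'c']
```

Trace `a` is fast and error free, so it is dropped. Trace `b` is kept in full because one of its downstream spans failed, something a head-based sampler could not have known.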
How do you configure Intelligent Sampling in OpenCensus?
OpenCensus offers intelligent sampling in the OpenCensus Collector (since the sampling decision needs to be made with the entire trace, it cannot be done in the OpenCensus Agent). Today, a single sampling policy can be applied to each exporter. The following sampling policies are supported:
- rate limiting: the maximum number of spans per second to export
- string tag filter: traces with the specified key/string-value tags are exported
- numeric tag filter: traces with the specified key/numeric-value tags are exported
- always sample: send all traces as complete traces
Configuration is done via the sampling configuration section. For example:
    sampling:
      mode: tail
      # amount of time from seeing the first span in a trace until making the sampling decision
      decision-wait: 10s
      # maximum number of traces kept in the memory
      num-traces: 10000
      policies:
        # user-defined policy name
        my-string-tag-filter:
          # exporters the policy applies to
          exporters:
            - jaeger
          policy: string-tag-filter
          configuration:
            tag: tag1
            values:
              - value1
              - value2
        my-numeric-tag-filter:
          exporters:
            - zipkin
          policy: numeric-tag-filter
          configuration:
            tag: tag1
            min-value: 0
            max-value: 100
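To show what the two filter policies in this configuration mean, here is a small Python model of them. This is a sketch and not the Collector's implementation: the function names are hypothetical, a trace is reduced to a list of per-span tag maps, and I am assuming the numeric bounds are inclusive and that one matching span is enough to export the whole trace.

```python
from typing import Callable, Dict, List, Union

TagValue = Union[str, int, float]
Trace = List[Dict[str, TagValue]]  # one tag map per span

def string_tag_filter(tag: str, values: List[str]) -> Callable[[Trace], bool]:
    """Match traces where some span has `tag` set to one of `values`."""
    allowed = set(values)
    def policy(trace: Trace) -> bool:
        return any(
            isinstance(span.get(tag), str) and span[tag] in allowed
            for span in trace
        )
    return policy

def numeric_tag_filter(tag: str, min_value: float, max_value: float) -> Callable[[Trace], bool]:
    """Match traces where some span has a numeric `tag` within the bounds."""
    def policy(trace: Trace) -> bool:
        return any(
            isinstance(span.get(tag), (int, float))
            and min_value <= span[tag] <= max_value  # inclusive: an assumption
            for span in trace
        )
    return policy

# One policy per exporter, mirroring the example configuration.
policies = {
    "jaeger": string_tag_filter("tag1", ["value1", "value2"]),
    "zipkin": numeric_tag_filter("tag1", 0, 100),
}

trace = [{"tag1": "value2"}, {"tag1": 250}]
for exporter, policy in policies.items():
    print(exporter, policy(trace))
```

With this trace, the string filter matches for `jaeger`, while the numeric filter does not match for `zipkin` because 250 falls outside the 0 to 100 range.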
In addition to the policies, it is important to note the decision-wait configuration parameter. This parameter specifies how long to wait, after the first span of a trace is seen, before applying the sampling policy. If you know you have traces that take longer than ten seconds to complete, then you should increase this value.
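The effect of decision-wait (and of num-traces) is easier to see in a toy model. The `TraceBuffer` class below is a hypothetical Python sketch, not Collector code: it records when each trace was first seen, releases a trace for a sampling decision once the wait has elapsed, and evicts the oldest trace when the buffer is full. Time is passed in explicitly so the example is deterministic.

```python
from collections import OrderedDict
from typing import List, Tuple

class TraceBuffer:
    def __init__(self, decision_wait: float = 10.0, num_traces: int = 10000):
        self.decision_wait = decision_wait
        self.num_traces = num_traces
        # trace_id -> (first_seen, spans); insertion order is arrival order
        self._traces: "OrderedDict[str, Tuple[float, List[str]]]" = OrderedDict()

    def add_span(self, trace_id: str, span: str, now: float) -> None:
        if trace_id not in self._traces:
            if len(self._traces) >= self.num_traces:
                self._traces.popitem(last=False)  # evict the oldest trace
            self._traces[trace_id] = (now, [])
        self._traces[trace_id][1].append(span)

    def ready(self, now: float) -> List[Tuple[str, List[str]]]:
        """Pop every trace whose decision wait has elapsed."""
        due = []
        while self._traces:
            trace_id, (first_seen, spans) = next(iter(self._traces.items()))
            if now - first_seen < self.decision_wait:
                break  # later entries were first seen even more recently
            del self._traces[trace_id]
            due.append((trace_id, spans))
        return due

buf = TraceBuffer(decision_wait=10.0)
buf.add_span("a", "root", now=0.0)
buf.add_span("a", "child", now=4.0)
buf.add_span("b", "root", now=6.0)
print(buf.ready(now=10.0))  # [('a', ['root', 'child'])]
buf.add_span("a", "late-span", now=12.0)  # arrives after the decision for "a"
print(buf.ready(now=16.0))  # [('b', ['root'])]
```

The span that arrives at 12 seconds misses the decision for trace `a`, which is why a decision-wait shorter than your longest traces is a problem. In this sketch the late span simply starts a new entry, which is a simplification on my part.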
Given that intelligent sampling requires all spans for a given trace to arrive at the same Collector, you must either use a single Collector or leverage an external load balancing technique that does traceID-based routing.
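One way an external load balancer can satisfy the same-Collector requirement is to choose the destination from a hash of the trace ID. The Python sketch below illustrates that idea only. The `pick_collector` function and the collector names are hypothetical, and this is not something the OpenCensus Collector provides.

```python
import hashlib
from typing import List

def pick_collector(trace_id: str, collectors: List[str]) -> str:
    """Map a trace ID to one collector so all its spans land together."""
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(collectors)
    return collectors[index]

# Hypothetical collector instances behind the load balancer.
collectors = ["collector-0", "collector-1", "collector-2"]

# Every span carries its trace ID, so each one is routed independently
# and still reaches the same destination.
trace_id = "4bf92f3577b34da6a3ce929d0e0e4736"
targets = {pick_collector(trace_id, collectors) for _ in range(5)}
print(len(targets))  # 1
```

Simple modulo hashing reshuffles most traces whenever the set of collectors changes, so a consistent hashing scheme may be a better fit if collectors are added or removed often.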
With the OpenCensus Collector you can configure Intelligent Sampling of your distributed tracing data. It supports a variety of different policies today and has an extensible backend making it possible to easily add more policies as desired. Today, it supports a single policy per exporter and does not address traceID-based routing, but this is something planned for the future.
© 2019, Steve Flanders. All rights reserved.