Ivan Velichko

10 May, 16 tweets, 3 min read

Prometheus Data & Query Model (thread)

Basics first, bear with me.

Everything starts from a METRIC.

A metric is a measurement of a system that one wants to track. Metric names are identifiers.

Ex: http_requests_total, gc_duration_seconds, etc.

Every metric is measured at a certain TIME and has a certain VALUE.

Timestamps always have millisecond precision.

Value is always float64, even if it looks like an integer.

Metric value, aka sample, is a pair (value, timestamp).
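
To make the (value, timestamp) pair concrete, here is a minimal sketch in Python. The `Sample` class and `make_sample` helper are hypothetical names for illustration, not anything from Prometheus itself; the only facts carried over are that values are floats and timestamps are whole milliseconds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    value: float       # always a float, even for "integer-looking" counters
    timestamp_ms: int  # milliseconds since the Unix epoch


def make_sample(value, timestamp_seconds):
    """Coerce the value to float and the time to whole milliseconds."""
    return Sample(float(value), int(round(timestamp_seconds * 1000)))


# An integer measurement still ends up stored as a float.
s = make_sample(42, 1620640800.1234)
```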

A metric can be LABEL-ed to allow more fine-grained control over measurements.

Label names are identifiers too (in a common programming sense).

Label values are always strings.

Ex: http_requests_total{method="GET", status="200"}

Last but not least in the data model...

TIME SERIES - a series of (value, timestamp) samples attributed to a certain metric and label set.

Ex:

- series A: requests{method="GET"}
- series B: requests{method="PUT"}
- series C: queries{type="SELECT"}
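
One way to picture this: a series is keyed by its metric name plus its full label set. The sketch below is a toy in-memory store, not how the real TSDB is laid out; `series_key`, `storage`, and `append` are made-up names.

```python
from collections import defaultdict


def series_key(metric, labels):
    """A series is identified by the metric name plus the whole label set."""
    return (metric, frozenset(labels.items()))


# series key -> list of (value, timestamp_ms) samples
storage = defaultdict(list)


def append(metric, labels, value, timestamp_ms):
    storage[series_key(metric, labels)].append((float(value), timestamp_ms))


append("requests", {"method": "GET"}, 10, 1000)  # series A
append("requests", {"method": "GET"}, 12, 2000)  # series A again
append("requests", {"method": "PUT"}, 3, 1000)   # series B
append("queries", {"type": "SELECT"}, 7, 1000)   # series C
```

Four samples land in three series: the two GET samples share one key.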

The Prometheus docs also mention different metric types:

- counter
- gauge
- histogram
- summary

But for the data model, the metric type doesn't really matter - in the end, everything boils down to series of (value, timestamp) tuples.

Query model begins...

Prometheus introduces the concept of a vector, without really explaining why it's called that.

Well, in programming, "vector" and "array" are often synonyms, i.e. a finite sequence of homogeneous elements.

So, what are those vectors in Prometheus?

Since we are dealing with a TSDB, my first thought was that a vector is a bunch of samples corresponding to a certain time range.

But it's not...

Let's take a closer look at PromQL.

The simplest possible PromQL query consists just of a metric name.

But!

Metric != Series

There is usually a bunch of series behind a single metric name. As many as there are unique label sets.

So a query like `http_requests_total` would return as many samples as there are unique time series sharing the "http_requests_total" name.

And it's a vector! Or, more precisely, an INSTANT vector.

Did you notice that we haven't mentioned any time ranges here?

Each element of an instant vector:

- belongs to a different time series
- shares the same timestamp as all other elements.

But how to specify that timestamp?

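A plain instant vector selector doesn't carry a timestamp. (Newer Prometheus versions add an `@` modifier that pins the evaluation time, but a bare selector has none.)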

A timestamp is specified separately! For instance, in the API request.

Or it defaults to `now()`.
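
Here is a rough sketch of what evaluating an instant vector selector at a given timestamp amounts to. Everything in it is a simplification: `instant_vector` is a hypothetical function, the data is made up, and `LOOKBACK_MS` only mimics Prometheus' default 5-minute lookback for picking the most recent sample.

```python
LOOKBACK_MS = 5 * 60 * 1000  # assumption: mirrors the default 5-minute lookback


def instant_vector(series, metric, eval_ts_ms):
    """Return one (labels, value) element per series that matches the metric."""
    result = []
    for (name, labels), samples in series.items():
        if name != metric:
            continue
        # Latest sample at or before the evaluation time, within the lookback.
        candidates = [(ts, v) for ts, v in samples
                      if eval_ts_ms - LOOKBACK_MS <= ts <= eval_ts_ms]
        if candidates:
            _, value = max(candidates)
            result.append((dict(labels), value))
    return result


# Samples here are (timestamp_ms, value) pairs.
series = {
    ("http_requests_total", frozenset({("method", "GET")})): [(1000, 5.0), (16000, 9.0)],
    ("http_requests_total", frozenset({("method", "PUT")})): [(2000, 1.0), (17000, 2.0)],
    ("gc_duration_seconds", frozenset()): [(1000, 0.25)],
}
vec = instant_vector(series, "http_requests_total", 16500)
```

The result has two elements, one per `http_requests_total` series, and both are attributed to the same evaluation timestamp even though the underlying samples were scraped at different moments.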

So, how to plot a bunch of time series on a graph if you can select only an instant vector and not a piece of a time series?

You need to send a range query!

A range query consists of:

- instant vector selector
- start time
- end time
- time step
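
In HTTP API terms, the two query kinds map to the `/api/v1/query` and `/api/v1/query_range` endpoints. The sketch below only builds the URLs and sends nothing; `BASE` (a local server on port 9090) and the two helper names are assumptions for illustration.

```python
from urllib.parse import urlencode

BASE = "http://localhost:9090"  # assumption: a local Prometheus server


def instant_query_url(expr, time=None):
    """Instant query: an expression plus an optional evaluation time."""
    params = {"query": expr}
    if time is not None:
        params["time"] = time
    return f"{BASE}/api/v1/query?{urlencode(params)}"


def range_query_url(expr, start, end, step):
    """Range query: the same expression plus start, end, and step."""
    params = {"query": expr, "start": start, "end": end, "step": step}
    return f"{BASE}/api/v1/query_range?{urlencode(params)}"


url = range_query_url("http_requests_total", 1620640800, 1620644400, "15s")
```

Note that the PromQL expression is identical in both cases; only the surrounding parameters differ.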

That's where the official docs start to suck. They avoid explaining the idea of instant and range queries.

Instant query with an instant vector selector returns a single vector.

Range query with the same vector selector returns a [timestamped] vector of [instant] vectors!
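
Conceptually, a range query is just the instant evaluation repeated at every step between start and end. A toy sketch, with made-up data and hypothetical helper names, ignoring lookback limits and staleness handling:

```python
def latest_at(samples, ts):
    """Value of the most recent sample at or before ts, or None."""
    past = [(t, v) for t, v in samples if t <= ts]
    return max(past)[1] if past else None


def range_query(series, start, end, step):
    """Evaluate an instant vector at start, start+step, ... up to end."""
    result = []
    ts = start
    while ts <= end:
        vector = {name: latest_at(samples, ts) for name, samples in series.items()}
        result.append((ts, vector))
        ts += step
    return result


# Samples here are (timestamp, value) pairs, timestamps in seconds.
series = {
    'requests{method="GET"}': [(0, 1.0), (10, 2.0), (20, 4.0)],
    'requests{method="PUT"}': [(0, 0.0), (15, 1.0)],
}
matrix = range_query(series, start=0, end=20, step=10)
```

Each entry of `matrix` is a timestamp paired with an instant vector, which is exactly the "vector of vectors" shape that a graph needs.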

Official Prometheus docs are so focused on instant and range vectors that they completely forgot to explain instant and range query concepts.

I had to google for quite some time until I finally found these two @PromLabs gems:

- promlabs.com/blog/2020/07/0…
- promlabs.com/blog/2020/06/1…

Last but not least - range vectors.

A range vector is a PromQL concept.

Ex: `http_requests_total{method="GET"}[5m]`

A range vector is like an instant vector where every value is replaced with a series of values from the specified time window.

I.e. it's a matrix!
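
A toy sketch of that matrix: for every matching series, keep all samples that fall into the window ending at the evaluation time. `range_vector` and the data are hypothetical, and the exact handling of the window boundaries is an assumption here.

```python
def range_vector(series, eval_ts, window):
    """For each series, collect the samples within the window ending at eval_ts."""
    return {
        name: [(t, v) for t, v in samples if eval_ts - window <= t <= eval_ts]
        for name, samples in series.items()
    }


# Samples here are (timestamp, value) pairs, timestamps in seconds.
series = {
    'http_requests_total{method="GET", status="200"}':
        [(0, 1.0), (60, 3.0), (120, 6.0), (400, 9.0)],
    'http_requests_total{method="GET", status="500"}':
        [(30, 0.0), (330, 1.0)],
}
# Roughly `http_requests_total{method="GET"}[5m]` evaluated at t=400s.
rv = range_vector(series, eval_ts=400, window=300)
```

One row per series, several samples per row, and rows may have different lengths.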

The following are valid constructs:

- instant query with an instant vector selector
- range query with an instant vector selector
- instant query with a range vector selector

A range query with a range vector selector would be a 3D construct... It's simply not a thing in Prometheus.
