PromQL API

Created: 2024-11-05 Last Modified: 2024-11-05

This document was translated by ChatGPT

#1. Introduction

Starting from v6.2.1, DeepFlow supports PromQL. The following Prometheus APIs are currently implemented and can be called directly over HTTP, following the Prometheus API definition:

HTTP Method	Path	Prometheus API	Description
GET/POST	/prom/api/v1/query	/api/v1/query	Query data at a single point in time
GET/POST	/prom/api/v1/query_range	/api/v1/query_range	Query data over a time range
GET	/prom/api/v1/label/:labelName/values	/api/v1/label/<label_name>/values	Get all values of a label
GET/POST	/prom/api/v1/series	/api/v1/series	Get all time series
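
The mapping in the table amounts to prefixing each Prometheus path with `/prom`. A minimal Python sketch of that rule follows; the function names and the example host and port are illustrative assumptions, not part of DeepFlow.

```python
from urllib.parse import quote


def deepflow_prom_url(host: str, port: int, prometheus_path: str) -> str:
    """Build a DeepFlow URL from a Prometheus API path by adding the /prom prefix."""
    if not prometheus_path.startswith("/api/v1/"):
        raise ValueError(f"not a Prometheus v1 API path: {prometheus_path!r}")
    return f"http://{host}:{port}/prom{prometheus_path}"


def label_values_path(label_name: str) -> str:
    """Prometheus path that lists the values of one label."""
    return f"/api/v1/label/{quote(label_name, safe='')}/values"


# Hypothetical node IP and NodePort, for illustration only.
print(deepflow_prom_url("10.0.0.1", 30417, "/api/v1/query"))
print(deepflow_prom_url("10.0.0.1", 30417, label_values_path("auto_service")))
```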

#1.1 Calling Method

You can call the API in DeepFlow as follows:

Get the server endpoint port number:

port=$(kubectl get --namespace deepflow -o jsonpath="{.spec.ports[0].nodePort}" services deepflow-server)

Example of API call:

Instant Query:

time=$(date +%s)

curl -XPOST "http://${deepflow_server_node_ip}:${port}/prom/api/v1/query" \
--data-urlencode "query=sum(flow_log__l7_flow_log__server_error) by(request_resource, response_code)" \
--data-urlencode "time=${time}"
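
Since the API follows the Prometheus definition, the response to an instant query should use the standard Prometheus JSON envelope. The sketch below parses such a response in Python under that assumption; the sample body is made up for illustration and the function name is hypothetical.

```python
import json


def parse_instant_vector(body: str) -> list:
    """Return (labels, unix_timestamp, value) for each sample of an instant-vector response."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise RuntimeError(payload.get("error", "query failed"))
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector, got {data['resultType']}")
    # Prometheus encodes each sample as [timestamp, "value-as-string"].
    return [
        (series["metric"], float(series["value"][0]), float(series["value"][1]))
        for series in data["result"]
    ]


# Made-up response body, for illustration only.
sample = json.dumps({
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"request_resource": "/api/orders", "response_code": "500"},
             "value": [1730800000, "12"]},
        ],
    },
})
for labels, ts, value in parse_instant_vector(sample):
    print(labels, ts, value)
```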

Range Query:

end=$(date +%s)
start=$((end-600))

curl -XPOST "http://${deepflow_server_node_ip}:${port}/prom/api/v1/query_range" \
--data-urlencode "query=sum(flow_log__l7_flow_log__server_error) by(request_resource, response_code)" \
--data-urlencode "start=${start}" \
--data-urlencode "end=${end}" \
--data-urlencode "step=60s"
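
The same range query can be issued from a script. The following Python sketch builds the form-encoded POST request with the standard library only; the function name, node IP, and port are placeholders chosen for illustration, and the request is constructed without being sent.

```python
import time
import urllib.parse
import urllib.request


def build_range_query(base_url: str, query: str, start: int, end: int,
                      step: str) -> urllib.request.Request:
    """Build (without sending) a POST request for /prom/api/v1/query_range."""
    form = urllib.parse.urlencode(
        {"query": query, "start": start, "end": end, "step": step}
    ).encode("ascii")
    return urllib.request.Request(
        f"{base_url}/prom/api/v1/query_range", data=form, method="POST"
    )


end = int(time.time())
request = build_range_query(
    "http://10.0.0.1:30417",  # hypothetical node IP and NodePort
    "sum(flow_log__l7_flow_log__server_error) by(request_resource, response_code)",
    start=end - 600,  # the last 10 minutes, as in the curl example
    end=end,
    step="60s",
)
print(request.get_method(), request.full_url)
print(request.data.decode("ascii"))
# To send it: urllib.request.urlopen(request, timeout=10)
```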

#1.2 DeepFlow Metric Definitions

When exposed through PromQL, DeepFlow metric names are constructed in the format ${database}__${table}__${metric}__${data_precision}, where the ${data_precision} suffix applies only to the flow_metrics database. The database, table, and metric to query can be looked up in the AutoMetrics metric type definitions. The specific rules are as follows:

Database	Metric name format
`flow_log`	`{db}__{table}__{metric}`
`flow_metrics` (data_precision values are `1m`/`1s`)	`{db}__{table}__{metric}__{data_precision}`
`prometheus` (data written via Prometheus RemoteWrite)	`prometheus__samples__{metric}`

For example:

flow_metrics__application__request__1m: Represents the number of application layer requests aggregated per minute
flow_metrics__network__tcp_timeout__1s: Represents the number of network layer TCP timeouts aggregated per second
flow_log__l7_flow_log__error: Represents the number of application layer errors
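
The naming rules above can be expressed as a small helper. This Python sketch covers only the three databases listed in the table; the function name is hypothetical and the Prometheus metric in the usage lines is an arbitrary example.

```python
from typing import Optional


def deepflow_metric_name(db: str, table: str, metric: str,
                         data_precision: Optional[str] = None) -> str:
    """Compose the PromQL-facing name of a DeepFlow metric."""
    if db == "flow_metrics":
        # Only flow_metrics carries a data_precision suffix.
        if data_precision not in ("1m", "1s"):
            raise ValueError("flow_metrics requires data_precision '1m' or '1s'")
        return f"{db}__{table}__{metric}__{data_precision}"
    if data_precision is not None:
        raise ValueError(f"{db} metrics take no data_precision suffix")
    if db == "prometheus":
        # Data written via RemoteWrite is addressed through the samples table.
        if table != "samples":
            raise ValueError("prometheus metrics use the 'samples' table")
        return f"{db}__{table}__{metric}"
    if db == "flow_log":
        return f"{db}__{table}__{metric}"
    raise ValueError(f"database {db!r} is not covered by the rules above")


print(deepflow_metric_name("flow_metrics", "application", "request", "1m"))
print(deepflow_metric_name("flow_log", "l7_flow_log", "error"))
print(deepflow_metric_name("prometheus", "samples", "node_cpu_seconds_total"))
```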

#1.3 Known Limitations

When building queries in Grafana panels, the following limitations are currently known:

Labels cannot be queried directly; you need to select metrics first before choosing labels
When reselecting metrics, all labels need to be removed first
Queries will fail if the metric name contains characters such as `.` or `-` that Prometheus does not support in metric names
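
To spot metric names affected by the last limitation before they reach Grafana, a name can be checked against the classic Prometheus metric name pattern `[a-zA-Z_:][a-zA-Z0-9_:]*`. The Python sketch below does this; the helper names and the sample metric name are illustrative assumptions.

```python
import re

# Classic Prometheus metric-name pattern from the Prometheus data model.
METRIC_NAME_RE = re.compile(r"[a-zA-Z_:][a-zA-Z0-9_:]*")
ALLOWED_CHAR_RE = re.compile(r"[a-zA-Z0-9_:]")


def is_queryable_metric_name(name: str) -> bool:
    """True if the whole name matches the classic Prometheus pattern."""
    return METRIC_NAME_RE.fullmatch(name) is not None


def offending_characters(name: str) -> list:
    """Characters outside [a-zA-Z0-9_:], in order of first appearance."""
    seen = []
    for ch in name:
        if not ALLOWED_CHAR_RE.fullmatch(ch) and ch not in seen:
            seen.append(ch)
    return seen


# Hypothetical metric name containing unsupported characters.
name = "prometheus__samples__http.server-duration"
print(is_queryable_metric_name(name), offending_characters(name))
```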

When querying directly based on PromQL or writing alert rules, the following limitations are currently known:

Metric names cannot be matched with the regex matchers =~ and !~
For metrics provided by DeepFlow, you must first choose how values are aggregated by wrapping the metric in an aggregation operator before running a specific metric query. The aggregation operators stdvar, topk, bottomk, and quantile are not yet supported and will be supported in future iterations.
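
A query or alert rule can be screened for the unsupported aggregation operators before it is submitted. The Python sketch below is a text-level heuristic, not a PromQL parser, so it may misfire on operator names that appear inside quoted label values; the function name is hypothetical.

```python
import re

UNSUPPORTED_AGGREGATORS = ("stdvar", "topk", "bottomk", "quantile")

# Match the operator as a whole word followed by "(" or by a by/without clause,
# so functions such as histogram_quantile or quantile_over_time are not flagged.
_PATTERN = re.compile(
    r"\b(" + "|".join(UNSUPPORTED_AGGREGATORS) + r")\b\s*(?=\(|by\b|without\b)"
)


def unsupported_aggregators(expr: str) -> list:
    """Return the sorted, de-duplicated unsupported operators found in expr."""
    return sorted({match.group(1) for match in _PATTERN.finditer(expr)})


print(unsupported_aggregators(
    "topk(5, sum(flow_metrics__application__request__1m) by(auto_service))"
))
print(unsupported_aggregators(
    "sum(flow_log__l7_flow_log__server_error) by(response_code)"
))
```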

#2. Querying DeepFlow Metrics Based on PromQL

Based on the above definitions, we can query DeepFlow metrics using PromQL. Remember to choose the aggregation operator first, for example:

Query the time series trend of all requests with HTTP 500 response codes:

sum(flow_log__l7_flow_log__server_error{response_code="500"}) by(request_resource, response_code)

If we want to see the time series trend for each specific service, add service grouping:

sum(flow_log__l7_flow_log__server_error{response_code="500"}) by(auto_service_1, request_resource, response_code)

Query the trend of TCP connection delay changes over the past 5 minutes with a 10s evaluation interval, grouped by service:

rate(sum(flow_metrics__network_map__rtt__1s)by(auto_service_1)[5m:10s])

Query the trend of average application delay over the past 10 minutes with a 1m evaluation interval, grouped by service:

avg_over_time(avg(flow_metrics__application__rrt__1m)by(auto_service)[10m:1m])
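
All of the queries above share one shape: an aggregation operator around a metric selector, optionally wrapped in a subquery. A Python sketch that renders this shape follows; the helper names are hypothetical, and label values are inserted as-is, so values containing quotes or backslashes would need escaping first.

```python
from typing import Mapping, Optional, Sequence


def aggregated_query(op: str, metric: str, by: Sequence[str],
                     matchers: Optional[Mapping[str, str]] = None) -> str:
    """Render op(metric{matchers}) by(labels)."""
    selector = metric
    if matchers:
        pairs = ", ".join(f'{label}="{value}"' for label, value in matchers.items())
        selector = f"{metric}{{{pairs}}}"
    return f"{op}({selector}) by({', '.join(by)})"


def over_window(func: str, inner: str, window: str, resolution: str) -> str:
    """Wrap an expression in a subquery: func(inner[window:resolution])."""
    return f"{func}({inner}[{window}:{resolution}])"


print(aggregated_query(
    "sum", "flow_log__l7_flow_log__server_error",
    by=["request_resource", "response_code"],
    matchers={"response_code": "500"},
))
print(over_window(
    "avg_over_time",
    aggregated_query("avg", "flow_metrics__application__rrt__1m", by=["auto_service"]),
    window="10m", resolution="1m",
))
```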

#3. Implementing Prometheus Alerts Based on DeepFlow Metrics

Once Prometheus RemoteRead is configured, you can build Prometheus alert rules on these metrics in the same way as the queries above, for example:

Alert for requests with latency > 1s and lasting more than 1m:

groups:
  - name: requestMonitoring
    rules:
      - alert: requestDelayAlert
        expr: avg(flow_metrics__application__rrt__1m)by(l7_protocol, auto_service, auto_instance) / 10^6 > 1
        for: 1m
        annotations:
          summary: 'High Request Latency'
          description: '{{ $labels.auto_instance }} request to {{ $labels.auto_service }} has a high request latency above 1s (current value: {{ $value }}s)'
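
The rule combines a unit conversion (`/ 10^6`) with a `for: 1m` hold period. The Python sketch below is a simplified model of that behavior, assuming, as the expression implies, that rrt is reported in microseconds; the function name and sample values are made up, and real Prometheus evaluation has more states than this model covers.

```python
from typing import Iterable, Optional, Tuple


def first_firing_time(evaluations: Iterable[Tuple[int, float]],
                      threshold_s: float = 1.0,
                      for_s: int = 60) -> Optional[int]:
    """Return the first evaluation time at which the alert would fire.

    evaluations: (unix_timestamp, rrt_in_microseconds) pairs in time order,
    one per rule evaluation.
    """
    pending_since = None
    for ts, rrt_us in evaluations:
        if rrt_us / 10**6 > threshold_s:      # same conversion as `/ 10^6 > 1`
            if pending_since is None:
                pending_since = ts            # condition just became true
            if ts - pending_since >= for_s:   # held for the whole `for` period
                return ts
        else:
            pending_since = None              # condition broke, start over
    return None


# Latency stays above 1s for a full minute.
print(first_firing_time([(0, 1_500_000), (30, 1_200_000), (60, 2_000_000)]))
# Latency dips below 1s at t=30, which resets the hold period.
print(first_firing_time([(0, 1_500_000), (30, 500_000), (60, 2_000_000), (90, 2_000_000)]))
```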

We will also support direct configuration of alert rules in future iterations of DeepFlow, so stay tuned.