PromQL API
This document was translated by ChatGPT
# 1. Introduction

Starting from v6.2.1, DeepFlow supports PromQL. The following Prometheus APIs are currently implemented and can be called directly over HTTP as defined by the Prometheus HTTP API:
| HTTP Method | Path | Prometheus API | Description |
| --- | --- | --- | --- |
| GET/POST | /prom/api/v1/query | /api/v1/query | Query data at a single point in time |
| GET/POST | /prom/api/v1/query_range | /api/v1/query_range | Query data over a time range |
| GET | /prom/api/v1/label/:labelName/values | /api/v1/label/<label_name>/values | Get all values of a label |
| GET/POST | /prom/api/v1/series | /api/v1/series | Get all time series |
## 1.1 Calling Method
You can call the APIs in DeepFlow as follows.

Get the NodePort of the deepflow-server service:

```bash
port=$(kubectl get --namespace deepflow -o jsonpath="{.spec.ports[0].nodePort}" services deepflow-server)
```
Examples of API calls:

Instant query:

```bash
time=$(date +%s)
curl -XPOST "http://${deepflow_server_node_ip}:${port}/prom/api/v1/query" \
  --data-urlencode "query=sum(flow_log__l7_flow_log__server_error) by(request_resource, response_code)" \
  --data-urlencode "time=${time}"
```
Range query:

```bash
end=$(date +%s)
start=$((end-600))
curl -XPOST "http://${deepflow_server_node_ip}:${port}/prom/api/v1/query_range" \
  --data-urlencode "query=sum(flow_log__l7_flow_log__server_error) by(request_resource, response_code)" \
  --data-urlencode "start=${start}" \
  --data-urlencode "end=${end}" \
  --data-urlencode "step=60s"
```
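The label-values and series endpoints from the table in section 1 can be called in the same way. A minimal sketch, assuming the `deepflow_server_node_ip` and `port` variables are set as shown above and that `response_code` is a label of interest (the placeholder defaults below are for illustration only):

```bash
# Placeholder address/port for illustration; in practice these come from
# the steps above.
deepflow_server_node_ip=${deepflow_server_node_ip:-127.0.0.1}
port=${port:-30417}
base="http://${deepflow_server_node_ip}:${port}/prom/api/v1"

# All values of one label (GET /prom/api/v1/label/<label_name>/values):
echo "${base}/label/response_code/values"
# curl -XGET "${base}/label/response_code/values"

# All time series matching a selector (GET/POST /prom/api/v1/series):
# curl -G "${base}/series" \
#   --data-urlencode 'match[]=flow_log__l7_flow_log__server_error'
```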
## 1.2 DeepFlow Metric Definitions
When exposed through PromQL, DeepFlow metric names are constructed in the format `${database}__${table}__${metric}__${data_precision}`. You can determine the target data source to query through the definition of AutoMetrics metric types. The specific rules are as follows:
| Database | Metric Name Format |
| --- | --- |
| flow_log | {db}__{table}__{metric} |
| flow_metrics (data_precision is 1m or 1s) | {db}__{table}__{metric}__{data_precision} |
| prometheus (data written via Prometheus RemoteWrite) | prometheus__samples__{metric} |
For example:

- `flow_metrics__application__request__1m`: the number of application-layer requests, aggregated per minute
- `flow_metrics__network__tcp_timeout__1s`: the number of network-layer TCP timeouts, aggregated per second
- `flow_log__l7_flow_log__error`: the number of application-layer errors
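The naming rule above can be decomposed mechanically. A small sketch (pure shell parameter expansion, no DeepFlow dependency) that splits a metric name on the `__` separator:

```bash
# Decompose a DeepFlow metric name of the form
# ${database}__${table}__${metric}__${data_precision} on the "__" separator.
name="flow_metrics__application__request__1m"

db=${name%%__*};    rest=${name#*__}   # db="flow_metrics"
table=${rest%%__*}; rest=${rest#*__}   # table="application"
metric=${rest%%__*}                    # metric="request"
precision=${rest#*__}                  # precision="1m" (absent for flow_log)

echo "$db / $table / $metric / $precision"  # flow_metrics / application / request / 1m
```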
## 1.3 Known Limitations
When operating Grafana panels, the following limitations are currently known:

- Labels cannot be queried directly; you must select a metric first, then choose labels
- When reselecting a metric, all previously selected labels must be removed first
- Queries will fail if the metric name contains characters such as `.` or `-` that are not supported by Prometheus
When querying directly with PromQL or writing alert rules, the following limitations are currently known:

- Metric names cannot be matched with the `=~`/`!~` regex operators
- For metrics provided by DeepFlow, you must first determine the aggregation method through an aggregation operator before querying a specific metric; the functions `stdvar`, `topk`, `bottomk`, and `quantile` are not yet supported and will be added in future iterations
# 2. Querying DeepFlow Metrics Based on PromQL
Based on the above definitions, we can query DeepFlow metrics using PromQL. Note that the aggregation method must be determined first, for example:

- Query the time series trend of all requests with HTTP `500` response codes:

```promql
sum(flow_log__l7_flow_log__server_error{response_code="500"}) by(request_resource, response_code)
```

- To see the time series trend for each specific service, add service grouping:

```promql
sum(flow_log__l7_flow_log__server_error{response_code="500"}) by(auto_service_1, request_resource, response_code)
```

- Query the trend of TCP connection delay over the past 5 minutes with a 10s evaluation interval, grouped by service:

```promql
rate(sum(flow_metrics__network_map__rtt__1s)by(auto_service_1)[5m:10s])
```

- Query the trend of average application delay over the past 10 minutes with a 1m evaluation interval, grouped by service:

```promql
avg_over_time(avg(flow_metrics__application__rrt__1m)by(auto_service)[10m:1m])
```
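Any of these expressions can be sent through the instant-query API from section 1.1. A sketch, assuming the `deepflow_server_node_ip` and `port` variables are set as shown there:

```bash
# Run the first query in this section through /prom/api/v1/query;
# the server address variables are assumed to already be set.
query='sum(flow_log__l7_flow_log__server_error{response_code="500"}) by(request_resource, response_code)'
time=$(date +%s)

# curl -XPOST "http://${deepflow_server_node_ip}:${port}/prom/api/v1/query" \
#   --data-urlencode "query=${query}" \
#   --data-urlencode "time=${time}"
echo "query=${query}"
```

`--data-urlencode` handles the `{`, `"`, and `=` characters in the expression, so the PromQL can be written verbatim.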
# 3. Implementing Prometheus Alerts Based on DeepFlow Metrics
With the above examples, after configuring Prometheus RemoteRead, you can build alert rules on Prometheus based on these metrics, such as:
- Alert for requests with latency > 1s and lasting more than 1m:
```yaml
groups:
  - name: requestMonitoring
    rules:
      - alert: requestDelayAlert
        expr: avg(flow_metrics__application__rrt__1m)by(l7_protocol, auto_service, auto_instance) / 10^6 > 1
        for: 1m
        annotations:
          summary: 'High Request Latency'
          description: '{{ $labels.auto_instance }} request to {{ $labels.auto_service }} has a high request latency above 1s (current value: {{ $value }}s)'
```
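The `/ 10^6` in the expression converts `rrt`, which is reported in microseconds, into seconds, so the `> 1` threshold means a latency above one second. A quick arithmetic check in shell (the `2500000` sample value is hypothetical):

```bash
# rrt is divided by 10^6 in the rule above to convert microseconds to
# seconds; a delay of 2,500,000 us therefore compares as 2.5 s > 1 s.
rrt_us=2500000
rrt_s=$(awk "BEGIN { print ${rrt_us} / 1000000 }")
echo "${rrt_s}"   # 2.5

# Exit status 0 (threshold exceeded) makes the message print:
awk "BEGIN { exit !(${rrt_us} / 1000000 > 1) }" && echo "alert would fire"
```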
Future iterations of DeepFlow will also support configuring alert rules directly, so stay tuned.