Trace Completion API
This document was translated by ChatGPT
#1. Introduction
APM works at the code level, so it cannot examine a problem across the full stack and from multiple dimensions without blind spots. Because instrumentation is an obstacle, APM also often fails to cover every service. DeepFlow relies on eBPF zero-instrumentation to fully capture distributed tracing data and generate call chains. When DeepFlow and APM are deployed independently, they can still collaborate in a loosely coupled manner: APM calls DeepFlow's Trace Completion API to enhance its own call chains. This eliminates APM's blind spots in cloud-native infrastructure and non-instrumented services and significantly reduces triage time.
Before introducing the API, let's use a diagram to explain the data that APM can complete after calling the DeepFlow API.
Full Stack Distributed Tracing
- In the diagram, Spans starting with A represent application Spans (from APM); those starting with S represent system Spans (from DeepFlow); and those starting with N represent network Spans (from DeepFlow).
- The black parts in the diagram are the input parameters for APM calling the DeepFlow API. DeepFlow will use these application Spans as search boundaries to complete the surrounding system/network Spans and reconstruct the Parent-Child relationships.
- The blue parts in the diagram are application Spans injected with TraceID/SpanID in the protocol from APM, and the system/network Spans calculated based on them. These complete the kernel system calls and network transmission paths such as Syscall, Bridge, and IPVS between two services for APM.
- The green parts in the diagram are basic service calls automatically traced by DeepFlow's system Spans, such as non-instrumented DNS calls, MySQL calls, Redis calls, etc., where TraceID/SpanID cannot be injected.
- The red parts in the diagram are upstream and downstream services automatically traced by DeepFlow's system Spans, such as non-instrumented ALB, NLB, Ingress gateway services, and other services in the business logic that APM has not instrumented.
#2. API Description
Get the DeepFlow service endpoint port number:
port=$(kubectl get --namespace deepflow -o jsonpath="{.spec.ports[0].nodePort}" services deepflow-app)
Trace Completion API call method:
curl -XPOST "http://${deepflow_server_node_ip}:${port}/v1/stats/querier/tracing-completion-by-external-app-spans"
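The call above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not an official client: the helper names `build_completion_request` and `call_completion_api` are hypothetical, and sending the body as JSON with a `Content-Type: application/json` header is an assumption, since the curl line does not show headers or a body. The body fields and their defaults are the ones listed in section 2.1.

```python
import json
import urllib.request

# Path taken from the curl example in this section.
API_PATH = "/v1/stats/querier/tracing-completion-by-external-app-spans"


def build_completion_request(server_ip, port, app_spans,
                             max_iteration=30, network_delay_us=3000000):
    """Build (but do not send) the POST request for the Trace Completion API.

    Assumption: the body is JSON and is sent with a Content-Type header;
    the original curl example shows neither.
    """
    body = {
        "max_iteration": max_iteration,
        "network_delay_us": network_delay_us,
        "app_spans": app_spans,
    }
    return urllib.request.Request(
        url=f"http://{server_ip}:{port}{API_PATH}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_completion_api(request, timeout_s=10):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(request, timeout=timeout_s) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Keeping request construction separate from sending makes it possible to inspect or log the payload before it reaches the DeepFlow server.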
#2.1 Input Parameters Description
{
  "max_iteration": 30,
  "network_delay_us": 3000000,
  "app_spans": [
    {
      "trace_id": "xxxx",
      "span_id": "xxxx",
      "parent_span_id": "xxxx",
      "span_kind": 0,
      "start_time_us": 1681960139619998,
      "end_time_us": 1681960139620004
    },
    ...
  ]
}
Field | Type | Required | Description |
---|---|---|---|
max_iteration | int | No | Depth of system Span tracing, default is 30, unit: layers |
network_delay_us | int | No | Time span for network Span tracing, default is 3000000, unit: microseconds |
app_spans | array[AppSpans] | Yes | List of application Spans to complete the call chain, can be all application Spans in a complete Trace (not recommended) |
app_spans usually holds only part of the application Spans of a Trace in APM, and DeepFlow completes the call chain around them. It is recommended to carry the following Spans in each call:
- The application Span of greatest interest (hereinafter referred to as X); the service it belongs to is referred to as service a
- The ancestor Spans of X, up to the first ancestor Span that does not belong to service a; in SkyWalking, for example, this is the first ancestor Span of type Exit
- The descendant Spans of X, along each branch up to the first descendant Span that does not belong to service a; in SkyWalking, for example, this is the first descendant Span of type Entry on each branch
Carrying these Spans in the request tells DeepFlow to complete the call chain around Span X and, using the ancestors and descendants of X as boundaries, to reconstruct the parent-child relationships of all Spans in the returned result. The parameters required for each app_span are as follows:
Field | Type | Required | Description |
---|---|---|---|
trace_id | string | Yes | TraceID of the application Span |
span_id | string | Yes | SpanID of the application Span |
parent_span_id | string | Yes | ParentSpanID of the application Span |
span_kind | int | Yes | Span type of the application Span, same meaning as in OpenTelemetry, optional values: 0: unspecified, 1: internal, 2: server, 3: client, 4: producer, 5: consumer |
start_time_us | int | Yes | Start time of the application Span, unit: microseconds |
end_time_us | int | Yes | End time of the application Span, unit: microseconds |
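The recommended selection of Spans around X can be sketched as follows. This is only an illustration: the function name `select_app_spans` and the extra `service` key on each input Span are hypothetical (the API itself has no such field), the boundary Span outside service a is assumed to be included in the request, and the input is assumed to be a well-formed Trace.

```python
API_FIELDS = ("trace_id", "span_id", "parent_span_id",
              "span_kind", "start_time_us", "end_time_us")


def select_app_spans(spans, focus_span_id):
    """Pick X, its ancestors, and its descendants up to the service boundary.

    `spans` is a list of dicts holding the six API fields plus a
    hypothetical "service" key naming the owning service.
    """
    by_id = {s["span_id"]: s for s in spans}
    children = {}
    for s in spans:
        children.setdefault(s["parent_span_id"], []).append(s)

    focus = by_id[focus_span_id]
    service_a = focus["service"]
    selected = [focus]
    seen = {focus["span_id"]}

    # Walk up: keep ancestors until the first one outside service a (inclusive).
    current = focus
    while current["parent_span_id"] in by_id:
        current = by_id[current["parent_span_id"]]
        if current["span_id"] in seen:  # guard against malformed cycles
            break
        seen.add(current["span_id"])
        selected.append(current)
        if current["service"] != service_a:
            break

    # Walk down each branch: stop at the first descendant outside service a (inclusive).
    stack = list(children.get(focus["span_id"], []))
    while stack:
        span = stack.pop()
        if span["span_id"] in seen:
            continue
        seen.add(span["span_id"])
        selected.append(span)
        if span["service"] == service_a:
            stack.extend(children.get(span["span_id"], []))

    # Drop the helper key so that only API fields are sent.
    return [{k: s[k] for k in API_FIELDS} for s in selected]
```

The returned list can be used directly as the app_spans value of the request body.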
#2.2 Output Parameters Description
{
  "OPT_STATUS": "SUCCESS",
  "DESCRIPTION": "",
  "DATA": {
    "tracing": [
      {
        "start_time_us": 1682216627824419,
        "end_time_us": 1681960139620004,
        "name": "querier_client",
        "signal_source": 4,
        "tap_side": "c-app",
        "trace_id": "a03a848c3121b817b0e866fb71607bc2",
        "span_id": "d5b574eb7ac48503",
        "parent_span_id": "69cc875250b4043c",
        "deepflow_span_id": "d5b574eb7ac48503",
        "deepflow_parent_span_id": "69cc875250b4043c",
        "_ids": ["7225065397752915120"],
        "related_ids": [
          "2-app-7225065397752915115"
        ],
        "flow_id": "0",
        "duration": 32219,
        "req_tcp_seq": 0,
        "resp_tcp_seq": 0,
        "l7_protocol": 20,
        "l7_protocol_str": "HTTP",
        "request_type": "POST",
        "request_resource": "xxxx",
        "response_status": 2,
        "request_id": "xxxx",
        "endpoint": "querier_client",
        "process_id": 1234,
        "app_service": "deepflow-statistics",
        "app_instance": "",
        "x_request_id": "",
        "syscall_trace_id_request": "0",
        "syscall_trace_id_response": "0",
        "syscall_cap_seq_0": 0,
        "syscall_cap_seq_1": 0,
        "vtap_id": 1
      },
      ...
    ]
  }
}
The tracing field in the returned result is an array containing the complete set of Spans traced by DeepFlow. Each item is a Span, either an application Span from APM or a system/network Span from DeepFlow. The important attributes of each Span are:
Field | Type | Description |
---|---|---|
start_time_us | int | Start time of the Span, unit: microseconds |
end_time_us | int | End time of the Span, unit: microseconds |
duration | int | Execution time of the Span, unit: microseconds |
name | string | Name of the Span; for system/network Spans it corresponds to DeepFlow's request_resource field description |
signal_source | int | Source of the Span, corresponding to DeepFlow's signal_source field description |
tap_side | string | Span statistics location, corresponding to DeepFlow's tap_side field description |
trace_id | string | TraceID; if a system/network Span has a corresponding application Span, it is the value from that application Span; otherwise, the value is empty |
span_id | string | Original Span ID; if a system/network Span has a corresponding application Span, it is the value from that application Span; otherwise, the value is empty |
parent_span_id | string | Original parent Span ID; if a system/network Span has a corresponding application Span, it is the value from that application Span; otherwise, the value is empty |
deepflow_span_id | string | Span ID recalculated by DeepFlow |
deepflow_parent_span_id | string | Parent Span ID recalculated by DeepFlow |
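As an illustration of reading these fields, the sketch below pulls the tracing array out of a response and separates Spans that carry original APM IDs from those whose span_id is empty. The function names are hypothetical, and treating any OPT_STATUS other than "SUCCESS" as a failure described by DESCRIPTION is an assumption, since only the success case is shown in the sample response.

```python
def extract_tracing(response):
    """Return DATA.tracing, raising if the call did not succeed.

    Assumption: any OPT_STATUS other than "SUCCESS" is a failure and
    DESCRIPTION carries the reason; the sample only shows the success case.
    """
    if response.get("OPT_STATUS") != "SUCCESS":
        reason = response.get("DESCRIPTION", "")
        raise RuntimeError(f"trace completion failed: {reason}")
    return response.get("DATA", {}).get("tracing", [])


def split_by_original_ids(tracing):
    """Separate Spans that carry an original span_id from those that do not."""
    with_app_ids = [s for s in tracing if s.get("span_id")]
    without_app_ids = [s for s in tracing if not s.get("span_id")]
    return with_app_ids, without_app_ids
```

Spans in the second list have no corresponding application Span, so only their deepflow_span_id and deepflow_parent_span_id can place them in the call chain.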
In addition, the API will return some extra fields for each Span:
Field | Type | Description | Remarks |
---|---|---|---|
_ids | array | DeepFlow call logs corresponding to the Span | |
related_ids | array | Other DeepFlow call logs related to the Span | |
flow_id | string | DeepFlow flow logs corresponding to the Span, no data for application/system Spans | |
l7_protocol | int | Application protocol of the Span, corresponding to DeepFlow's l7_protocol field description | |
l7_protocol_str | string | Application protocol of the Span | |
request_type | string | Request type of the Span | |
request_id | string | Request ID of the Span | |
endpoint | string | Request endpoint of the Span | |
request_resource | string | Request resource of the Span | |
response_status | int | Response status of the Span, corresponding to DeepFlow's response_status field description | |
process_id | int | Process ID to which the Span belongs, only system Spans have data | |
app_service | string | Service to which the Span belongs, only application Spans have data | |
app_instance | string | Instance to which the Span belongs, only application Spans have data | |
vtap_id | int | Collector ID corresponding to the Span | |
req_tcp_seq | int | TCP Seq corresponding to the Span request, only system/network Spans have data | Used for tracing calculation |
resp_tcp_seq | int | TCP Seq corresponding to the Span response, only system/network Spans have data | Used for tracing calculation |
x_request_id | string | X-Request-ID of the Span request or response, only system/network Spans have data | Used for tracing calculation |
syscall_trace_id_request | string | Syscall TraceID corresponding to the Span request, only system Spans have data | Used for tracing calculation |
syscall_trace_id_response | string | Syscall TraceID corresponding to the Span response, only system Spans have data | Used for tracing calculation |
syscall_cap_seq_0 | int | Syscall Seq corresponding to the Span request, only system Spans have data | Used for tracing calculation |
syscall_cap_seq_1 | int | Syscall Seq corresponding to the Span response, only system Spans have data | Used for tracing calculation |
Note:
- The new parent-child relationships of Spans in the returned result need to be constructed using the deepflow_span_id and deepflow_parent_span_id fields.
- TraceID/SpanID injected into the protocol after application instrumentation can be automatically parsed and collected by the Agent. By default, it is adapted to the Header format of OpenTelemetry and SkyWalking. If there are custom Headers, please modify the Agent configuration. For details, refer to Agent Advanced Configuration.
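Rebuilding the parent-child relationships from deepflow_span_id and deepflow_parent_span_id can be sketched as follows. The function names are hypothetical, and treating a Span whose deepflow_parent_span_id matches no returned Span as a root is an assumption, since this document does not say how roots are marked.

```python
def build_span_tree(tracing):
    """Group Spans under their DeepFlow parent; return (roots, children)."""
    ids = {s["deepflow_span_id"] for s in tracing}
    children = {}
    roots = []
    for s in tracing:
        parent = s["deepflow_parent_span_id"]
        # Assumption: an unknown or self-referencing parent marks a root.
        if parent in ids and parent != s["deepflow_span_id"]:
            children.setdefault(parent, []).append(s)
        else:
            roots.append(s)
    # Order siblings by start time so the tree reads chronologically.
    for siblings in children.values():
        siblings.sort(key=lambda s: s["start_time_us"])
    roots.sort(key=lambda s: s["start_time_us"])
    return roots, children


def render_tree(nodes, children, depth=0):
    """Return indented text lines, one per Span, in depth-first order."""
    lines = []
    for s in nodes:
        lines.append("  " * depth + f'{s["name"]} ({s["duration"]} us)')
        lines.extend(
            render_tree(children.get(s["deepflow_span_id"], []), children, depth + 1)
        )
    return lines
```

Calling `render_tree(*build_span_tree(tracing))` on the tracing array gives a quick textual view of the completed call chain.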