Unlock All-in-One Observability for APISIX with DeepFlow
2024-02-07With the growing emphasis on the observability of application components, Apache APISIX has introduced a plugin mechanism to enrich observability signals. However, these data are scattered across multiple stacks, creating data silos. This article aims to elucidate how to leverage DeepFlow’s zero-code feature based on eBPF to construct an observability solution for APISIX. On this basis, it integrates the rich data sources of existing APISIX plugins to eliminate data silos and build an all-in-one platform for comprehensive observability of the APISIX gateway. Through DeepFlow, APISIX can achieve comprehensive observability from traffic monitoring and tracing analysis to performance optimization, eliminating data dispersion and providing a centralized view. This accelerates fault diagnosis and performance tuning, making the work of DevOps and SRE teams more efficient. This article will focus on how APISIX’s tracing data, metric data, access logs, and performance profiling data can be integrated with DeepFlow.
Unlock All-in-One Observability for APISIX with DeepFlow
0x0: Install APISIX and DeepFlow
For convenience, this article describes deploying both DeepFlow and APISIX as Kubernetes services, with the entire deployment process taking approximately 5 minutes. For detailed deployment steps, refer to the DeepFlow and APISIX official deployment documentation.
Note: To leverage DeepFlow’s observability capabilities that utilize eBPF technology, your host’s Linux kernel must be version 4.14 or higher.
0x1: Distributed Tracing
There are two approaches to implementing distributed tracing for APISIX and backend services using DeepFlow: Firstly, DeepFlow leverages eBPF to enable out-of-the-box, RPC-level distributed tracing for APISIX and backend services, requiring no code changes. Secondly, if backend services have APM (Application Performance Monitoring) tools like OpenTelemetry or SkyWalking enabled, you can integrate all tracing data into DeepFlow using the APISIX Tracers plugin. This enables comprehensive, end-to-end tracing at the application function level.
Two Methods for Implementing Distributed Tracing of APISIX and Backend Services in DeepFlow
DeepFlow eBPF AutoTracing
DeepFlow offers out-of-the-box distributed tracing (AutoTracing) that requires no APISIX plugins or code changes to be enabled. It only necessitates deploying the deepflow-agent on the server where APISIX is located. In Grafana, find the Distributed Tracing Dashboard provided by DeepFlow, where you can initiate a trace on a specific request and see the end-to-end trace of that request in both APISIX and its backend services, as illustrated below:
- (1): Accesses the APISIX gateway service on the K8s Node NIC via nodeport.
- (2): Enters the NIC of the POD corresponding to the APISIX gateway service.
- (3): Goes into the OpenResty process within the APISIX gateway service.
- (4): The request is forwarded to the backend service by the OpenResty process.
- (5): Forwarded by the NIC of the POD corresponding to the APISIX gateway service.
- (6)/(7): Forwarded to the backend service.
DeepFlow eBPF AutoTracing
DeepFlow eBPF + OpenTelemetry
This approach involves APISIX generating trace data using the OpenTelemetry plugin, while the backend service also has APM capabilities and can convert generated trace data into the OpenTelemetry format. When APISIX and backend services both send trace data to DeepFlow, it can create a comprehensive trace-tree without any blind spots, incorporating APM application SPAN, eBPF system SPAN, and cBPF network SPAN.
This method is ideal for achieving function-level distributed tracing inside the application process or when the backend service uses a thread pool for call handling, which may disrupt DeepFlow AutoTracing.
1. Deploy Backend Services with APM Enabled
To demonstrate the full tracing effect, we first deploy a demo application behind the APISIX gateway that supports OpenTelemetry. The deployment of the Demo application can refer to: “DeepFlow Demo - One-click deployment of a WebShop application composed of five microservices written in Spring Boot“. Create a route on APISIX to access the backend service, with the access domain being apisix.deepflow.demo
.
1 | apiVersion: apisix.apache.org/v2 |
2. Enable the OpenTelemetry Plugin in APISIX
Add OpenTelemetry plugins to the APISIX configuration:
1 | ## vim ./apisix/.values.yaml |
Enable OpenTelemetry functionality for a specific route:
1 | ## View router id |
1 | ## Enable the otel feature for a specific route |
3. Using DeepFlow to Integrate OpenTelemetry Traces
The integration of OpenTelemetry Span data through DeepFlow Agent is enabled by default and requires no additional configuration.
1 | ## View the default configuration of deepflow-agent |
4. OpenTelemetry Integration Showcase
We initiate a command from the client to access the WebShop service:
1 | curl -H "Host: apisix.deepflow.demo" 10.1.23.200:44640/shop/full-test |
Open the Distributed Tracing Dashboard provided by DeepFlow in Grafana, find the corresponding request, and initiate tracing. You’ll be able to see traces from both APISIX and the backend services. Moreover, the application SPANs generated by APM and the network SPANs and system SPANs generated by DeepFlow are all comprehensively associated on one flame graph:
Note: In the flame graph, “A” represents the application SPAN generated by APM, while “N” and “S” represent the network SPAN and system SPAN generated by DeepFlow, respectively.
DeepFlow eBPF + OTel
0x2: Performance Metrics
DeepFlow offers immediate insights into metrics, featuring detailed RED (Rate, Error, Duration) performance metrics at the endpoint level, along with comprehensive TCP network performance metrics, including throughput, retransmissions, zero window, and connection anomalies. Metrics from APISIX, including HTTP status codes, bandwidth, connections, and latency, captured by Metrics-type plugins like Prometheus and node-status, can also be integrated into DeepFlow. This data, detailing both instance and route granularity, is viewable on the APISIX-provided Grafana Dashboard.
Collecting APISIX Performance Metrics Using DeepFlow
1. Out-of-the-Box eBPF Metrics
Once the deepflow-agent is deployed on the server hosting APISIX, it automatically gathers highly detailed application and network level metrics. This includes metrics such as request rates, response latencies, and error statuses for specific clients or endpoints, as well as TCP connection setup times, connection anomalies, and more. Detailed metrics can be found on the DeepFlow official website in the metrics section. By opening the Application - K8s Ingress Dashboard provided by DeepFlow in Grafana, you can view application layer performance metrics related to APISIX. Similarly, network-related metrics can be viewed in the Network - K8s Pod Dashboard.
DeepFlow eBPF Performance Metrics (Application)
DeepFlow eBPF Performance Metrics (Network)
2. Enable the Prometheus Plugin in APISIX
Add the Prometheus plugin to the APISIX configuration:
1 | ## vim ./apisix/.values.yaml |
Enable Prometheus plugin:
1 | ## Note: Since the otel feature has been enabled above, here we need to enable Prometheus on top of the otel functionality. |
3. Collect APISIX Metrics with Prometheus
Collecting APISIX metrics using Prometheus (example given using Prometheus CRD deployment method):
1 | ## Collecting via ServiceMonitor Method (Prometheus deployed in Kubernetes as a CRD) |
At this point, a Prometheus backend service is required to collect the metrics generated by APISIX plugins. Therefore, it is necessary to deploy a prometheus-server first. However, since these metrics do not rely on prometheus-server for storage, it is possible to deploy a prometheus-server in agent mode or use a lighter weight grafana-agent instead. Assuming that prometheus-server has been deployed, enabling RemoteWrite will send metric data to DeepFlow:
1 | ## https://prometheus.io/docs/prometheus/latest/configuration/configuration/#remote_write |
4. Integrating Prometheus Metrics with DeepFlow
Integrating Prometheus metrics through deepflow-agent is enabled by default and requires no additional configuration.
1 | ## View the default configuration of deepflow-agent |
5. Prometheus Integration Showcase
Since DeepFlow supports PromQL, you only need to change the data-source in the Grafana dashboard provided by APISIX to DeepFlow. This way, you can view the rich performance metrics natively provided by APISIX. For instructions on how to use these metrics, refer to the official documentation regarding the Prometheus plugin.
Displaying APISIX Dashboard via DeepFlow Data Source
0x3: Access Logs and Continuous Profiling
Collecting APISIX Access Logs and Continuous Profiling Using DeepFlow
For access logs, there is no need for APISIX to be modified in any way. Simply deploying deepflow-agent on the server where APISIX is located. By opening the Application - Request Log Dashboard provided by DeepFlow in Grafana, you can view the access logs, which include header information from both the Request and Response. Additionally, you can analyze the response latency and error codes for each request.
Access Log Dashboard Provided by DeepFlow
DeepFlow also utilizes eBPF to capture function call stacks of applications, a feature available in the enterprise edition. This functionality enables the generation of an On-CPU Profile for the APISIX process, detailing the function call stack. It encompasses not just application functions but also the time spent in libraries and kernel syscalls.
Continuous Profiling in DeepFlow Enterprise Edition
0x4: What is APISIX
Apache APISIX is a dynamic, real-time, high-performance open-source API gateway that provides rich traffic management functions such as load balancing, dynamic upstream, canary release, circuit breaking, authentication, and observability. Being built based on NGINX and LuaJIT, Apache APISIX has ultra-high performance with a single-core QPS of up to 23,000 and an average delay of only 0.2 milliseconds. It can solve problems in traditional architecture, and at the same time adapt to the needs of the cloud-native era.
As an API gateway, Apache APISIX has a wide range of application scenarios. It can be applied to scenarios such as gateways, Kubernetes Ingress Controller, and service mesh, and can help enterprises quickly and safely process API and microservice traffic. At present, it has been tested and highly recognized by worldwide enterprises and organizations such as Zoom, Airwallex, Lotus Cars, vivo, and European Factory Platform.
Open-sourced and donated by API7.ai to Apache Software Foundation in 2019, Apache APISIX is now the most active API gateway project on GitHub addressing 1 Trillion+ API calls per day, which is still growing.
GitHub address:https://github.com/apache/apisix
0x5: What is DeepFlow
DeepFlow, an open-source observability project, aims to deliver comprehensive observability for complex cloud infrastructures and cloud-native applications. Utilizing eBPF technology, it offers application performance metrics, distributed tracing, and continuous profiling with zero-code instrumentation, thanks to its integration of smart-encoding technology for full-stack correlation. DeepFlow enables automatic deep observability for cloud-native applications, easing developers’ workload and equipping DevOps/SRE teams with advanced monitoring and diagnostic tools that span from code to infrastructure.
GitHub address:https://github.com/deepflowio/deepflow