v6.2 EE Release Notes
Created:2024-11-05 Last Modified:2024-11-05
This document was translated by ChatGPT
#1. Applications, Network, Metrics, Events
- AutoTracing
- Support zero-instrumentation distributed tracing for Golang applications
- Add file IO event performance data and associate it with call logs
- Universal Service Map
- Support zero-instrumentation automatic display of process-level universal application topology
- Support calculating the real access relationships before and after NAT using the TOA (TCP Option Address) mechanism
- Pre-aggregate OpenTelemetry Span data into service and path metrics
- Correct path data direction using the operating system's socket information
- Add
direction score
metric; the higher the score, the more accurate the client and server direction. A score of 255 indicates the direction is definitely correct. - Support displaying application topology grouped by Custom Tag (e.g., k8s.label.xxx)
- Private protocol parsing, business field parsing
- Introduce WebAssembly Plugin mechanism to support adding private protocol parsing and enhancing standard protocol field parsing capabilities, see the specific documentation here
- New continuous profiling feature
- Support collection and display of Java and Golang Native Profiles
- Network NAT tracing, flow logs, traffic download
- No rate limit for flow logs associated with PCAP policies
- Support viewing flow logs for specified PCAP policies
- Set the status of non-TCP flow logs to normal
- Reduce the minimum time span for PCAP downloads to 1s
- Support direct traffic download based on PCAP policies
- Cloud network traffic distribution
- Support traffic distribution using TCP tunnels
- Display the size of traffic matched by distribution policies separately by distribution point
- Support pre-configuring workloads and container services not yet created in distribution policies
- Usability improvements
- Data association
- Support switching grouping conditions in the right slide-out frame metrics tab for easier drill-down
- Page operations
- Expand the field types supported by template variables
- Add quick filter functionality on the left side of table pages
- Support search filtering in the advanced configuration dropdown for metrics
- Improve the usability of the table
column selection
- Support direct PCAP file download from the TCP sequence diagram page
- Support configuring metric templates for quick selection of common metric sets in Panels
- Enhance control over the display of tips in Panels
- Support quick copy of tags in the right slide-out page and flow log call log detail page, and support quick filtering
- Optimize the filtering logic in the traffic relationship tab of the right slide-out frame to avoid ambiguity
- Support setting historical search conditions as the default content loaded on the page
- Change the full-screen display of all Panels to pop-up display
- Support directly modifying path options in the search box
- Optimize the operation and display of the distributed tracing flame graph, and support filtering to display APP/SYS/NET Span
- New pages
- Add a topology view to distributed tracing
- Add a
Service List
page to summarize service-level metric data from OTel, eBPF, and Packet signal sources - Add an
Endpoint
monitoring page, supporting association expansion from the service list page
- Streamlined pages
- Merge path, flow log/call log pages
- Page display
- Optimize the display of resources corresponding to the network path collection NIC; show as uncollected when no corresponding resource is found
- Optimize the display of knowledge graph information in the right slide-out page
- Use waterfall topology for NAT tracing page and add physical topology display
- Optimize the display of network paths in the right slide-out page, supporting default display of physical network paths
- Do not display empty value tags on flow log, call log, and other pages
- Support adding description information for historical search conditions
- Optimize the text display of search box fields, providing more helpful information
- Display the values and description information of enum fields in data tables
- Preemptively gray out the jump button when the target page has no data
- Adjust and optimize page layout for more reasonable space utilization and increased information display
- Use auto_instance, auto_service to replace the hard-to-understand resource_gl0/gl1/gl2
- Data association
#2. Dashboards, Alerts, Reports
- Optimize the display of
trigger values
in alert events, showing the current value instead of the trigger value - Review and refine the wording of all system alerts
- Packet loss environment threshold alert policies for all collectors and data nodes
#3. Resources
- AutoTagging
- Support synchronizing process information on container nodes and cloud servers, and inject it as Universal Tag into all data, supporting filtering and grouping
- Support adding custom metadata for processes, cloud servers, and K8s Namespaces
- Support automatic synchronization of K8s cluster information under AWS and Alibaba Cloud accounts
- Support synchronizing Baidu Cloud Cloud Smart Network (CSN) resource information
- Add support for K8s Service in the Custom Tag field k8s.label
- When a K8s Pod serves as the backend for multiple K8s Services, associate only the service with the smallest dictionary order in all tags
- Usability
- Support batch input of load balancer and its listener information
- Support batch input of Gateway type host and its NIC
- Support unified setting of
additional docking route interfaces
for all managed K8s clusters under a public cloud account
- Compatibility
- Adapt resource information synchronization for K8s 1.18, 1.20
- Add two configuration items for Huawei public cloud: domain name and IAM authorization address, to adapt to HCSO scenarios
#4. System
- API
- Querier adds support for PromQL
- Support querying DeepFlow native metrics via PromQL
- Support querying Prometheus RemoteWrite metrics via PromQL
- When querying Prometheus native metrics, support using tags automatically injected by DeepFlow AutoTagging
- Ingester adds support for OTLP Export
- Support exporting SYS/NET Span data (l7_flow_log) to otel-collector, see the documentation for output fields here, FR-014-Tencent (opens new window).
- Querier SQL API
- Provide Trace Completion API
tracing-completion-by-external-app-spans
: the caller passes in a set of APP Spans (no need to store them in DeepFlow) to get their upstream and downstream SYS/NET Spans, enhancing traditional APM with infrastructure and non-instrumented service tracing capabilities, see the specific documentation here - Add support for the AS keyword and return the original field name before AS in the results
- When getting optional values for enum type Tag fields, return the description information corresponding to the values
- Add SLIMIT parameter to limit the number of Series in the returned results
- Unify the Category of custom type Tags (k8s.label/cloud.tag/os.app) as map_item
- Add three Universal Tags: tap_port_host, tap_port_chost, tap_port_pod_node, representing the host, cloud server, and container node to which the collection NIC belongs, respectively
- Provide Trace Completion API
- Querier adds support for PromQL
- Grafana
- Prometheus Dashboard can use DeepFlow as a data source without modification
- In the Query Editor, support referencing the
$__interval
variable for the Math operator - Optimize Enum type Variable to avoid expanding all candidate values in SQL when selecting All
- Add Grafana backend plugin module to support standard Grafana alert policy configuration
- Support referencing another Variable when defining a Variable
- Display the corresponding SQL statement in real-time when editing Query Editor content
- eBPF
- When the application process only acts as a client, avoid eBPF associating all requests to the same Trace
- Simplify protocol recognition logic in eBPF code
- Compatibility
- Support running the collector in Linux Kernel environments below 3.0
- Support specifying the Hostname of the environment where the collector is located, suitable for environments with hostname conflicts
- Provide two types of deepflow-agent binary packages: dynamic linking and static linking; the former depends on the glibc dynamic link library, while the latter has significant malloc/free lock contention under multithreading
- eBPF programs support automatic adaptation of kernel offsets using BTF files
- Controller and data node management
- Support ClickHouse nodes greater than the number of deepflow-server replicas
- Support specifying domain name form of controller or ingester address for deepflow-agent
- Support configuring data storage duration in hourly granularity
- Change the configuration API for data duration to asynchronous to avoid timeout
- Add self-monitoring for the report service
- Collector management
- Support specifying Group ID when creating collector groups (agent-group)
- Support running deepflow-agent as a regular process (instead of a Pod) on K8s Node
- When the ctrl_ip or ctrl_mac of the agent's operating environment changes, support automatically updating the agent's corresponding information to avoid disconnection
- Remote upgrade of deepflow-agent on cloud servers can be fully completed through deepflow-ctl, without manually mounting hostPath for deepflow-server
- Support configuring the time interval for Agent list k8s resources to reduce pressure on apiserver
- When only network monitoring authorization is available, add support for viewing HTTP and DNS under the application menu
- Automatically delete Agents that have been disconnected for more than vtap_auto_delete_interval, with a default deletion time of more than 1 hour of disconnection
- Debug
- Lower the collection interval of all DeepFlow self-monitoring metrics to 10s
- Embed eBPF data source information (syscall, go-tls, go-http2, openssl, io-event) in the tap_port field to enhance debugging capabilities
- deepflow-server supports monitoring itself through continuous profiling
- A series of performance optimizations for various software components