v6.2 EE Release Notes

Created:2024-07-05 Last Modified:2024-07-05

This document was translated by ChatGPT

#1. Applications, Network, Metrics, Events

  • AutoTracing
    • Support zero-instrumentation distributed tracing for Golang applications
    • Add file IO event performance data and associate it with call logs
  • Universal Service Map
    • Support zero-instrumentation automatic display of process-level universal application topology
    • Support calculating the real access relationships before and after NAT using the TOA (TCP Option Address) mechanism
    • Pre-aggregate OpenTelemetry Span data into service and path metrics
    • Correct path data direction using the operating system's socket information
    • Add direction score metric; the higher the score, the more accurate the client and server direction. A score of 255 indicates the direction is definitely correct.
    • Support displaying application topology grouped by Custom Tag (e.g., k8s.label.xxx)
  • Private protocol parsing, business field parsing
    • Introduce WebAssembly Plugin mechanism to support adding private protocol parsing and enhancing standard protocol field parsing capabilities, see the specific documentation here
  • New continuous profiling feature
    • Support collection and display of Java and Golang Native Profiles
  • Network NAT tracing, flow logs, traffic download
    • No rate limit for flow logs associated with PCAP policies
    • Support viewing flow logs for specified PCAP policies
    • Set the status of non-TCP flow logs to normal
    • Reduce the minimum time span for PCAP downloads to 1s
    • Support direct traffic download based on PCAP policies
  • Cloud network traffic distribution
    • Support traffic distribution using TCP tunnels
    • Display the size of traffic matched by distribution policies separately by distribution point
    • Support pre-configuring workloads and container services not yet created in distribution policies
  • Usability improvements
    • Data association
      • Support switching grouping conditions in the right slide-out frame metrics tab for easier drill-down
    • Page operations
      • Expand the field types supported by template variables
      • Add quick filter functionality on the left side of table pages
      • Support search filtering in the advanced configuration dropdown for metrics
      • Improve the usability of the table column selection
      • Support direct PCAP file download from the TCP sequence diagram page
      • Support configuring metric templates for quick selection of common metric sets in Panels
      • Enhance control over the display of tips in Panels
      • Support quick copy of tags in the right slide-out page and flow log call log detail page, and support quick filtering
      • Optimize the filtering logic in the traffic relationship tab of the right slide-out frame to avoid ambiguity
      • Support setting historical search conditions as the default content loaded on the page
      • Change the full-screen display of all Panels to pop-up display
      • Support directly modifying path options in the search box
      • Optimize the operation and display of the distributed tracing flame graph, and support filtering to display APP/SYS/NET Span
    • New pages
      • Add a topology view to distributed tracing
      • Add a Service List page to summarize service-level metric data from OTel, eBPF, and Packet signal sources
      • Add an Endpoint monitoring page, supporting association expansion from the service list page
    • Streamlined pages
      • Merge path, flow log/call log pages
    • Page display
      • Optimize the display of resources corresponding to the network path collection NIC; show as uncollected when no corresponding resource is found
      • Optimize the display of knowledge graph information in the right slide-out page
      • Use waterfall topology for NAT tracing page and add physical topology display
      • Optimize the display of network paths in the right slide-out page, supporting default display of physical network paths
      • Do not display empty value tags on flow log, call log, and other pages
      • Support adding description information for historical search conditions
      • Optimize the text display of search box fields, providing more helpful information
      • Display the values and description information of enum fields in data tables
      • Preemptively gray out the jump button when the target page has no data
      • Adjust and optimize page layout for more reasonable space utilization and increased information display
      • Use auto_instance, auto_service to replace the hard-to-understand resource_gl0/gl1/gl2

#2. Dashboards, Alerts, Reports

  • Optimize the display of trigger values in alert events, showing the current value instead of the trigger value
  • Review and refine the wording of all system alerts
  • Packet loss environment threshold alert policies for all collectors and data nodes

#3. Resources

  • AutoTagging
    • Support synchronizing process information on container nodes and cloud servers, and inject it as Universal Tag into all data, supporting filtering and grouping
    • Support adding custom metadata for processes, cloud servers, and K8s Namespaces
    • Support automatic synchronization of K8s cluster information under AWS and Alibaba Cloud accounts
    • Support synchronizing Baidu Cloud Cloud Smart Network (CSN) resource information
    • Add support for K8s Service in the Custom Tag field k8s.label
    • When a K8s Pod serves as the backend for multiple K8s Services, associate only the service with the smallest dictionary order in all tags
  • Usability
    • Support batch input of load balancer and its listener information
    • Support batch input of Gateway type host and its NIC
    • Support unified setting of additional docking route interfaces for all managed K8s clusters under a public cloud account
  • Compatibility
    • Adapt resource information synchronization for K8s 1.18, 1.20
    • Add two configuration items for Huawei public cloud: domain name and IAM authorization address, to adapt to HCSO scenarios

#4. System

  • API
    • Querier adds support for PromQL
      • Support querying DeepFlow native metrics via PromQL
      • Support querying Prometheus RemoteWrite metrics via PromQL
      • When querying Prometheus native metrics, support using tags automatically injected by DeepFlow AutoTagging
    • Ingester adds support for OTLP Export
    • Querier SQL API
      • Provide Trace Completion API tracing-completion-by-external-app-spans: the caller passes in a set of APP Spans (no need to store them in DeepFlow) to get their upstream and downstream SYS/NET Spans, enhancing traditional APM with infrastructure and non-instrumented service tracing capabilities, see the specific documentation here
      • Add support for the AS keyword and return the original field name before AS in the results
      • When getting optional values for enum type Tag fields, return the description information corresponding to the values
      • Add SLIMIT parameter to limit the number of Series in the returned results
      • Unify the Category of custom type Tags (k8s.label/cloud.tag/os.app) as map_item
      • Add three Universal Tags: tap_port_host, tap_port_chost, tap_port_pod_node, representing the host, cloud server, and container node to which the collection NIC belongs, respectively
  • Grafana
    • Prometheus Dashboard can use DeepFlow as a data source without modification
    • In the Query Editor, support referencing the $__interval variable for the Math operator
    • Optimize Enum type Variable to avoid expanding all candidate values in SQL when selecting All
    • Add Grafana backend plugin module to support standard Grafana alert policy configuration
    • Support referencing another Variable when defining a Variable
    • Display the corresponding SQL statement in real-time when editing Query Editor content
  • eBPF
    • When the application process only acts as a client, avoid eBPF associating all requests to the same Trace
    • Simplify protocol recognition logic in eBPF code
  • Compatibility
    • Support running the collector in Linux Kernel environments below 3.0
    • Support specifying the Hostname of the environment where the collector is located, suitable for environments with hostname conflicts
    • Provide two types of deepflow-agent binary packages: dynamic linking and static linking; the former depends on the glibc dynamic link library, while the latter has significant malloc/free lock contention under multithreading
    • eBPF programs support automatic adaptation of kernel offsets using BTF files
  • Controller and data node management
    • Support ClickHouse nodes greater than the number of deepflow-server replicas
    • Support specifying domain name form of controller or ingester address for deepflow-agent
    • Support configuring data storage duration in hourly granularity
    • Change the configuration API for data duration to asynchronous to avoid timeout
    • Add self-monitoring for the report service
  • Collector management
    • Support specifying Group ID when creating collector groups (agent-group)
    • Support running deepflow-agent as a regular process (instead of a Pod) on K8s Node
    • When the ctrl_ip or ctrl_mac of the agent's operating environment changes, support automatically updating the agent's corresponding information to avoid disconnection
    • Remote upgrade of deepflow-agent on cloud servers can be fully completed through deepflow-ctl, without manually mounting hostPath for deepflow-server
    • Support configuring the time interval for Agent list k8s resources to reduce pressure on apiserver
    • When only network monitoring authorization is available, add support for viewing HTTP and DNS under the application menu
    • Automatically delete Agents that have been disconnected for more than vtap_auto_delete_interval, with a default deletion time of more than 1 hour of disconnection
  • Debug
    • Lower the collection interval of all DeepFlow self-monitoring metrics to 10s
    • Embed eBPF data source information (syscall, go-tls, go-http2, openssl, io-event) in the tap_port field to enhance debugging capabilities
    • deepflow-server supports monitoring itself through continuous profiling
  • A series of performance optimizations for various software components