v6.4 EE Release Notes

Created: 2024-02-27  Last Modified: 2024-07-05

This document was translated by ChatGPT

#1. Universal Map, Applications, Network, Infrastructure, Events

  • AutoMetrics
    • Support parsing MongoDB protocol (Documentation)
    • Support parsing TLS protocol and obtaining information such as access domain, key algorithm, TLS version, certificate expiration time, etc.
    • Support parsing all encrypted application protocols over TLS, not limited to HTTP
    • Enhanced parsing of HTTP2 compressed headers: data collected via cBPF and eBPF kprobe now supports header decompression
    • Support obfuscating MySQL, PostgreSQL, and Redis protocol data; enable it through the Agent's obfuscate-enabled-protocols configuration item
    • Support parsing Oracle protocol
    • Added 5353 to the default parsing ports for DNS traffic; see the Agent's l7-protocol-ports configuration item for details
    • In flow logs and network performance metrics, system latency (srt, srt_max) now records ICMP traffic latency, and ICMP traffic direction is corrected using ICMP Echo messages
    • Streamlined connection metrics in flow logs by removing the redundant rtt_client_avg and rtt_server_avg
    • Support parsing Geneve tunnel encapsulation (adapted to Kube-OVN)
  • AutoTracing
    • Support extracting trace_id from SofaRPC Payload (Hessian encoding, TreeMap structure)
    • Support extracting TraceID fields injected by SkyWalking and OpenTelemetry in Kafka messages
    • Support parsing SkyWalking sw3 Header
  • AutoProfiling
    • Support displaying a universal CPU flame graph for all processes on a server, down to the thread level
    • Generate Java process symbol tables in a staggered manner with random delays, avoiding load spikes caused by many processes generating them at once; the base interval of the delay can be adjusted through the Agent's java-symbol-file-refresh-defer-interval configuration item
    • Enabling Profiling no longer requires generating symbol table files inside the Pod where the Java process runs
  • AutoTagging
    • Support synchronizing tag information of cloud servers in Tencent Cloud and Huawei Cloud, and resource set information of cloud servers in Alibaba Cloud
    • Support synchronizing NIC MAC addresses of bare metal servers in QingCloud private cloud through Agent
    • Support synchronizing RDS and Redis resources in Huawei Cloud
    • Support synchronizing Redis resources in Baidu Cloud
    • Support injecting custom auto-grouping tags into all observability data: each tag combines multiple tag fields and automatically takes the first non-empty one, ignoring the subsequent tag columns
    • Added the pod_group_type (K8s workload type) tag field to all observability data, and extended the values of auto_service_type to represent the type of K8s workload
    • Added coroutine ID fields syscall_coroutine_0 and syscall_coroutine_1 to call logs
    • Support extracting topic_name field from Kafka messages and assigning it to the request_resource in call logs
    • Support extracting endpoint from the HTTP URL and assigning it to call logs and application performance metrics data; the Agent's http-endpoint-extraction rules can be configured
    • Support accurately setting Pod tags for all eBPF observability data of HostNetwork Pods
    • Added gprocess process information tags to continuous profiling data
    • Deprecated the Others type of l7_protocol (application protocol); merged HTTP_TLS and HTTP2_TLS into HTTP and HTTP2; added is_tls to call logs to indicate whether the traffic is encrypted
  • Universal Map
    • Clicking a service in the service list opens a right-slide page that displays the service's associated data
    • Service topology supports manual adjustment of node positions
    • Support setting the metrics displayed in the service list and service topology
    • Right-slide page automatically associates infrastructure metrics data
    • Support sharing business with other accounts
  • Network
    • Support inserting double-layer VLAN tags (802.1Q / QinQ) into distributed traffic to express 24-bit traffic tags
    • Support specifying 12-bit traffic tags for the inner and outer layers of QinQ in the distribution strategy
    • Support associating and displaying all call logs from the flow log details page
    • Support distributing unidirectional traffic only, such as distributing SQL request traffic only
  • Infrastructure
    • grafana-agent is deployed by default to collect infrastructure metrics of the K8s cluster where DeepFlow Server is located
  • Usability Improvements
    • Search Box
      • Support custom search bar, providing two simple search modes by default: container search and process search
      • Added bidirectional path search mode, ignoring the direction of client and server
      • The "path filter" condition is no longer echoed in the search bar; it is indicated by the icon status instead
      • Optimized performance of the dropdown input box, supporting prompts and filtering for tens of thousands of options
      • Support setting the trigger behavior of the search box when the page is first loaded and during the input process (automatic/manual)
      • Aligned search conditions on the continuous profiling page with other pages
      • Support setting dynamic value ranges for template variables, such as narrowing the cloud server options by VPC
    • Universal Map
      • Support batch definition of paths in business definitions
      • Support directly clicking to edit paths on the service topology
      • Topology graph in the right-slide page of the service topology supports marking nodes and connections in red based on thresholds
      • Improved usability of the service topology editing page: aligned it with the standard Panel operation capabilities and optimized the topology display
      • Optimized node positions when the service topology page is first loaded: the topology is centered and displayed in full view
      • Optimized business definition details page to display more information about the business
      • Optimized service topology operation experience
    • TCP Sequence Diagram
      • Optimized display style of special packets (SYN, FIN, RST)
      • Added date column to display human-readable time information
    • Right-slide Page
      • The flow log, call log, and event tabs in the right-slide page support the quick filter box on the left
      • Added time control in the network path page
      • Overview graph supports modifying metrics
    • Distributed Tracing
      • Optimized the arrangement order of network spans in the distributed tracing flame graph
      • Quick filter box on the left supports filtering by application protocol
      • Pinned the Tab page below the flame graph in place to avoid page jitter
    • Trend Analysis: Optimized the display of trend analysis graphs in flow logs, PCAP download, call logs, distributed tracing, events, and continuous profiling pages
    • Knowledge Graph: Deleted resource names are now displayed as $name (deleted)
    • Screen Adaptation: Adjusted page chart sizes to optimize display effects on small screens
    • Optimized default columns displayed in the flow log table, added flow end type
    • Optimized column selection operations in tables to reduce mouse clicks
    • Copying a resource name no longer includes the resource type name
    • Optimized display of the quick filter box on the left
    • Optimized page menu and topology graph styles
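
The auto-grouping tag behavior listed under AutoTagging above (take the first non-empty tag from several tag fields and ignore the rest) can be illustrated with a small sketch. This is not DeepFlow's implementation; the function name, the field order, and the sample row are all hypothetical and only show the selection rule.

```python
from typing import Mapping, Optional, Sequence


def first_non_empty(row: Mapping[str, Optional[str]], fields: Sequence[str]) -> str:
    """Return the value of the first field in `fields` that is non-empty in `row`.

    Fields after the first match are ignored. Returns "" if every field is
    missing or empty.
    """
    for name in fields:
        value = row.get(name)
        if value:  # skips both None and ""
            return value
    return ""


# Hypothetical priority order for a custom auto-grouping tag.
ORDER = ["pod_service", "pod_group", "chost", "ip"]

# Hypothetical observability row: pod_service is empty, so pod_group wins.
row = {"pod_service": "", "pod_group": "cart-v2", "chost": "node-7", "ip": "10.0.0.5"}
print(first_non_empty(row, ORDER))
```

With this rule, rows from different resource types can be grouped under one column without manually picking the tag field per row.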

#2. Dashboard, Metrics, Alerts, Reports

  • Dashboard
    • Panel supports multiple query conditions
    • Support creating Panel directly in the Dashboard
    • Predefined SQL, Redis, DNS, Ingress, Dubbo Dashboards
    • Panel supports setting colors
    • Overview graph supports setting colors and font sizes, displaying hour-over-hour and day-over-day comparisons of metric values, and displaying historical trends of metric values in the background
    • Bar chart supports adjusting sorting order (ascending, descending)
    • Table column widths are allocated based on content by default
    • Support switching the query area of all Panels in the entire Dashboard
  • Usability Improvements
    • Optimized layout of the Dashboard list page, added quick filter box on the left
    • Optimized default layout position when adding Panel
    • The header row stays pinned when scrolling down the Dashboard page
    • Optimized search condition editing area of Panel

#3. Resources, System

  • Integration
    • Offloaded PromQL operators to ClickHouse, improving PromQL query performance
    • Server supports exporting metrics through Prometheus RemoteWrite protocol (thanks to chenjiandongx: PR)
    • Support an extensible Exporter interface
  • Agent
    • eBPF functionality of Agent adapted to Linux 3.10 kernel (Detailed Documentation)
    • Support collecting traffic in DPDK environment
    • Use TCP protocol to transmit Agent's own logs
    • Reduced Agent memory consumption by 60% in scenarios with a large number of new TCP flows
    • Optimized HTTP2 parsing performance, reducing CPU usage by 60%
  • Wasm
    • Wasm Plugin supports dynamic loading in Agent
  • Server
    • Optimized the synchronization mechanism for K8s Label, Annotation, and Env: support regular expression filters to select the tags of interest, and support limiting the maximum length of tag values
    • Support balancing Agents across Servers based on the amount of data each Agent sends, improving data balance in ClickHouse
    • ClickHouse uses Array(LowCardinality(String)) instead of Array(String) to optimize read and write performance of low cardinality fields, such as tag_names, metrics_names, etc.
    • DeepFlow's self-monitoring metrics are merged into one table deepflow_system.deepflow_system in ClickHouse
    • Simplified host list CIDR configuration on the Server side when synchronizing resource information through Agent
    • Modified the Avg operator logic: Avg now represents the weighted average algorithm, and AAvg represents the arithmetic average algorithm
    • Added an integer trace_id_index column in ClickHouse as an index column for the trace_id field, with support for extracting a Timestamp from it, accelerating TraceID search
    • Profiling data supports plaintext (non-compressed) storage in ClickHouse
    • ClickHouse upgraded to v23.8 (LTS)
    • Deprecated Telegraf component
  • API
    • Added the Derivative pre-operator to the SQL API, which computes differences of Prometheus Counter type metrics so that rates can be calculated
    • Added TopK and Any operators to SQL API to get high-frequency or arbitrary values of specified tags
    • Support deleting cloud platforms by name
    • Optimized API performance of resource pages
  • CLI
    • Support debugging eBPF Socket Data using deepflow-ctl
    • Added list command: deepflow-ctl domain additional-resource list --type <resource_type> --name <resource_name>
    • Provided deepflow-ctl for macOS
  • Usability Improvements
    • Dropdown boxes on system resource pages support search filtering
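
The difference between the Avg (weighted average) and AAvg (arithmetic average) operators listed under Server above can be shown with a short sketch. It assumes, purely for illustration, that each stored row carries a pre-aggregated sum and a sample count and that the count is the weight; the release notes do not specify the weighting, and the function names and data below are hypothetical.

```python
from typing import List, Tuple

# One row per time bucket: (sum of values, number of samples). Hypothetical data.
Row = Tuple[float, int]


def weighted_avg(rows: List[Row]) -> float:
    """Weighted average: total of all sums divided by total sample count."""
    total = sum(s for s, _ in rows)
    count = sum(c for _, c in rows)
    return total / count if count else 0.0


def arithmetic_avg(rows: List[Row]) -> float:
    """Arithmetic average: plain mean of the per-row averages."""
    per_row = [s / c for s, c in rows if c]
    return sum(per_row) / len(per_row) if per_row else 0.0


# First bucket: 1 sample averaging 10; second bucket: 3 samples averaging 100.
rows = [(10.0, 1), (300.0, 3)]
print(weighted_avg(rows))    # busier bucket counts more
print(arithmetic_avg(rows))  # both buckets count equally
```

Under these assumptions the two operators diverge whenever buckets hold different numbers of samples, which is why a query's result can change depending on which operator is used.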