v6.3 EE Release Notes

Created:2024-02-27 Last Modified:2024-07-05

This document was translated by ChatGPT

#1. Universal Map, Applications, Network, Infrastructure, Events

  • AutoMetrics
    • Support parsing FastCGI protocol
    • Support parsing Hang Seng T3 protocol
    • Dubbo protocol supports parsing event and serialization_id fields
    • Recognize RST disconnections in SLB health checks as normal behavior
    • Added endpoint field in the application aggregation metrics table
    • Support collecting statement-id of MySQL calls to correlate COM_STMT_EXECUTE and COM_STMT_QUERY, thereby tracing SQL statements
  • AutoTracing
    • Support parsing Tingyun's Tracing field X-Tingyun
    • Support tracing calls before and after managed ALB/SLB services
    • Support parsing TraceID in MySQL statements
    • Added allow_multiple_trace_ids_in_tracing_result configuration item to allow multiple TraceIDs in tracing results
    • Support calling APM's Trace API to supplement tracing data
  • AutoTagging
    • Support automatic association of K8s Annotation and Env tags
    • Added the ability to automatically associate K8s containers through PID, solving the issue of marking container resource information for HostNetwork Pods in eBPF data
  • SQL
    • Added count operator to metrics to calculate the number of rows in raw data
  • Universal Map
    • Optimized service list
      • Only display performance metrics when the service is called
      • Support switching signal sources through signal_source
      • Adjust the default displayed metrics and show bar charts representing the relative size of metrics
    • Added service topology
      • Support defining integrated business access topology for cloud and on-premises
  • Applications
    • Distributed Tracing
      • Adjust the vertical split ratio of the page when sliding right without scaling the flame graph
      • Support tracing starting from network Span
      • Flame graph supports displaying network Span's collection network card information
      • Left quick filter box supports quick filtering and switching of signal sources
    • Continuous Profiling
      • Support linked display of tables and flame graphs
      • Optimized the merging logic of Function Stack in flame graphs
      • Compressed storage of Function Stack in ClickHouse
      • Support eBPF collection of On-CPU Profile data for compiled (Golang/Rust, etc.) and interpreted (Java, etc.) languages
  • Infrastructure
    • Added host and container pages, displayed based on Prometheus metrics
  • Events
    • Split the events page into three pages: resource changes, file read/write, and alert practices
  • GUI
    • Enhanced snapshot search capabilities: support sorting, condition copying, etc.
    • Search bar supports pasting Key: value search conditions
    • Search bar supports modifying the operator of existing conditions
    • Simplified page search conditions and conditions carried by the right slide page to improve usability
    • Comprehensive optimization of page UI
    • Optimized the left quick filter box: support search filtering, display matching data volume, support filtering metric value ranges, support switching query areas, support switching data tables
    • Support opening multiple right slide pages continuously and support jumping between different right slide pages
    • Event data displayed in the right slide page can be switched to view client-side or server-side events separately
    • Popup page for viewing database fields supports displaying table names
    • Support viewing search conditions of the current Panel

#2. Dashboards, Metrics, Alerts, Reports

  • Dashboards
    • Support dragging to modify table size
    • Optimized the display details of bar charts
    • Added a new Panel type: overview map
    • Refactored the Panel editing page
    • Optimized the layout of Panel buttons
    • Merged line charts and Top line charts
    • Optimized the display when template variable names are too long
    • Template variable list supports drag-and-drop sorting
  • Metrics
    • Support inputting PromQL to query data
  • Alerts
    • Support creating alert policies directly (no need to create a Dashboard)
    • HTTP push endpoints support using Tag to render push content
    • Email push titles support using variables
    • Optimized the display of system alert events
  • GUI
    • Unified search condition input boxes for Panel editing page, metric search page, and alert policy editing page

#3. Resources, System

  • Resources
    • Optimized the display of POD list, VPC list, availability zone list, and region list, added container node count and collector status columns
    • Support synchronizing CloneSet and Advanced StatefulSet workloads in OpenKruise
    • Support independent configuration of synchronization intervals for different cloud platforms
    • Support synchronizing IP addresses on loopback interfaces (usually VIPs)
  • Integration
    • Prometheus Integration
      • PromQL supports topk and bottomk functions
      • PromQL API supports RFC3339 time format
      • Support obtaining HTTP Headers in RemoteWrite as additional Labels
      • Optimized storage performance of RemoteWrite, and query performance of RemoteRead and PromQL
    • OpenTelemetry Integration
      • Support running independently of ClickHouse
  • Agent
    • Plugin
      • Added support for so plugins, providing a C SDK
      • Wasm Demo: Parse error codes in HTTP Payload and reassign response_code and response_exception
      • Wasm Demo: Parse Protobuf messages in Payload
    • Changed the periodic reporting interval of long streams from absolute 0 seconds (start of each minute) to relative 0 seconds (whole 60 minutes relative to stream start time)
      • Advantage: Reduced the pressure of sending stream logs at absolute 0 seconds, avoiding splitting streams with a lifecycle of less than 60 seconds into two stream logs
    • Configuration
      • Support configuring CPU affinity and priority
      • Added kprobe-blacklist configuration item to set port number blacklist for eBPF data collection, avoiding collection loops
      • Added stream log ignore statistics position (l4_log_ignore_tap_sides) and call log ignore statistics position (l7_log_ignore_tap_sides) to reduce data collection volume
    • Adaptation
      • Support running in Tencent TCE's DPDK host machine
      • Removed the requirement for HostNetwork in container collectors
      • Support environments where the number of matching results for network cards (tap_interface_regex) exceeds 255
      • Support running in business Pods in Sidecar mode
      • Support deployment as a BlueKing Plugin
  • Server
    • Support configuring system alert mailboxes on the page
    • Default value for automatic deletion time of abnormal controllers and data nodes is 30 days
    • Unified storage of alert events in ClickHouse
    • Real-time push to Agent when resource information changes are detected
    • Support disabling K8s cluster auto-discovery, allowing synchronization as an affiliated K8s cluster of public cloud
    • Support specifying (fixed) Agent for K8s resource information synchronization
    • Added deepflow identifier to all HostPath paths required for deployment
  • CLI
    • Released deepflow-ctl for MacOS