Production Deployment Recommendations

Created:2024-04-25 Last Modified:2024-08-26

This document was translated by ChatGPT

#1. Introduction

DeepFlow production deployment recommendations.

#2. Use LTS Version of DeepFlow

Add the --version 6.4.9 parameter to helm to install or upgrade the LTS version of DeepFlow Server and Agent.

#2.1 Install LTS Version of DeepFlow Server

# helm repo add deepflow https://deepflowio.github.io/deepflow

helm repo update deepflow # use `helm repo update` when helm < 3.7.0
helm upgrade --install deepflow -n deepflow deepflow/deepflow --version 6.4.9 --create-namespace
1
2
3
4
# helm repo add deepflow https://deepflow-ce.oss-cn-beijing.aliyuncs.com/chart/stable

helm repo update deepflow # use `helm repo update` when helm < 3.7.0
# cat << EOF > values-custom.yaml
# global:
#   image:
#       repository: registry.cn-beijing.aliyuncs.com/deepflow-ce
# grafana:
#   image:
#     repository: registry.cn-beijing.aliyuncs.com/deepflow-ce/grafana
# EOF
helm upgrade --install deepflow -n deepflow deepflow/deepflow --version 6.4.9 --create-namespace \
  -f values-custom.yaml
1
2
3
4
5
6
7
8
9
10
11
12
13

#2.2 Install LTS Version of DeepFlow Agent

#2.2.1 K8s Environment

# cat << EOF > values-custom.yaml
# deepflowServerNodeIPS:
# - 10.1.2.3  # FIXME: K8s Node IPs
# - 10.4.5.6  # FIXME: K8s Node IPs
# clusterNAME: k8s-1  # FIXME: name of the cluster in deepflow
# EOF

# helm repo add deepflow https://deepflowio.github.io/deepflow

helm repo update deepflow # use `helm repo update` when helm < 3.7.0
helm upgrade --install deepflow-agent -n deepflow deepflow/deepflow-agent --version 6.4.9 --create-namespace \
    -f values-custom.yaml
1
2
3
4
5
6
7
8
9
10
11
12
# cat << EOF > values-custom.yaml
# image:
#   repository: registry.cn-beijing.aliyuncs.com/deepflow-ce/deepflow-agent
# deepflowServerNodeIPS:
# - 10.1.2.3  # FIXME: K8s Node IPs
# - 10.4.5.6  # FIXME: K8s Node IPs
# clusterNAME: k8s-1  # FIXME: name of the cluster in deepflow
# EOF

# helm repo add deepflow https://deepflowio.github.io/deepflow

helm repo update deepflow # use `helm repo update` when helm < 3.7.0
helm upgrade --install deepflow-agent -n deepflow deepflow/deepflow-agent --version 6.4.9 --create-namespace \
  -f values-custom.yaml
1
2
3
4
5
6
7
8
9
10
11
12
13
14

#2.2.2 Cloud Host Environment

Switch the Agent download link to the LTS version:

curl -O https://deepflow-ce.oss-cn-beijing.aliyuncs.com/rpm/agent/v6.4.9/linux/$(arch | sed 's|x86_64|amd64|' | sed 's|aarch64|arm64|')/deepflow-agent-rpm.zip
unzip deepflow-agent-rpm.zip
yum -y localinstall x86_64/deepflow-agent-1.0*.rpm
1
2
3
curl -O https://deepflow-ce.oss-cn-beijing.aliyuncs.com/deb/agent/v6.4.9/linux/$(arch | sed 's|x86_64|amd64|' | sed 's|aarch64|arm64|')/deepflow-agent-deb.zip
unzip deepflow-agent-deb.zip
dpkg -i x86_64/deepflow-agent-1.0*.systemd.deb
1
2
3
curl -O https://deepflow-ce.oss-cn-beijing.aliyuncs.com/bin/agent/v6.4.9/linux/$(arch | sed 's|x86_64|amd64|' | sed 's|aarch64|arm64|')/deepflow-agent.tar.gz
tar -zxvf deepflow-agent.tar.gz -C /usr/sbin/

cat << EOF > /etc/systemd/system/deepflow-agent.service
[Unit]
Description=deepflow-agent.service
After=syslog.target network-online.target

[Service]
Environment=GOTRACEBACK=single
LimitCORE=1G
ExecStart=/usr/sbin/deepflow-agent
Restart=always
RestartSec=10
LimitNOFILE=1024:4096

[Install]
WantedBy=multi-user.target
EOF

systemctl daemon-reload
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21

#2.3 Install LTS Version of Cli

Switch the Cli download link to the LTS version:

curl -o /usr/bin/deepflow-ctl https://deepflow-ce.oss-cn-beijing.aliyuncs.com/bin/ctl/v6.4.9/linux/$(arch | sed 's|x86_64|amd64|' | sed 's|aarch64|arm64|')/deepflow-ctl
chmod a+x /usr/bin/deepflow-ctl
1
2

#3. Use Managed MySQL

In a production environment, it is recommended to use a managed MySQL to ensure availability. It is recommended to use MySQL version 8.0 or above. The following databases need to be created and authorized in advance:

  • deepflow
  • grafana

values-custom.yaml configuration:

global:
  externalMySQL:
    enabled: true ## Enable external MySQL
    ip: 10.1.2.3 ## External Mysql IP address, Need to allow deepflow-server and clickhouse access
    port: 3306 ## External Mysql port
    username: root ## External Mysql username
    password: password ## External Mysql password
mysql:
  enabled: false ## Close MySQL deployment
1
2
3
4
5
6
7
8
9

#4. Use Managed ClickHouse

In a production environment, it is recommended to use a managed ClickHouse to ensure availability. It is recommended that the version of ClickHouse be at least 21.8. The following databases need to be created and authorized in advance:

  • deepflow_system
  • event
  • ext_metrics
  • flow_log
  • flow_metrics
  • flow_tag
  • profile

values-custom.yaml configuration:

global:
  externalClickHouse:
    enabled: true ## Enable external ClickHouse
    type: ep

    ## External ClickHouse clusterName,The default value is 'default', query method:  'select cluster,host_address,port from system.clusters;'
    clusterName: default

    ## External ClickHouse storage policy name,The default value is 'default', query method: 'select policy_name from system.storage_policies;'
    storagePolicy: default
    username: default ## External ClickHouse username
    password: password ## External ClickHouse Password

    ## External ClickHouse IP address and port list, DeepFlow writes IP and port information to an svc endpoint, deepflow-server obtains ClickHouse's IP:Port through get&wath&list endpoint.
    ## deepflow-server needs to access the real IP address of ClickHouse, the port is connected using tcp-port, usually 9000, and query IP:Port through 'select host_address,port from system.clusters;'.
    hosts:
      - ip: 10.1.2.3
        port: 9000
      - ip: 10.1.2.4
        port: 9000
      - ip: 10.1.2.5
        port: 9000
clickhouse:
  enabled: false ## Close ClickHouse deployment
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24

DeepFlow will write the IP:Port information of ClickHouse into an Endpoint of a Service. The controller and ingester of deepflow-server obtain the ClickHouse address list through the list&watch of this Service's Endpoint. The controller connects to all ClickHouse instances to create databases and table structures, while the ingester sorts all deepflow-server pod names and Endpoint IPs, mapping them to deepflow-server and ClickHouse in sequence, creating databases, table structures, and writing observability data. The querier accesses this Service to query observability data.

Since ClickHouse needs to request MySQL, it is recommended to use managed MySQL along with managed ClickHouse.

If only using managed ClickHouse without managed MySQL, it is recommended to open the NodePort of MySQL and configure global.externalMySQL to the NodePort access address.

values-custom.yaml configuration:

global:
  externalClickHouse:
    enabled: true ## Enable external ClickHouse
    type: ep

    ## External ClickHouse clusterName,The default value is 'default', query method:  'select cluster,host_address,port from system.clusters;'
    clusterName: default

    ## External ClickHouse storage policy name,The default value is 'default', query method: 'select policy_name from system.storage_policies;'
    storagePolicy: default
    username: default ## External ClickHouse username
    password: password ## External ClickHouse Password

    ## External ClickHouse IP address and port list, DeepFlow writes IP and port information to an svc endpoint, deepflow-server obtains ClickHouse's IP:Port through get&wath&list endpoint.
    ## deepflow-server needs to access the real IP address of ClickHouse, the port is connected using tcp-port, usually 9000, and query IP:Port through 'select host_address,port from system.clusters;'.
    hosts:
      - ip: 10.1.2.3
        port: 9000
      - ip: 10.1.2.4
        port: 9000
      - ip: 10.1.2.5
        port: 9000
  externalMySQL:
    enabled: true
    ip: xx.xx.xx.xx ## External Mysql IP address, Need to allow deepflow-server and clickhouse access
    port: 30123 ## External Mysql port
    username: root ## External Mysql username
    password: deepflow
clickhouse:
  enabled: false ## Close ClickHouse deployment

mysql:
  service:
    type: NodePort
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34

If you want to reuse the port allocated by NodePort, you need to deploy twice. Before the second deployment, fill in the port allocated in the first deployment into global.externalMySQL.port.

Since ClickHouse will save the connection method of MySQL, after modifying the MySQL connection, you need to delete all databases in ClickHouse and restart deepflow-server to reset the database.

#5. Optimize Traffic Path from deepflow-agent to deepflow-server

When deepflow-agent starts, it will use the controller-ips in the local configuration file (including ConfigMap) to request deepflow-server. deepflow-server will by default send the Node IP of the deepflow-server Pod to deepflow-agent (the Pod IP of deepflow-server is sent by default in the same cluster) for subsequent request configuration and data sending. When there are multiple deepflow-servers, different deepflow-server Node IPs will be sent for load balancing, and load balancing will be performed periodically.

At this time, two ports' IPs are dynamically sent by deepflow-server to deepflow-agent:

  • deepflow-agent and deepflow-server are not in the same cluster
    • Control plane 30035
    • Data plane 30033
  • deepflow-agent and deepflow-server are in the same cluster
    • Control plane 20035 (configured in deepflow-server ConfigMap as controller.grpc-port, default is 20035)
    • Data plane 20033 (configured in deepflow-server ConfigMap as ingester.listen-port, default is 20033)

By default, deepflow-agent uses NodePort to connect to deepflow-server. This NodePort Service uses externalTrafficPolicy=Cluster, and the traffic from NodePort to deepflow-server will generally be forwarded again, occupying unnecessary inter-node bandwidth. In extreme cases, kube-proxy may occupy too much CPU and other resources due to excessive traffic.

#5.1 Use LoadBalancer Type Service

In environments with LoadBalancer conditions, you can modify the Service type of deepflow-server to LoadBalancer, using LoadBalancer to proxy the traffic of deepflow-agent requests to deepflow-server, improving availability.

values-custom.yaml configuration:

server:
  service:
    type: LoadBalancer
1
2
3

After modifying the Service type of deepflow-server to LoadBalancer, you need to configure agent-group-config to switch the address of deepflow-server requested by deepflow-agent to the LoadBalancer IP:

proxy_controller_ip: 1.2.3.4 # FIXME: Your LoadBalancer IP address
analyzer_ip: 1.2.3.4 # FIXME: Your LoadBalancer IP address
proxy_controller_port: 30035 # The default is 30035
analyzer_port: 30033 # The default is 30033
1
2
3
4

Note: After configuration, this IP will be fixedly sent to the collector as the data transmission IP, and the collector will also fixedly use the controller-ips in the local configuration file to request the control plane port 30035 to obtain configuration information.

#5.2 Use Local externalTrafficPolicy

In environments without LoadBalancer conditions, you can configure the Service of deepflow-server to externalTrafficPolicy=Local to ensure that the traffic accessing the NodePort of a node will only be routed to the deepflow-server on that node. Due to the use of externalTrafficPolicy=Local and deepflow-server drift and other factors, some nodes' NodePorts may not be able to access deepflow-server. You need to be careful to avoid affecting the controller-ip in the configuration file of deepflow-agent.

values-custom.yaml configuration:

server:
  service:
    externalTrafficPolicy: Local
1
2
3

#5.3 Use HostNetwork

Enable the HostNetwork of deepflow-server to reduce the pressure on kube-proxy.

values-custom.yaml configuration:

server:
  hostNetwork: true
  dnsPolicy: ClusterFirstWithHostNet
1
2
3

After enabling the HostNetwork of deepflow-server, you need to configure agent-group-config to switch the port requested by deepflow-agent to deepflow-server:

proxy_controller_port: 20035 # The deepflow-server controller listens on the port. The default port is 20035
analyzer_port: 20033 # The deepflow-server ingester listens on the port. The default port is 20033
1
2

#6. Integrate with Existing Grafana

#6.1 Download and Install Plugins

DeepFlow supports integration with existing Grafana. It is recommended to use version 9.0 or above, with the minimum supported version being 8.0. Currently, DeepFlow's plugins are undergoing certification. Before the certification is completed, you need to configure Grafana to allow loading unsigned plugins:

[plugins]
allow_loading_unsigned_plugins = deepflow-querier-datasource,deepflow-apptracing-panel,deepflow-topo-panel,deepflowio-tracing-panel,deepflowio-deepflow-datasource,deepflowio-topo-panel
1
2

Download the plugin installation package:

curl -O https://deepflow-ce.oss-cn-beijing.aliyuncs.com/pkg/grafana-plugin/stable/deepflow-gui-grafana.tar.gz
1

Extract the downloaded plugin to the Grafana plugin directory, such as /var/lib/grafana/plugins, and restart Grafana to load the plugin:

tar -zxvf deepflow-gui-grafana.tar.gz -C /var/lib/grafana/plugins/
1

#6.2 Add DeepFlow Data Source

You can find DeepFlow Querier in Grafana Data sources and add the following configuration items:

  • Request Url: The NodePort of the deepflow-server service querier port accessed by Grafana. Execute the following command to get the access address:

    echo "http://$(kubectl get nodes -o jsonpath="{.items[0].status.addresses[0].address}"):$(kubectl get --namespace deepflow -o jsonpath="{.spec.ports[0].nodePort}" services deepflow-server)"
    
    1
  • API Token: No need to fill in

  • Tracing Url: The NodePort of the deepflow-app service app port accessed by Grafana. Execute the following command to open the NodePort and get the access address: values-custom.yaml configuration:

    app:
      service:
        type: NodePort
    
    1
    2
    3
    helm upgrade deepflow -n deepflow deepflow/deepflow -f values-custom.yaml
    echo "http://$(kubectl get nodes -o jsonpath="{.items[0].status.addresses[0].address}"):$(kubectl get --namespace deepflow -o jsonpath="{.spec.ports[0].nodePort}" services deepflow-app)"
    
    1
    2

#6.3 Import Dashboard

Click to enter the newly added DeepFlow Data source, switch to the Dashboards page, and click Import on the dashboard to import the dashboard.