
Linux Kernel Tuning for Kubernetes

Introduction

The performance and reliability of a Kubernetes cluster depend heavily on the underlying operating system, and in particular on Linux kernel settings. This guide walks through kernel parameters worth tuning to improve the performance and stability of Kubernetes nodes.

Parameters to tune

  • The net.core.somaxconn parameter determines the maximum number of connections that a listen socket can queue for acceptance. This value might be set too low by default for high-traffic Kubernetes clusters, which could result in connection drops or delays during peak times.
  • The net.ipv4.tcp_max_syn_backlog parameter sets the maximum number of half-open connections the kernel will remember, that is, connections that have received a SYN but not yet the final ACK of the three-way handshake. Raising it can help servers that accept a high volume of incoming connections under heavy load.
  • The net.ipv4.ip_local_port_range setting specifies the range of port numbers available for outbound connections. The default range might be inadequate for services that establish numerous short-lived connections. Thus, increasing this range can prevent port exhaustion and related connectivity problems.
  • The parameters vm.dirty_ratio and vm.dirty_background_ratio govern when the kernel writes modified (“dirty”) memory pages back to disk. vm.dirty_background_ratio is the percentage of system memory that can contain dirty pages before the kernel begins writing them out asynchronously in the background. vm.dirty_ratio is the hard upper limit: once it is reached, processes that generate writes are blocked and must flush dirty pages themselves before continuing.
  • The fs.file-max parameter defines the maximum number of file handles that the Linux kernel can allocate. If you’re running a multitude of containers or applications that open numerous files at once, raising this limit can help avoid a potential shortage of file descriptors.
  • The fs.inotify.max_user_watches parameter regulates the maximum number of files that can be monitored for modifications using inotify. This is crucial for applications requiring real-time responses to changes, like live-reload development tools and file synchronization services.
  • Linux I/O schedulers play a significant role in disk performance under different workloads. The scheduler chosen influences throughput, latency, and Input/Output Operations Per Second (IOPS). Older single-queue kernels offered deadline, cfq (Completely Fair Queuing), and noop; modern multi-queue kernels replace these with mq-deadline, kyber, bfq, and none. For systems with SSDs or other high-performance storage, none or mq-deadline typically performs best because of their simplicity and low overhead.
  • The kernel.sched_migration_cost_ns parameter specifies how long, in nanoseconds, the scheduler considers a recently run task to be cache-hot; while a task is cache-hot, the scheduler avoids migrating it to another CPU. This matters in Kubernetes environments where pods might frequently change CPUs. Lowering this value makes the scheduler more willing to move tasks, which can improve load balancing across CPUs at the cost of a higher cache-miss rate. Note that on kernels 5.13 and later this knob has moved from sysctl to debugfs (/sys/kernel/debug/sched/migration_cost_ns), so the sysctl setting only applies on older kernels.
  • The kernel.sched_autogroup_enabled parameter is part of a feature designed to improve system responsiveness under heavy load. It accomplishes this by automatically grouping tasks with similar execution patterns. Although this is beneficial for desktop responsiveness, it may not always be ideal in server environments, particularly those running Kubernetes. This is because it could result in an uneven distribution of CPU resources among pods.
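Before changing anything, it helps to confirm what a node is currently running with. The minimal sketch below reads each tunable discussed above straight from /proc/sys, so it needs nothing beyond a POSIX shell; the kernel.sched_* knobs are omitted because on recent kernels they live in debugfs rather than sysctl.

```shell
#!/bin/sh
# Print the current value of each sysctl tunable discussed above.
# Reads /proc/sys directly instead of calling the sysctl binary.
for p in \
    net/core/somaxconn \
    net/ipv4/tcp_max_syn_backlog \
    net/ipv4/ip_local_port_range \
    vm/dirty_background_ratio \
    vm/dirty_ratio \
    fs/file-max \
    fs/inotify/max_user_watches; do
  # Convert the /proc/sys path back to dotted sysctl notation for display.
  printf '%s = %s\n' "$(echo "$p" | tr '/' '.')" "$(cat "/proc/sys/$p")"
done
```

Comparing this output against the values you intend to set makes it easy to see which defaults your distribution already raises.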
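The active I/O scheduler is exposed per block device under /sys/block, with the scheduler currently in use shown in brackets. Below is a quick check plus a hedged example of switching a device; "sda" is a placeholder device name, and writing the file requires root.

```shell
# List the available schedulers for each block device; the active one is bracketed,
# e.g. "[mq-deadline] kyber bfq none".
grep . /sys/block/*/queue/scheduler

# Example only: switch /dev/sda to mq-deadline ("sda" is a placeholder; needs root).
echo mq-deadline | sudo tee /sys/block/sda/queue/scheduler
```

A change made this way does not survive a reboot; to persist it, distributions typically use a udev rule or a kernel boot parameter.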

To apply this tuning, add the following lines to /etc/sysctl.conf and load them with sysctl -p. The values below are reasonable starting points, not universal defaults. Note that vm.overcommit_memory = 1, which was not covered above, tells the kernel to always allow memory overcommit; the kubelet expects this setting when protectKernelDefaults is enabled.

net.core.somaxconn = 1024
net.ipv4.tcp_max_syn_backlog = 2048
net.ipv4.ip_local_port_range = 10240 65535
vm.overcommit_memory = 1
vm.dirty_background_ratio = 5
vm.dirty_ratio = 15
fs.file-max = 500000
fs.inotify.max_user_watches = 524288
kernel.sched_migration_cost_ns = 500000
kernel.sched_autogroup_enabled = 0
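On managed Kubernetes nodes, a drop-in file under /etc/sysctl.d/ is usually easier to maintain than editing /etc/sysctl.conf directly; the filename 90-kubernetes.conf below is just an example. Either way, the settings can be loaded without a reboot and then verified per key:

```shell
# Apply the settings from /etc/sysctl.conf without rebooting (requires root).
sudo sysctl -p

# Alternative: ship the settings as a drop-in and reload every configured file.
sudo cp 90-kubernetes.conf /etc/sysctl.d/   # example filename
sudo sysctl --system

# Verify a single value after applying.
sysctl -n net.core.somaxconn
```

If a key is unknown on your kernel (for example kernel.sched_migration_cost_ns on 5.13+), sysctl will report an error for that line while still applying the rest; remove or adjust such lines for the kernels you actually run.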
