Tune Linux Kernel for K8s

Introduction
The performance and reliability of a Kubernetes cluster depend heavily on the nodes it runs on, and in particular on their Linux kernel settings. This guide covers kernel parameters worth tuning to improve cluster performance and stability.
Parameter tuning list
- The `net.core.somaxconn` parameter sets the maximum number of connections that a listening socket can queue for acceptance. The default may be too low for high-traffic Kubernetes clusters, which can lead to dropped or delayed connections at peak times.
- The `net.ipv4.tcp_max_syn_backlog` parameter sets the maximum number of half-open connection requests, that is, requests for which the server has received a SYN but not yet the client's final acknowledgment. Raising it can help servers that handle a high volume of incoming connections under heavy load.
- The `net.ipv4.ip_local_port_range` setting specifies the range of local port numbers available for outbound connections. The default range might be inadequate for services that open many short-lived connections, so widening it can prevent port exhaustion and the connectivity problems that follow.
- The `vm.dirty_ratio` and `vm.dirty_background_ratio` parameters govern when the kernel writes modified ("dirty") memory pages back to disk. `vm.dirty_background_ratio` is the percentage of available memory (free plus reclaimable pages) that can hold dirty pages before the kernel starts writing them out asynchronously in the background. `vm.dirty_ratio` is the upper limit: once dirty pages exceed it, processes that generate writes must write out dirty pages themselves, which stalls them.
- The `fs.file-max` parameter defines the maximum number of file handles that the Linux kernel can allocate system-wide. If you run many containers or applications that open numerous files at once, raising this limit helps avoid running out of file descriptors.
- The `fs.inotify.max_user_watches` parameter limits how many inotify watches each user can create, and therefore how many files can be monitored for modifications. This matters for applications that react to changes in real time, such as live-reload development tools and file synchronization services.
- Linux I/O schedulers play a significant role in disk performance under different workloads. The scheduler chosen influences throughput, latency, and Input/Output Operations Per Second (IOPS). Older, single-queue kernels offer `deadline`, `cfq` (Completely Fair Queuing), and `noop`. Newer kernels provide `mq-deadline`, `kyber`, and `bfq` for multi-queue block devices. For SSDs or other high-performance storage, `noop` (or `none`, its multi-queue counterpart) or `mq-deadline` may perform better because they are simpler and add less overhead. The scheduler is selected per block device through /sys/block/<device>/queue/scheduler, not through sysctl.
- The `kernel.sched_migration_cost_ns` parameter specifies how long, in nanoseconds, after its last execution a task is still considered "cache hot" and is therefore less likely to be migrated to another CPU. This is relevant in Kubernetes environments where pods might frequently change CPUs. Reducing the value makes the scheduler more aggressive about moving processes, which can improve load balancing across CPUs at the expense of more cache misses. Newer kernels (5.13 and later) moved this tunable out of sysctl into debugfs, so the setting may not exist on your nodes.
- The `kernel.sched_autogroup_enabled` parameter controls a feature designed to improve system responsiveness under heavy load. It works by automatically placing the tasks of each session (for example, each terminal) into their own scheduling group. Although this helps desktop responsiveness, it is not always ideal in server environments, particularly those running Kubernetes, because it can result in an uneven distribution of CPU resources among pods.
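Before changing anything, it helps to record what a node currently uses. The following is a minimal sketch, not a definitive tool: it assumes a Linux node where sysctl values are exposed under /proc/sys (dots in the name become path separators) and where each block device lists its schedulers in /sys/block/<device>/queue/scheduler with the active one in square brackets. The helper names `sysctl_path`, `read_sysctl`, and `active_scheduler` are made up for this illustration.

```python
from pathlib import Path

# Parameters discussed in the list above.
PARAMS = [
    "net.core.somaxconn",
    "net.ipv4.tcp_max_syn_backlog",
    "net.ipv4.ip_local_port_range",
    "vm.dirty_background_ratio",
    "vm.dirty_ratio",
    "fs.file-max",
    "fs.inotify.max_user_watches",
    "kernel.sched_migration_cost_ns",
    "kernel.sched_autogroup_enabled",
]


def sysctl_path(name, root="/proc/sys"):
    """Map a sysctl name to its file: dots become path separators."""
    return Path(root).joinpath(*name.split("."))


def read_sysctl(name, root="/proc/sys"):
    """Return the current value as a string, or None if it cannot be read."""
    try:
        return sysctl_path(name, root).read_text().strip()
    except OSError:
        # Missing on this kernel, or not readable by this user.
        return None


def active_scheduler(text):
    """Return the bracketed (active) entry of a queue/scheduler file."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


if __name__ == "__main__":
    for name in PARAMS:
        value = read_sysctl(name)
        print(f"{name} = {value if value is not None else '(not available)'}")
    for sched_file in sorted(Path("/sys/block").glob("*/queue/scheduler")):
        device = sched_file.parts[3]  # /sys/block/<device>/queue/scheduler
        try:
            print(f"{device}: {active_scheduler(sched_file.read_text())}")
        except OSError:
            print(f"{device}: (not readable)")
```

A parameter reported as "(not available)" usually means the running kernel does not expose it, which is worth knowing before adding it to a configuration file.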
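To apply these values, add the following lines to /etc/sysctl.conf. The extra setting `vm.overcommit_memory = 1`, which is not described above, tells the kernel to always allow memory overcommit.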
net.core.somaxconn = 1024
net.ipv4.tcp_max_syn_backlog = 2048
net.ipv4.ip_local_port_range = 10240 65535
vm.overcommit_memory = 1
vm.dirty_background_ratio = 5
vm.dirty_ratio = 15
fs.file-max = 500000
fs.inotify.max_user_watches = 524288
kernel.sched_migration_cost_ns = 500000
kernel.sched_autogroup_enabled = 0
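Settings in /etc/sysctl.conf are read at boot; running `sudo sysctl -p` loads them immediately. Before doing so, a quick sanity check of the file can catch typos. The sketch below is only an illustration: it parses `key = value` lines (skipping `#` and `;` comments) and applies two checks chosen for this example, namely that the background dirty ratio is below the hard limit and that the port range is a valid interval. The function names and the checks themselves are assumptions, not part of any standard tool.

```python
CONF = """\
net.core.somaxconn = 1024
net.ipv4.tcp_max_syn_backlog = 2048
net.ipv4.ip_local_port_range = 10240 65535
vm.overcommit_memory = 1
vm.dirty_background_ratio = 5
vm.dirty_ratio = 15
fs.file-max = 500000
fs.inotify.max_user_watches = 524288
kernel.sched_migration_cost_ns = 500000
kernel.sched_autogroup_enabled = 0
"""


def parse_sysctl_conf(text):
    """Parse 'key = value' lines; skip blank lines and '#' or ';' comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a key/value line
        settings[key.strip()] = value.strip()
    return settings


def port_range(settings):
    """Return (low, high) from net.ipv4.ip_local_port_range."""
    low, high = (int(p) for p in settings["net.ipv4.ip_local_port_range"].split())
    return low, high


def ephemeral_port_count(settings):
    """Number of local ports available for outbound connections."""
    low, high = port_range(settings)
    return high - low + 1


def check(settings):
    """Return warnings for a few illustrative consistency rules."""
    warnings = []
    background = settings.get("vm.dirty_background_ratio")
    hard_limit = settings.get("vm.dirty_ratio")
    if background is not None and hard_limit is not None:
        if int(background) >= int(hard_limit):
            warnings.append("vm.dirty_background_ratio should be below vm.dirty_ratio")
    if "net.ipv4.ip_local_port_range" in settings:
        low, high = port_range(settings)
        if not 0 < low <= high <= 65535:
            warnings.append("net.ipv4.ip_local_port_range is not a valid interval")
    return warnings


if __name__ == "__main__":
    parsed = parse_sysctl_conf(CONF)
    print("ephemeral ports:", ephemeral_port_count(parsed))
    for warning in check(parsed):
        print("warning:", warning)
```

With the values above, the range 10240 to 65535 leaves 65535 - 10240 + 1 = 55296 local ports for outbound connections.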
Reference
- https://overcast.blog/kernel-tuning-and-optimization-for-kubernetes-a-guide-a3bdc8f7d255
- https://www.brendangregg.com/linuxperf.html
- https://www.brendangregg.com/Perf/linux_perf_tools_full.png
- https://platform9.com/blog/10-kubernetes-performance-tips/
- https://www.splunk.com/en_us/appdynamics-joins-splunk.html
Some of this content was generated by AI; please review it with care.