How to fix Linux kernel: neighbour table overflow

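The kernel logs this message when its ARP cache (the neighbour table) overflows. The most likely reason is that the host is talking to more neighbours on its directly attached networks than the table is configured to hold.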

What is a neighbour table overflow?

The neighbour table is a data structure in the Linux kernel that keeps track of the hosts reachable directly through each network interface. It maps the IP addresses of those hosts to their MAC addresses. When a host wants to communicate with another host on the same network, the kernel consults the neighbour table to find the destination's MAC address.

However, if the number of neighbours exceeds the configured size of the neighbour table, the table overflows. When this happens, the kernel cannot add entries for new neighbours, which can result in dropped packets and loss of connectivity to the affected hosts.

kernel error

dmesg | grep -iE "neighbou?r table overflow"

check gc_thresh

sysctl -a | grep net.ipv4.neigh.default.gc_thresh

default value

/proc/sys/net/ipv4/neigh/default/gc_stale_time:60
/proc/sys/net/ipv4/neigh/default/gc_thresh1:128
/proc/sys/net/ipv4/neigh/default/gc_thresh2:512
/proc/sys/net/ipv4/neigh/default/gc_thresh3:1024
  • gc_stale_time determines how often neighbour entries are checked for staleness. When an entry is considered stale, it is resolved again before data is sent to it. The default value is 60 seconds.
  • gc_thresh1 is the minimum number of entries to keep in the ARP cache. If the cache holds fewer entries than this, the garbage collector will not run. The default value is 128.
  • gc_thresh2 is the soft maximum number of entries in the ARP cache. The garbage collector allows the number of entries to exceed this for 5 seconds before it starts collecting. The default value is 512.
  • gc_thresh3 is the hard maximum number of entries in the ARP cache. Once the cache holds more entries than this, the garbage collector always runs. The default value is 1024.

man 7 arp describes these parameters as follows:

gc_stale_time (since Linux 2.2)
       Determines how often to check for stale neighbor entries. When a neighbor entry is considered stale, it is resolved again before sending data to it. Defaults to 60 seconds.

gc_thresh1 (since Linux 2.2)
       The minimum number of entries to keep in the ARP cache. The garbage collector will not run if there are fewer than this number of entries in the cache. Defaults to 128.

gc_thresh2 (since Linux 2.2)
       The soft maximum number of entries to keep in the ARP cache. The garbage collector will allow the number of entries to exceed this for 5 seconds before collection will be performed. Defaults to 512.

gc_thresh3 (since Linux 2.2)
       The hard maximum number of entries to keep in the ARP cache. The garbage collector will always run if there are more than this number of entries in the cache. Defaults to 1024.
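
To make the threshold semantics concrete, here is a minimal shell sketch that classifies an entry count against the three limits described above. The function name classify_neigh_count is hypothetical, and the message wording is an interpretation of the man page text, not kernel output.

```shell
# Hypothetical helper: classify a neighbour entry count against the GC thresholds.
# Usage: classify_neigh_count COUNT GC_THRESH1 GC_THRESH2 GC_THRESH3
classify_neigh_count() {
    count=$1; t1=$2; t2=$3; t3=$4
    if [ "$count" -lt "$t1" ]; then
        # Fewer entries than gc_thresh1: nothing is collected.
        echo "below gc_thresh1: garbage collector does not run"
    elif [ "$count" -le "$t2" ]; then
        echo "up to gc_thresh2: within the soft maximum"
    elif [ "$count" -le "$t3" ]; then
        # The soft maximum may be exceeded for a short time only.
        echo "above gc_thresh2: collection starts after about 5 seconds"
    else
        echo "above gc_thresh3: hard maximum exceeded"
    fi
}
```

With the default values, `classify_neigh_count 600 128 512 1024` reports that the count is above gc_thresh2. On a live system the last three arguments could be read from /proc/sys/net/ipv4/neigh/default/gc_thresh1 through gc_thresh3.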

analysis

arp -v

## count the ARP cache entries
arp -an | wc -l
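
When the total is high, it can help to see which interface holds most of the entries. The sketch below assumes the iproute2 `ip` tool is installed and that each line of `ip -4 neigh show` output contains a `dev <interface>` pair. The helper name count_neigh_by_dev is hypothetical.

```shell
# Hypothetical helper: count neighbour entries per interface.
# Reads `ip neigh show` style lines on stdin and prints "<interface> <count>".
count_neigh_by_dev() {
    awk '{
             # The interface name is the field that follows the word "dev".
             for (i = 1; i < NF; i++) if ($i == "dev") n[$(i + 1)]++
         }
         END { for (d in n) print d, n[d] }' | sort
}

# Typical use on a live system:
#   ip -4 neigh show | count_neigh_by_dev
```

Reading from stdin keeps the helper usable with saved output as well as with a live `ip` command.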

best practice

/etc/sysctl.conf

## works best with <= 500 client computers ##
# How often (in seconds) the garbage collector attempts to run
net.ipv4.neigh.default.gc_interval = 3600

# How often (in seconds) to check for stale ARP cache entries
net.ipv4.neigh.default.gc_stale_time = 3600

# Raise the ARP cache thresholds
net.ipv4.neigh.default.gc_thresh3 = 4096
net.ipv4.neigh.default.gc_thresh2 = 2048
net.ipv4.neigh.default.gc_thresh1 = 1024

apply the changes

sysctl -p
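
Before loading new values, it may be worth checking that the three thresholds are still in ascending order, since the parameter descriptions imply gc_thresh1 <= gc_thresh2 <= gc_thresh3. The following is a minimal sketch under that assumption. The helper name check_gc_thresh_order is hypothetical, and it only looks at the net.ipv4 keys shown in this article.

```shell
# Hypothetical helper: verify gc_thresh1 <= gc_thresh2 <= gc_thresh3
# in sysctl-style text read from stdin.
# Prints "ok" and returns 0 when the order holds; otherwise prints a reason and returns 1.
check_gc_thresh_order() {
    awk -F= '
        /^[ \t]*#/ { next }    # skip comment lines
        $1 ~ /^[ \t]*net\.ipv4\.neigh\.default\.gc_thresh[123][ \t]*$/ {
            key = $1; gsub(/[ \t]/, "", key)
            val = $2; gsub(/[ \t]/, "", val)
            # Index by the trailing digit of the key: 1, 2 or 3.
            t[substr(key, length(key))] = val + 0
        }
        END {
            if (!(1 in t) || !(2 in t) || !(3 in t)) { print "missing threshold"; exit 1 }
            if (t[1] <= t[2] && t[2] <= t[3]) { print "ok"; exit 0 }
            print "thresholds out of order"; exit 1
        }'
}
```

For example, `check_gc_thresh_order < /etc/sysctl.conf` can be run before `sysctl -p`, and `sysctl net.ipv4.neigh.default.gc_thresh3` afterwards shows the value now in effect.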

reference

  • man 7 arp
  • https://openai.com/blog/scaling-kubernetes-to-2500-nodes
  • https://www.cyberciti.biz/faq/centos-redhat-debian-linux-neighbor-table-overflow