
Ceph Performance Tuning Suggestions

Introduction

Ceph is an open-source distributed storage system that provides high-performance, fault-tolerant block, object, and file storage. Achieving good performance, however, requires tuning both the underlying operating system and the Ceph cluster settings.

I. System Configuration Tuning

1. Set disk read-ahead cache
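
On Linux, the read-ahead size of a block device is exposed through the sysfs file /sys/block/<device>/queue/read_ahead_kb. The following is a minimal sketch of reading and changing that value, not a definitive procedure. The helper names, the sysfs_root parameter (added so the code can be exercised against a scratch directory), and the device name and size in the usage note are assumptions for illustration.

```python
from pathlib import Path


def read_ahead_file(device: str, sysfs_root: str = "/sys") -> Path:
    """Return the sysfs file that holds a block device's read-ahead size in KiB."""
    return Path(sysfs_root) / "block" / device / "queue" / "read_ahead_kb"


def get_read_ahead_kb(device: str, sysfs_root: str = "/sys") -> int:
    """Read the current read-ahead size (KiB) for the given device."""
    return int(read_ahead_file(device, sysfs_root).read_text().strip())


def set_read_ahead_kb(device: str, size_kb: int, sysfs_root: str = "/sys") -> None:
    """Write a new read-ahead size (KiB); this needs root on a real system."""
    if size_kb < 0:
        raise ValueError("read-ahead size must not be negative")
    read_ahead_file(device, sysfs_root).write_text(f"{size_kb}\n")
```

For example, set_read_ahead_kb("sdb", 8192) would request 8 MiB of read-ahead on a hypothetical device sdb. Values written to sysfs do not survive a reboot, so a persistent setup has to reapply them at boot.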

2. Set the number of system processes

3. Adjust CPU performance

Note: Virtual machines and some hardware CPUs may not support adjustment.

  1. Ensure that the kernel tuning tool is installed.
  2. Adjust to performance mode.

The mode can be set per core, or for all cores at once with a CPU tuning tool. Five operating modes are supported:

  • performance: prioritizes speed only. The CPU runs at its highest supported frequency. This mode suits systems that pursue maximum performance.
  • powersave: the so-called "power-saving" mode. The CPU runs at its lowest supported frequency. This mode suits systems that pursue minimum power consumption.
  • userspace: hands frequency-scaling decisions to user-space applications and provides an interface through which they can set the CPU frequency.
  • ondemand: adjusts the CPU frequency quickly and dynamically. As soon as there is computational work, the CPU jumps to its maximum frequency, and it drops back to the lowest frequency once the work completes.
  • conservative: adjusts the CPU frequency smoothly between the upper and lower limits, raising and lowering it gradually. Unlike ondemand, it steps the frequency up as load requires rather than jumping straight to the maximum.
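
Before changing anything, it helps to see which mode each core is currently in. On Linux systems with CPU frequency scaling, each core reports its mode in /sys/devices/system/cpu/cpuN/cpufreq/scaling_governor. The sketch below only reads those files. The function names and the cpu_root parameter are assumptions for illustration, and on virtual machines the files may simply be absent, in which case the result is empty.

```python
from pathlib import Path


def current_governors(cpu_root: str = "/sys/devices/system/cpu") -> dict:
    """Map each core name (cpu0, cpu1, ...) to its current scaling governor."""
    governors = {}
    pattern = "cpu[0-9]*/cpufreq/scaling_governor"
    for gov_file in sorted(Path(cpu_root).glob(pattern)):
        core_name = gov_file.parent.parent.name  # .../cpu0/cpufreq/scaling_governor
        governors[core_name] = gov_file.read_text().strip()
    return governors


def cores_not_in_mode(mode: str = "performance",
                      cpu_root: str = "/sys/devices/system/cpu") -> list:
    """List the cores whose governor differs from the wanted mode."""
    return [core for core, governor in current_governors(cpu_root).items()
            if governor != mode]


if __name__ == "__main__":
    print(current_governors())
    print("Cores not in performance mode:", cores_not_in_mode("performance"))
```

An empty list from cores_not_in_mode("performance") on a host that does expose the cpufreq files means every core is already in performance mode.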

4. Optimize network parameters

Modify the configuration file:

Configuration content:

net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 16384 16777216
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
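
To make the values above easier to read: tcp_rmem and tcp_wmem each take three numbers (minimum, default, and maximum buffer size in bytes), while rmem_max and wmem_max take a single byte count. The sketch below parses such a fragment and labels the fields. The function names are illustrative assumptions, and the parser handles only simple "key = value" lines.

```python
def parse_sysctl(text: str) -> dict:
    """Parse 'key = v1 v2 ...' lines into {key: [int, ...]}, skipping comments."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = [int(field) for field in value.split()]
    return settings


def describe_tcp_buffer(values: list) -> dict:
    """Label the three fields of tcp_rmem / tcp_wmem (all in bytes)."""
    minimum, default, maximum = values
    return {"min": minimum, "default": default, "max": maximum}


if __name__ == "__main__":
    fragment = """
    net.ipv4.tcp_rmem = 4096 87380 16777216
    net.ipv4.tcp_wmem = 4096 16384 16777216
    net.core.rmem_max = 16777216
    net.core.wmem_max = 16777216
    """
    parsed = parse_sysctl(fragment)
    print(describe_tcp_buffer(parsed["net.ipv4.tcp_rmem"]))
    print(parsed["net.core.rmem_max"][0] // (1024 * 1024), "MiB")
```

Read this way, the configuration raises every maximum buffer size to 16777216 bytes (16 MiB) and leaves the minimum and default TCP buffer sizes small.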

II. Ceph Cluster Optimization Configuration

1. Ceph’s main configuration parameters

  • FileStore configuration parameters
  • Journal configuration parameters
  • OSD configuration tuning parameters
  • OSD recovery tuning parameters
  • OSD client tuning parameters

2. Optimization configuration examples

III. Tuning Best Practices

1. MON Suggestions

Plan the Ceph cluster deployment carefully, because the performance of the monitors (MONs) is critical to the overall performance of the cluster. MONs should usually run on dedicated nodes. To keep a reliable quorum, deploy an odd number of MONs.
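
The reason for an odd count comes down to majority arithmetic: the monitors need a strict majority to form a quorum, so adding one monitor to an odd-sized set raises the quorum size without raising the number of failures the cluster can survive. A small sketch of that arithmetic (the function names are illustrative):

```python
def quorum_size(mon_count: int) -> int:
    """Smallest strict majority of mon_count monitors."""
    if mon_count < 1:
        raise ValueError("at least one monitor is required")
    return mon_count // 2 + 1


def tolerated_failures(mon_count: int) -> int:
    """How many monitors can fail while a majority still remains."""
    return mon_count - quorum_size(mon_count)


if __name__ == "__main__":
    for count in range(1, 8):
        print(f"{count} MONs: quorum={quorum_size(count)}, "
              f"tolerated failures={tolerated_failures(count)}")
```

Three and four monitors both tolerate a single failure, and five and six both tolerate two, so the even-numbered monitor adds cost without adding resilience.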

2. OSD Suggestions

Each Ceph OSD has a journal. An OSD's journal and data may be placed on the same storage device. A write operation is marked as complete once it has been committed to the journals of all OSDs in the placement group (PG). Faster journal performance therefore improves response time.

In a typical deployment, OSDs use traditional mechanical hard drives with higher latency. To maximize efficiency, Ceph recommends placing OSD journals on separate low-latency SSD or NVMe devices. Administrators must be careful not to put too many OSD journals on the same device, as it may become a performance bottleneck. The impact of the following SSD specifications should be considered:

  • Mean time between failures (MTBF) for the supported number of writes
  • IOPS capability (input/output operations per second), that is, reads and writes per second
  • Data transfer rate
  • Bus/SSD coupling capability

Red Hat recommends placing no more than 6 OSD journals on each SATA SSD device and no more than 12 OSD journals on each NVMe device.
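
These limits translate directly into a minimum number of journal devices per node. The sketch below does that division, rounding up. The limits of 6 and 12 come from the recommendation above, while the function name and the device-type keys are assumptions for illustration.

```python
import math

# Per-device limits taken from the recommendation above.
MAX_JOURNALS_PER_DEVICE = {"sata_ssd": 6, "nvme": 12}


def journal_devices_needed(osd_count: int, device_type: str) -> int:
    """Minimum number of journal devices so that no device exceeds its limit."""
    if osd_count < 0:
        raise ValueError("osd_count must not be negative")
    limit = MAX_JOURNALS_PER_DEVICE[device_type]
    return math.ceil(osd_count / limit)


if __name__ == "__main__":
    for osds in (12, 24, 25):
        print(osds, "OSDs:",
              journal_devices_needed(osds, "sata_ssd"), "SATA SSDs or",
              journal_devices_needed(osds, "nvme"), "NVMe devices")
```

A node with 24 OSDs would need at least four SATA SSDs or two NVMe devices. This is a floor based only on journal count, and the SSD specifications listed earlier may call for more.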

3. RBD Suggestions

The workload on RBD block devices is usually I/O-intensive, such as databases running on virtual machines in OpenStack. For RBD, the OSD journal should be on an SSD or NVMe device. For backend storage, different service levels can be offered depending on the storage technology that backs the OSDs (NVMe SSDs, SATA SSDs, or HDDs).

4. Object Gateway Suggestions

The workload on the Ceph object gateway is typically throughput-intensive, and objects such as audio and video files can be very large. The bucket index pool, however, may show a more I/O-intensive workload pattern, so administrators should store this pool on SSD devices.

Ceph object gateways maintain an index for each bucket, and Ceph stores this index in a RADOS object. As the bucket grows, the index performance will decrease (because only one RADOS object is involved in all index operations).

To avoid this, Ceph can spread a large index across multiple RADOS objects, called shards. Administrators can enable this feature by setting the rgw_override_bucket_index_max_shards configuration parameter in the ceph.conf configuration file. The recommended value for this parameter is the expected number of objects in the bucket divided by 100,000.
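
The sizing rule above can be sketched as a small calculation. Rounding up, and using at least one shard, are assumptions of this sketch rather than part of the stated recommendation, and the function names are illustrative.

```python
import math

OBJECTS_PER_SHARD = 100_000  # divisor from the recommendation above


def recommended_index_shards(expected_objects: int) -> int:
    """Expected objects / 100,000, rounded up, with a floor of 1 (assumption)."""
    if expected_objects < 0:
        raise ValueError("expected_objects must not be negative")
    return max(1, math.ceil(expected_objects / OBJECTS_PER_SHARD))


def ceph_conf_line(expected_objects: int) -> str:
    """Format the setting as a 'key = value' line for ceph.conf."""
    shards = recommended_index_shards(expected_objects)
    return f"rgw_override_bucket_index_max_shards = {shards}"


if __name__ == "__main__":
    print(ceph_conf_line(2_000_000))
```

For a bucket expected to hold two million objects, this yields 20 shards.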

5. CephFS Suggestions

The metadata pool, which stores the directory structure and other indexes, may become a bottleneck for CephFS. SSD devices can be used for this pool. Each CephFS metadata server (MDS) maintains an in-memory cache of inodes and other types of items. Ceph uses the mds_cache_memory_limit configuration parameter to limit the size of this cache. The value is expressed in absolute bytes, defaults to 1 GB, and can be tuned when needed.
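
Because mds_cache_memory_limit takes a plain byte count, a small conversion helper avoids counting digits by hand. This sketch assumes that "1 GB" above means a binary gigabyte (1024^3 bytes). The function names are illustrative, and the 4 GiB figure in the example is arbitrary rather than a recommendation.

```python
BYTES_PER_GIB = 1024 ** 3  # assumption: "GB" in the text means binary gigabytes


def gib_to_bytes(gib: float) -> int:
    """Convert a size in GiB to the byte count the parameter expects."""
    if gib <= 0:
        raise ValueError("cache size must be positive")
    return int(gib * BYTES_PER_GIB)


def mds_cache_conf_line(gib: float) -> str:
    """Format the setting as a 'key = value' line for ceph.conf."""
    return f"mds_cache_memory_limit = {gib_to_bytes(gib)}"


if __name__ == "__main__":
    print(mds_cache_conf_line(1))  # the 1 GB default mentioned above
    print(mds_cache_conf_line(4))  # an arbitrary larger cache
```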
