Ceph 12.2.2: Minor update, major trouble

Recently, Ceph “Luminous” V12.2.2 was released: a bug-fix release for the current stable series of Ceph. It contains some eagerly awaited fixes, e.g. for “Bluestore” memory leaks, and admins around the world started upgrading immediately.

Just before Christmas, I had to handle a “situation” with such an upgraded Ceph cluster. It had been running for months, dating back to pre-Luminous times. It was upgraded to V12.2.1 a few weeks earlier and had now been brought to V12.2.2 in preparation for introducing “Bluestore” OSDs. Admittedly, the cluster wasn’t in perfect shape, but it reported “HEALTH_OK” both before and right after the upgrade to V12.2.2.

Things started to go wrong when the first OSDs were taken "out" in preparation for "Bluestore" OSDs, step 2 of the procedure in the official docs. The cluster reported "too many PGs per OSD" and showed slow requests that didn't go away. Worse, the cluster started to show signs of blocked requests, such as unresponsive clients and hanging CephFS access. After some time, "ceph -s" confirmed this: slow requests turned into blocked requests after 4064 seconds, taking the cluster to HEALTH_ERR. Additionally, the PG rearrangement triggered by taking out the first OSDs came to a halt, leaving the cluster with persistently high numbers of misplaced and degraded PGs. Overall, the cluster became unusable.
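Why would taking OSDs "out" raise a "too many PGs per OSD" warning at all? The number of PGs in the pools stays the same, but they have to be spread over fewer OSDs. The sketch below illustrates that arithmetic under stated assumptions. The pool names and numbers are made up, not taken from the cluster described here. The warning is modelled simply as the average number of PG replicas per "in" OSD compared against a per-OSD limit, with 200 assumed as the Luminous default of `mon_max_pg_per_osd`. Check the actual value and semantics for your release.

```python
"""Sketch: why marking OSDs "out" can trigger "too many PGs per OSD".

Assumptions (not from the article): the pool layout below is hypothetical,
and the check is modelled as average PG replicas per "in" OSD compared
against a limit; 200 is assumed as the Luminous mon_max_pg_per_osd default.
"""

# Hypothetical pools: (name, pg_num, replica size)
POOLS = [
    ("images", 512, 3),
    ("volumes", 1024, 3),
    ("cephfs_data", 256, 3),
    ("cephfs_metadata", 64, 3),
]


def pgs_per_osd(pools, osds_in):
    """Average number of PG replicas each OSD that is still "in" must host."""
    if osds_in <= 0:
        raise ValueError("need at least one OSD marked in")
    total_replicas = sum(pg_num * size for _, pg_num, size in pools)
    return total_replicas / osds_in


def exceeds_limit(pools, osds_in, limit=200):
    """Return (average ratio, whether it exceeds the assumed limit)."""
    ratio = pgs_per_osd(pools, osds_in)
    return ratio, ratio > limit


if __name__ == "__main__":
    # The PG count is unchanged while OSDs are taken out, so the ratio rises.
    for osds_in in (30, 28, 27):
        ratio, too_many = exceeds_limit(POOLS, osds_in)
        print(f"{osds_in} OSDs in: {ratio:.1f} PGs/OSD, over limit: {too_many}")
```

With these hypothetical numbers, 30 OSDs host about 186 PG replicas each, while marking three of them out pushes the average to about 206, above the assumed limit. Since CRUSH does not distribute PGs perfectly evenly, individual OSDs may cross such a limit even earlier than the average suggests.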

Ceph caching for image pools

Running Ceph storage for small and mid-size private clouds can easily become challenging, and finding supporting information is not always easy.

A major concern will likely be the overall speed of the Ceph cluster as seen by the clients. Equally important is the money required to build and operate the cluster. So how do you balance the two? Will you need SSDs? Will you really need 10G networking?

Here's my report on what started as a demo environment and grew into what you might call a production system.

