|4.4.y kernel and XFS. Sometimes being Fast and First is hard
||[Mar. 4th, 2016|02:51 pm]
TLDR: Update xfsprogs to xfsprogs-4.3.0-1 before you update the kernel to 4.4.4 on Fedora 22/23.
Every once in a while we get a bug report that sounds really really bad. So bad that it makes us do a double take and start thinking about it in detail even before we complete the normal daily bug review. Today was one of those days.
This morning I read a bug report that suggested the 4.4 kernel will not boot with an xfsprogs in userspace less than version 4.3.0. That sounds pretty bad. Fedora 23 has already been rebased to the 4.4.y stable series of kernels, and it is in update-testing for Fedora 22. Given that the default filesystem in Server edition is XFS, that definitely caused some concern.
However, thinking about it even for a moment it seemed somewhat odd. First, typically the only userspace component involved in mounting an XFS filesystem is mount(8). That isn't provided by the xfsprogs package. Even if that is what was broken, one would think that there would have been bug reports long before now given that Fedora 23 was rebased a while ago already. Perhaps fsck might come into play too, and that's a possibility but it isn't very likely. So it was time to dig in further. The report linked to an upstream discussion. Here is where things started to get clearer. I'll try and explain a bit.
Around the 3.15 kernel release, XFS added support for maintaining CRC checksums for on-disk metadata objects. According to the mkfs.xfs manpage:
CRCs enable enhanced error detection due to hardware issues, whilst the format changes also improves crash recovery algorithms and the ability of various tools to validate and repair metadata corruptions when they are found. The CRC algorithm used is CRC32c, so the overhead is dependent on CPU archi‐ tecture as some CPUs have hardware acceleration of this algorithm. Typically the overhead of calculat‐ ing and checking the CRCs is not noticable in normal operation.
Yay! More robustness and better error recovery. Fantastic. However, features like this are rarely enabled by default when they are first released. In fact, the addition of this necessitated bumping the version number of the XFS on-disk format to version 5. For further caution, xfsprogs didn't start supporting this until xfsprogs-3.2.0 and didn't make it the default for mkfs.xfs until xfsprogs-3.2.3. Pretty cautious on the part of the XFS developers. But that still leaves us trying to figure out why the report was submitted and what it means for Fedora.
First, the actual problem. In the 4.4 kernel release the XFS developers started validating some additional metadata of V5 filesystems against the log. This is fine for healthy filesystems. However, if you had a crash and ran xfs_repair from an older xfsprogs package, it would unconditionally zero out the log and the kernel would be very confused and report corruption when you tried to mount it. The xfsprogs-4.3 release fixed this in the xfs_repair utility. OK, so there's validity to the 4.4 kernel needing xfsprogs-4.3, but only in certain circumstances. Namely, you have to have a V5 filesystem on disk, and you have to run xfs_repair from an older xfsprogs against it.
Fortunately, that isn't a super common situation. For Fedora 22, we released with xfsprogs-3.2.2. That means any new XFS filesystems created from installation media should still be V4 and not hit this issue (prior EOL Fedora releases are in the same boat). It's possible someone manually specified crc=1 when they created their F22 XFS partition, but that is a rare case. Fedora 23, on the other hand, shipped with xfsprogs-3.2.4 and should be creating V5 XFS filesystems by default. Those users would still need to run xfs_repair, which isn't a massively common thing to do. Unfortunately, it is common enough we need to do something about it.
The first thing to do was get xfsprogs updated in Fedora 22 and 23 (24 and rawhide are fine already). Eric Sandeen, being the massively awesome human being he is, had that done before I had even fully understood the problem. He was, in fact, so fast that he had the bodhi updates file before we even talked about what to do in the kernel package. That isn't ideal, but it's workable.
The second thing to do was to make sure the kernel package took this into account. Originally we discussed adding a Requires on the appropriate xfsprogs version. That actually isn't a great idea though. The kernel package doesn't actually Require any of the filesystem creation tools packages and it shouldn't. Users don't want to be forced to drag around xfsprogs if they aren't even using XFS filesystems, or btrfs-progs if they aren't using btrfs, etc. So we quickly realized that we needed to use Conflicts.
For the 4.4.4 kernel updates in Fedora 22 and Fedora 23, we added the Conflicts for xfsprogs < 4.3.0-1. You may see output that looks similar to this when you dnf update:
Error: package kernel-core-4.4.4-301.fc23.x86_64 conflicts with xfsprogs < 4.3.0-1 provided by xfsprogs-3.2.4-1.fc23.x86_64.
The solution here is to make sure you update xfsprogs first. Then your kernel update should work fine.
We'd like to thank the original bug reporter and Eric Sandeen for their help.
[Edit: Added TLDR for people that don't like to read and/or hate my writing]