Flock Krakow 2016

The annual Fedora contributors' conference, Flock to Fedora, wrapped up last week. From everything I've seen and heard, it was a smashing success. The trip reports and writeups have been very detailed and helpful, so I thought I would take a slightly different perspective with mine. I'll cover a few sessions, but most of this will be from an organizer's viewpoint.

However, the highlight for me was meeting three of my new team members in person (or in the case of Dan, seeing him again). My transition to the team had just completed, and it was almost pure coincidence that all of us were going to be in the same place at the same time. I appreciate the opportunity to see them, as I really value face-to-face time with a team. It was a great way to start off on a good note.

Sessions


The Kernel Talk

For the first time in several years, someone else gave the Fedora kernel talk. Laura Abbott did a fantastic job with this. I thought she struck a good balance in using statistics to emphasize her points where needed, whereas my talks tended to be heavy on statistics for statistics' sake. The overall theme of inclusion and community during the talk was also excellent. We need the community around the Fedora kernel to grow, and Laura made that point numerous times. Hopefully people will watch the video when it is online and feel empowered to reach out to the kernel maintainers. I promise they are very approachable people.
Ensuring ABI stability in Fedora

Dodji Seketeli and Sinny Kumari gave a fascinating talk on abicheck and libabigail, and their potential uses in Fedora. Having done work on toolchains and ELF in the distant past, I was very impressed by how they are able to narrow an ABI difference between two package versions down to the individual .h file level and present it in a format that humans are comfortable reading. Hopefully as work continues, we'll be able to utilize this in the Fedora updates process to detect inadvertent ABI changes before they land on users' systems. I also think it could be useful in the Modularity effort to validate that the API/ABI a module claims to export is actually what is exported. I'm definitely going to keep an eye on the work Dodji and Sinny are doing here.
Fedora Council Townhall and Update

This session was kind of a bust for its intended purpose. The people in the room were literally the Council members, a couple of members of OSAS, and one or two people that I think were really just looking for a quiet room to hack in. However, to keep it from being a waste, we spent most of the session discussing Flock itself among the Council and OSAS members. I felt this was fairly productive, but there wasn't anything firm enough to recap here. Hopefully the results of some of that discussion are posted in the not-too-distant future.
Modularity

I've seen Langdon give his Modularity talk 3 or 4 times now across various venues. Every time I watch it, I'm amazed at its ambition and relevance in today's trending software market. The nice thing about this session is that it shows more progress towards actually doing this every time. This particular time around, Langdon was able to show off a few demos of installing various modules (a kernel and httpd). Live demos are always fun, and this was no exception. I think his team is doing great work, and I'm looking forward to seeing how this plays into the way we create Fedora in the future. There are a lot of details to work out, and that's where the fun part will be.
PRD Workshop

I wound up, somewhat unexpectedly, moderating a PRD workshop for the Fedora Editions. The PRDs for each WG have existed for a few releases now, and some of them are woefully out of date. It was time to refresh what each WG was aiming for, and document that.

Rather than dig into the existing PRDs and revise from there, we started off by using the Kellogg Logic Model that Matthew Miller covered in his talk earlier in the day. We started with Impact. So we listed Mission and Vision statements, followed by Outcomes that map to those, followed by Outputs that would meet the desired outcomes. Stephen Gallagher started us off with Server and we spent quite a bit of time getting used to the ideas and flow of the model, and then drilling down into some of the details. Workstation was next with Christian Schaller and Alberto Ruiz leading the way.

Ironically, the WG that needs to address its PRD the most is the Cloud WG, and we ran short on time to really dig into it in any meaningful way. The main thing we took away there was that the WG is basically blowing itself up and doing a pivot to become the Atomic WG. That might sound drastic, but it was a direct follow-on from the Cloud FAD that was held recently with the WG members. What that actually means for all the involved parties is up to them, and I'm curious to see the end result.

Organization and Logistics


Venue

I thought the venue for Flock this year worked out very well. Having the conference in the hotel makes everything massively easier. People have less transit to worry about, the organizers only have one set of staff to work with, and it generally makes for a better conference in my opinion. I thought the facilities were great, the quality and amount of food was good, and even the wifi worked well as long as nobody used the floor outlets (... no idea). Brian did a fantastic job finding the Best Western for us.
Schedule

Having created the schedule for Flock this year, I'm completely biased. However, I thought it worked out well. There were the inevitable gripes about conflicts between sessions people wanted to attend, but it is impossible to avoid that for everyone. I did tell the Fedora Engineering team that they have to write in their goals for next year to be less awesome and submit fewer interesting talks. They could have had a mini-conference of their own, and scheduling their talks was by far the hardest part.

The only concern I had from a schedule perspective was the significantly higher number of cancellations this year. We lost 10 sessions total for a variety of reasons. A typical year normally results in only two or three. The organizers will need to keep an eye on this to make sure it doesn't turn into a trend, particularly when we have a growing number of submissions overall.
Communications

The Flock booklets this year were great. Ryan and the design team did a superb job with them again. The only comment I had for Ryan was to eliminate the schedule from the booklet itself. We always get requests for new sessions on the Workshop days, or cancellations/changes after the print date and that makes it obsolete before the conference even starts. He did have a fantastic way to compensate for this though, and created a schedule on the main hallway wall every morning with printed out session tags. I thought this was brilliant and made it much easier for attendees to know what was coming next without having to bring up the sched.org page.

Online channels were a mixed bag this time around. Typically the bulk of the chat is done in the Flock IRC channels and the staff monitor those for things that might need attention. This year, a Telegram channel was created by the attendees as well. I think the attendees liked the Telegram channel, so much so that the bulk of chatter went there. That was somewhat problematic only in that it deviated from the past and made for one more place staff had to monitor despite us saying we weren't going to do that. The dilution wasn't severe though. Inevitably wherever the bulk of the people are is what becomes the main avenue for communication. I would advocate for coming up with an online discussion plan well in advance of the conference next year and sticking to it.

Streaming wound up in a similar vein. The wifi did hold up very well, but the staff didn't feel it would be sufficient to stream all the sessions. We went for recording instead, as we did last year, and told speakers that remote attendance or speaking via streaming wasn't something we were going to do. As is typical with any strong willed and creative community, this was ignored for the Diversity panel and they went and did it anyway. I was pleased to find out after the fact that it seemed to go well enough, and it didn't kill wifi for the whole conference. Kudos to them for getting it done, particularly for such an important topic, but I don't think it would have worked for all the sessions at once.
The city

I cannot stress enough how wonderful the city itself was this year. The people of Krakow were friendly and approachable, the city was beautiful, and the evening events were great. I say that as someone that does not like boats, and I had a great time on our boat event (as I did last year after reluctantly going).
The community

As always, the biggest factor to making Flock a success is the Fedora community. It is rewarding to see people turn up ready to interact and contribute time and again. I'm always impressed with what people can get done there, and with the energy level they bring. It always leaves me looking forward to next year.

Time for an Alternative

I've been doing kernel development or maintenance for a large portion of my professional career. It started at my previous employer and continued for the past 5 years at Red Hat on the Fedora kernel team, the last 3 as the de facto team lead. It has been a fantastic experience, from both technical and people perspectives. I consider myself extremely lucky to have been a part of the Fedora kernel team, both past and present incarnations of it. Which is why, when career discussions came up recently, it wasn't the easiest thing to think about.

I did some pondering about what I had done and what I wanted to do. I had some great conversations with a number of people within Red Hat. After a lot of discussion and thinking, I'm very happy to have joined the Fedora Alternative Architectures team. This team works with all of the various non-Intel architectures in Fedora and within Red Hat. It is a fantastic group of people, and the technical content fits perfectly with my background. I'm very excited to be able to move from one excellent team to another. The people within Red Hat continue to reinforce how rare our company is.

The Fedora kernel team is in great hands with Laura and Justin. I'm fully confident that they'll be able to continue providing a high quality kernel and kernel community for Fedora. Things shouldn't really change there on a day-to-day basis, and I'm not really going very far, so don't be surprised to see me still poking around in the kernel package.

I'll still be heavily involved in Fedora day-to-day as well, participating in FESCo and the Fedora Council. So if you need to ping me about something Fedora and/or alternative architecture related, don't hesitate.

When is a kernel bug not a kernel bug?

Think of this scenario: You're sitting at your shiny Fedora install and notice a kernel update is available. You get all excited, update it through dnf or Gnome Software, or whatever you use, reboot and then things stop working. "STUPID KERNEL UPDATE WHY DID YOU BREAK MY MACHINE" you might say. Clearly it's at fault, so you dutifully file a bug against the kernel (you do that instead of just complaining, right?). Then you get told it isn't a kernel problem, and you probably think the developers are crazy. How can a kernel update that doesn't work NOT be a kernel problem?

This scenario happens quite often. To be sure, a good portion of the issues people run into with kernel updates are clearly kernel bugs. However, there is a whole set of situations where it seems that way but really it isn't. So what is at fault? Lots of stuff. How? Let's talk about the boot sequence a bit.

Booting: a primer


Booting a computer is a terribly hacky thing. If you want a really deep understanding of how it works, you should probably talk to Peter Jones[1]. For the purposes of this discussion, we're going to skip all the weird and crufty stuff that happens before grub is started and just call it black magic.

Essentially there are 3 main pieces of software that are responsible for getting your machine from power-on to whatever state userspace is supposed to be in. Those are grub, the kernel, and the initramfs. Grub loads the kernel and initramfs into memory, then hands off control to the kernel. The kernel does the remainder of the hardware and low-level subsystem init, uncompresses the initramfs and jumps into the userspace code contained within. The initramfs bootstraps userspace as it sees fit, mounting the rootfs and switching control to that to finish up the boot sequence. Seems simple.
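To make that concrete, a grub.cfg boot entry looks roughly like this (a sketch; the paths and versions are made up for illustration):

    menuentry 'Fedora (4.x.y)' {
        # grub loads both of these into memory, then jumps to the kernel
        linux /vmlinuz-4.x.y root=/dev/mapper/fedora-root ro rhgb quiet
        initrd /initramfs-4.x.y.img
    }

The linux line names the kernel and its command line (including the rootfs arguments), and the initrd line names the initramfs. Keep those two lines in mind, because several of the failure modes below boil down to one of them being wrong or missing.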

The initramfs


So what is this "initramfs"? In technical terms, it's a weird version of a CPIO archive that contains a subset of userspace binaries needed to get you to the rootfs. I say weird because it can also have CPU microcode tacked onto the front of it, which the kernel strips off and applies during the early microcode update before unpacking the rest. This is a good thing, but it's also kind of odd.

The binaries contained within the initramfs are typically your init process (systemd), system libraries, kernel modules needed for your hardware (though not all of them), firmware files, udev, dbus, etc. It's almost equivalent to the bare minimum you can get to a prompt with. If you want to inspect the contents for yourself, the lsinitrd command is very handy.
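For example, to list what got packed into the initramfs for the running kernel (the output will vary by machine):

    lsinitrd /boot/initramfs-$(uname -r).img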

There are actually a couple of different 'flavors' of initramfs as well. The initramfs found in the install images is a generic initramfs that has content which should work on the widest variety of machines possible, and can be used as a rescue mechanism. It tends to be large though, which is why after an install the initramfs is switched to HostOnly mode. That means it is specific to the machine it is created on. The tool that creates the initramfs is called dracut, and if you're interested in how it works I would suggest reading the documentation.
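If you ever need to rebuild an initramfs by hand (say, to pick up a fixed component), the usual invocation looks like this; --force is needed to overwrite the existing image:

    dracut --force /boot/initramfs-$(uname -r).img $(uname -r)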

The problems


OK, so now that we have the components involved, let's get to the actual problems that look like kernel bugs but aren't.

Cannot mount rootfs


One of the more common issues we see after an update is that the kernel cannot mount the rootfs, which results in the system panicking. How does this happen? Actually, there are a number of different ways. A few are:

* The initramfs wasn't included in the grub config file for unknown reasons and therefore wasn't loaded.
* The initramfs was corrupted on install.
* The kernel command line specified in the grub config file didn't include the proper rootfs arguments.

All of those happen, and none of them are the kernel's fault. Fortunately, they tend to be fairly easy to repair but it is certainly misleading to a user when they see the kernel panic.
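On Fedora, a quick way to sanity-check all of the above at once is grubby, which prints the kernel, initrd, and command-line arguments for a boot entry:

    grubby --info=DEFAULT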

A different update breaks your kernel update


We've established that the initramfs is a collection of binaries from the distro. It's worth clarifying that these binaries are pulled into the initramfs from what is already installed on the system. Why is that important? Because it leads to the biggest source of confusion when we say the kernel isn't at fault.

Fedora tends to update fast and frequently across the entire package set. We aren't really a rolling release, but even within a release our updates are somewhat of a firehose. That leads to situations where packages can, and often do, update independently across a given timeframe. In fact, the only time we test a package collection as a whole is around a release milestone (keep reading for more on this). So let's look at how this plays out in terms of a kernel update.

Say you're happily running a kernel from the GA release. A week goes by and you decide to update, which brings in a slew of packages, but no kernel update (rinse and repeat this N times). Finally, a kernel update is released. The kernel is installed, and the initramfs is built from the set of binaries that are on the system at the time of the install. Then you reboot and suddenly everything is broken.

In our theoretical example, let's assume there were lvm updates in the timeframe between release and your kernel update. Now, the GA kernel is using the initramfs that was generated at install time of the GA. It continues to do so forever. The initramfs is never regenerated automatically for a given kernel after it is created during the kernel install transaction. That means you've been using the lvm component shipped with GA, even though a newer version is available on the rootfs.

Again, theoretically say that the lvm update contained a bug that made it not see particular volumes, like your root volume. When the new kernel is installed, the initramfs will suck in this new lvm with the bug. Then you reboot and suddenly lvm cannot see your root volume. Except it is never that obvious, and it just looks like a kernel problem. Compounding the issue, everything works when you boot the old kernel. Why? Because the old kernel's initramfs is still using the old lvm version contained within it, which doesn't have the bug.
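A rough way to see this kind of mismatch for yourself is to compare the rootfs against the initramfs in question (a sketch; the package and grep pattern will vary by component):

    # version currently installed on the rootfs
    rpm -q lvm2
    # what was baked into a particular kernel's initramfs
    lsinitrd /boot/initramfs-<version>.img | grep -i lvm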

This problem isn't specific to lvm at all. We've seen issues with lvm, mdraid, systemd, selinux, and more. However, because of the nature of updates and the initramfs creation, it only triggers when that new kernel is booted. This winds up taking quite a bit of time to figure out, with a lot of resistance (understandably) from users that insist it is a kernel problem.

Solution: ideas wanted


Unfortunately, we don't really have a great solution to any of these, particularly the one mentioned immediately above. People have suggested regenerating the initramfs after every update transaction, but that actually makes the problem worse. It takes something known to be working and suddenly introduces the possibility that it breaks.

Another solution that has been suggested is to keep the userspace components in the initramfs fixed, and only update it to include newer firmware and modules. This sounds somewhat appealing at first, but there are a few issues with it. The first is that the kernel and userspace aren't always independent of each other. In rare cases, we might actually need a newer userspace component (such as the xorg drivers) to work properly with a kernel rebase. Today that is handled via RPM Requires, and freezing the initramfs contents cannot take that into account. Other times there may be changes within the userspace components themselves that mean something in the initramfs cannot interact with an update on the rootfs. That problem exists in the current setup as well, but switching from today's known scenarios to a completely different setup while still having that problem doesn't sound like a good idea.

A more robust solution would be to stop shipping updates in the manner in which they are shipped in Fedora. Namely, treat them more like "service packs" or micro-releases that could be tested as a whole. Indeed, Fedora Atomic Host very much operates like this with a two week release cadence. However, that isn't prevalent across all of our Editions (yet). It also means individual package maintainers are impacted in their workflow. That might not be a bad thing in the long run, but a change of that proportion requires time, discussion, and significant planning to accomplish. It also needs to take into account urgent security fixes. All of that is something I think should be done, but none of it guarantees we solve these kinds of "kernel-ish" problems.

So should we all despair and throw up our hands and just live with it? I don't think so, no. I believe the distro as a whole will eventually help here, and in the meantime hopefully posts like this provide a little more clarity around how things work and why they may be broken. At the very least, hopefully we can use this to educate people and make the "no, this isn't a kernel problem" discussions a bit easier for everyone.


[1] It should be noted that Peter might not actually want to talk to you about it. It may bring up repressed memories of kludges and he is probably busy doing other things.

(no subject)

Often we get bugs reported against the Fedora kernel for issues involving third party drivers. Sometimes those are VirtualBox, sometimes VMware guest tools, but most often it is the nvidia driver. We had another reported today. I'll pause to let you read it. Go ahead, read the whole thing. I'll wait.

https://bugzilla.redhat.com/show_bug.cgi?id=1335173

Done? Good. You'll notice a couple things. First, it's closed as CANTFIX with a rote comment that we typically use for such bugs. That comment, while terse, is not incorrect. Second, the reporter is, frankly, pissed off. You know what?

He has every right to be.

So if the response in the bug is not incorrect, how does that line up with the assertion that the reporter's anger isn't wrong either? I thought I'd spend some time breaking this bug down in detail to try and explain it.

The crux of the reporter's argument is that using the nvidia driver on Fedora causes pain to Fedora's users. I'm not going to argue against that. Using the nvidia driver on Fedora is very much painful. A user can finally get it working, and then we rebase the kernel and it breaks again for them. That isn't a good user experience at all. We've known this for a while, and there are some tentative plans to help users in this situation by defaulting to a known working kernel if they have the nvidia driver installed. That doesn't fix the problem, but it at least reduces the element of surprise.

The reporter then goes on to make some assertions that might seem plausible, but in fact aren't accurate at all. Let's look at these more closely.

The claims


You do not intentionally break hardware compatibility? Oh, wait, you do.

We do not intentionally break anything. As we've written about before, we rebase the kernel to pick up the bugfixes the upstream maintainers are including in those newer releases. However, an additional benefit of those rebases is that we actively enable more new hardware by doing so. Yes, there are regressions and they are particularly prevalent if you are relying on out-of-tree drivers. Those regressions are unfortunate, but certainly not done out of malice.

You do not intentionally break API/ABI compatibility? Oh, wait, you do.

Greg-KH has talked and written extensively about the fact that the upstream kernel has no stable API. In fact, his document is included in the kernel source itself. Because the kernel has no stable API, there is also no stable ABI. Now, it should be noted that ABI here describes the ABI between the kernel and modules, not the ABI between the kernel and userspace. The kernel/userspace ABI is done via syscalls, and that is stable and fanatically protected by the upstream kernel maintainers. However, modules don't have that luxury, and therefore when a rebase is done the ABI can and does change. Add in the fact that compiler versions change in Fedora, which can also impact the ABI, and it becomes evident that the reporter's claim is somewhat true.

We could freeze on a kernel version and attempt to keep the API/ABI stable, but that incurs a significant maintenance cost that our small team is not able to handle. Even the RHEL kernel, with its much larger user base and development team, has a limited kABI they support. So yes, the API/ABI changes with a rebase (or more rarely with a new stable update), but that is one of the consequences of doing a rebase. It is not done with the intention of breaking anything.

You do not limit user's choice in regard to running software or drivers? Oh, wait, you do.

Fedora actually does not limit the user's choice in software. The user is free to install the nvidia driver or whatever other software they wish to use. Google Chrome, Lotus Notes, Steam, nvidia drivers, and other software not provided by Fedora have all been known to install and work. There is no problem with a user choosing to do this. It is their own computer!

What the Fedora kernel team cannot do is provide support for such software. As Justin mentions in the bug itself, providing such support is very difficult for us to do. We have no access to the driver source and cannot fix bugs in the driver. If there is a bug in the kernel itself that the driver happens to trigger, we lack a lot of context around what the driver is doing to cause it. It is simply not a tenable position. Therefore we close all such bugs as CANTFIX.

You make sure your software is bugs free? Oh, wait, you don't.

This one starts to leave the reality of software in general, in that no software is ever bug free. We do, however, attempt to ensure we don't ship with known bugs. Also, the kernel Fedora ships is very close to the upstream kernel, and 95% of the bugs reported are present in the upstream kernel as well. So "your" software here is collectively the entire kernel community.

You make sure you have the most stringent QA/QC process? Oh, wait, you don't.

I will not argue that our QA process is the most stringent. I won't even argue that it is more stringent than some other projects'. It is, however, constantly improving. We've had an automated testsuite in place for more than a year now to test builds as they come out of koji. We continue to run tests on the kernel manually on a variety of machines to make sure things are not known to be broken on certain configurations. We are constantly looking to add more to this in as automated of a fashion as we can.

However, that only scales so far. Particularly in the case of the kernel, Fedora relies heavily on input from actual users via our updates-testing and bodhi infrastructure. Consider this a continued plea for help testing and catching things.

(paraphrased) The nouveau driver is slower, less stable, and unusable on recent nvidia hardware

Quite simply, there is a lot of truth to this statement particularly for accelerated graphics situations (which gnome-shell uses). However, this is not Fedora's fault or really anyone's fault. The nouveau driver is a reverse engineered solution that is continually improving despite having very few actual developers working on it. We've recognized this gap and there are more people assigned to nouveau now than ever before. Progress will be slow, but it is being made. As Justin mentions, support for newer nvidia cards should continue to improve in the 4.6 and 4.7 kernels now that some of the signed firmware issues are worked out.

(paraphrased) Fedora expects everyone to use Intel GPUs

We have no such expectations. Intel does have more market penetration due to the on-board GPUs it ships with its newer CPUs, and they have a development team working on their open source driver directly upstream. The i915 driver is held up as the ideal standard for open source GPU drivers, but it is not bug free by any means. The radeon driver is similarly open source and typically works well but it also is not bug free. GPU work is hard. At least those vendors are doing it in the open, and they should be applauded for it. That does not mean usage of their product is required.

(Personally, I use both Intel and ATI GPUs in my two primary machines.)

(paraphrased) Fedora sabotages its kernels to make it incompatible with third party drivers.

This is blatantly false. We do not add any patches to intentionally break anything. That would be untrue to Fedora's Foundations, limiting our users for no reason at all, and I would personally find it immoral.

The only limitation Fedora places on third party modules is under the Secure Boot case in order to fully support that mechanism. They must be signed with a cert that is imported in the kernel, and we have provided documentation on how to do this or disable Secure Boot for those wishing to not bother with it.
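For the curious, the signing dance looks roughly like this (a sketch; the key, cert, and module names are hypothetical, and the documentation mentioned above has the full details):

    # enroll the signing certificate so the kernel will trust it
    mokutil --import my_signing_cert.der
    # sign the module with the matching private key
    /usr/src/kernels/$(uname -r)/scripts/sign-file sha256 \
        my_signing_key.priv my_signing_cert.der my_module.ko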

So now what


There is no good answer here. While I dislike the more inflammatory claims, personal attacks, and overall tone in the bug, I stand by my assertion that the reporter has every right to be angry. They simply want to use the hardware they have purchased. I think the anger is misdirected somewhat, but hopefully this post illustrates that there are two sides to every story and elaborates on why it isn't as simple as many make it out to be.

Fedora knows people want to use things that give them the best performance or user experience. We aren't actively trying to prevent either of those. We try and balance the needs of as many users as we can. In this specific case, we're also looking at ways to improve the user experience without compromising Fedora's stance on proprietary software. Unfortunately, there are situations where things will break and we simply cannot fix them. Please continue to tell us about them so we're aware. Perhaps with more understanding and less vitriol in the future.

This post reflects the author's opinion and is not necessarily representative of the Fedora project or the author's employer.

4.4.y kernel and XFS. Sometimes being Fast and First is hard

TLDR: Update xfsprogs to xfsprogs-4.3.0-1 before you update the kernel to 4.4.4 on Fedora 22/23.

Every once in a while we get a bug report that sounds really really bad. So bad that it makes us do a double take and start thinking about it in detail even before we complete the normal daily bug review. Today was one of those days.

This morning I read a bug report that suggested the 4.4 kernel will not boot with an xfsprogs in userspace less than version 4.3.0. That sounds pretty bad. Fedora 23 has already been rebased to the 4.4.y stable series of kernels, and it is in updates-testing for Fedora 22. Given that the default filesystem in Server edition is XFS, that definitely caused some concern.

However, thinking about it for even a moment, it seemed somewhat odd. First, typically the only userspace component involved in mounting an XFS filesystem is mount(8). That isn't provided by the xfsprogs package. Even if that is what was broken, one would think there would have been bug reports long before now, given that Fedora 23 was rebased a while ago already. Perhaps fsck might come into play too; that's a possibility, but it isn't very likely. So it was time to dig in further. The report linked to an upstream discussion. Here is where things started to get clearer. I'll try and explain a bit.

Around the 3.15 kernel release, XFS added support for maintaining CRC checksums for on-disk metadata objects. According to the mkfs.xfs manpage:


CRCs enable enhanced error detection due to hardware issues, whilst the format changes also improves crash recovery algorithms and the ability of various tools to validate and repair metadata corruptions when they are found. The CRC algorithm used is CRC32c, so the overhead is dependent on CPU architecture as some CPUs have hardware acceleration of this algorithm. Typically the overhead of calculating and checking the CRCs is not noticeable in normal operation.


Yay! More robustness and better error recovery. Fantastic. However, features like this are rarely enabled by default when they are first released. In fact, the addition of this necessitated bumping the version number of the XFS on-disk format to version 5. For further caution, xfsprogs didn't start supporting this until xfsprogs-3.2.0 and didn't make it the default for mkfs.xfs until xfsprogs-3.2.3. Pretty cautious on the part of the XFS developers. But that still leaves us trying to figure out why the report was submitted and what it means for Fedora.

First, the actual problem. In the 4.4 kernel release the XFS developers started validating some additional metadata of V5 filesystems against the log. This is fine for healthy filesystems. However, if you had a crash and ran xfs_repair from an older xfsprogs package, it would unconditionally zero out the log and the kernel would be very confused and report corruption when you tried to mount it. The xfsprogs-4.3 release fixed this in the xfs_repair utility. OK, so there's validity to the 4.4 kernel needing xfsprogs-4.3, but only in certain circumstances. Namely, you have to have a V5 filesystem on disk, and you have to run xfs_repair from an older xfsprogs against it.

Fortunately, that isn't a super common situation. For Fedora 22, we released with xfsprogs-3.2.2. That means any new XFS filesystems created from installation media should still be V4 and not hit this issue (prior EOL Fedora releases are in the same boat). It's possible someone manually specified crc=1 when they created their F22 XFS partition, but that is a rare case. Fedora 23, on the other hand, shipped with xfsprogs-3.2.4 and should be creating V5 XFS filesystems by default. Those users would still need to run xfs_repair, which isn't a massively common thing to do. Unfortunately, it is common enough we need to do something about it.
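If you want to know whether a given filesystem is even affected before worrying about any of this, xfs_info on a mounted filesystem reports the CRC feature flag; crc=1 means it is a V5 filesystem:

    xfs_info / | grep crc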

The first thing to do was get xfsprogs updated in Fedora 22 and 23 (24 and rawhide are fine already). Eric Sandeen, being the massively awesome human being he is, had that done before I had even fully understood the problem. He was, in fact, so fast that he had the bodhi updates filed before we even talked about what to do in the kernel package. That isn't ideal, but it's workable.

The second thing to do was to make sure the kernel package took this into account. Originally we discussed adding a Requires on the appropriate xfsprogs version. That actually isn't a great idea though. The kernel package doesn't actually Require any of the filesystem creation tools packages and it shouldn't. Users don't want to be forced to drag around xfsprogs if they aren't even using XFS filesystems, or btrfs-progs if they aren't using btrfs, etc. So we quickly realized that we needed to use Conflicts.

For the 4.4.4 kernel updates in Fedora 22 and Fedora 23, we added the Conflicts for xfsprogs < 4.3.0-1. You may see output that looks similar to this when you dnf update:

Error: package kernel-core-4.4.4-301.fc23.x86_64 conflicts with xfsprogs < 4.3.0-1 provided by xfsprogs-3.2.4-1.fc23.x86_64.

The solution here is to make sure you update xfsprogs first. Then your kernel update should work fine.
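In other words, the fix is just a matter of ordering:

    # update xfsprogs first, then the kernel update applies cleanly
    sudo dnf update xfsprogs
    sudo dnf update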

We'd like to thank the original bug reporter and Eric Sandeen for their help.

[Edit: Added TLDR for people that don't like to read and/or hate my writing]

Understanding the kernel release cycle

A few days ago, a Fedora community member was asking if there was a 4.3 kernel for F23 yet (there isn't). When pressed for why, it turns out they were asking on behalf of someone who wanted newer support but thought a 4.4-rc3 kernel was too new. This was surprising to me. The assumption was being made that 4.4-rc3 was unstable or dangerous, but that 4.3 was just fine even though Fedora hasn't updated the release branches with it yet. This led me to ponder the upstream kernel release cycle a bit, and I thought I would offer some insights as to why the version number might not represent what most people think it does.

First, I will start by saying that the upstream kernel development process is amazing. The rate of change for the 4.3 kernel was around 8 patches per hour, by 1600 developers, for a total of 12,131 changes over 63 days total[1]. And that is considered a fairly calm release by kernel standards. The fact that the community continues to churn out kernels of such quality with that rate of change is very very impressive. There is actually quite a bit of background coordination that goes on between the subsystem maintainers, but I'm going to focus on how Linus' releases are formed for the sake of simplicity for now.

A kernel release is broken into a set of discrete, roughly time-based chunks. The first chunk is the 2 week merge window. This is the timeframe where the subsystem maintainers send the majority of the new changes for that release to Linus. He takes them in via git pull requests, grumbles about a fair number of them, and refuses a few others. Most of the pull requests are dealt with in the first week, but there are always a few late ones, so Linus waits the two weeks and then "closes" the window. This culminates in the first RC release being cut.

From that point on, the focus for the release is fixes. New code being taken at this point is fairly rare, but does happen in the early -rc releases. These are cut roughly every Sunday evening, making for a one week timeframe per -rc. Each -rc release tends to be smaller in new changesets than the previous, as the community becomes much more picky on what is acceptable the longer the release has gone on. Typically it gets to -rc7, but occasionally it will go to -rc8. One week after -rc7 is released, the "final" release is cut, which maps nicely with the 63 day timeframe quoted above.

Now, here is where people start getting confused. They see a "final" release and immediately assume it's stable. It's not. There are bugs. Lots and lots of bugs. So why would Linus release a kernel with a lot of bugs? Because finding them all is an economy of scale. Let's step back a second into the development and see why.

During the development cycle, people are constantly testing things. However, not everyone is testing the same thing. Each subsystem maintainer is often testing their own git tree for their specific subsystem. At the same time, they've opened their subsystem trees for changes for the next version of the kernel, not the one still in -rcX state. So they have new code coming in before the current code is even released. This is how they sustain that massive rate of change.

Aside from subsystem trees, there is the linux-next tree. This is a daily merge of all the subsystem maintainer trees that have already opened up to new code, on top of whatever Linus has in his tree. A number of people are continually testing linux-next, mostly through automated bots but also in VMs and with fuzzers and such. In theory and in practice, this catches bugs before they get to Linus the next round. But it is complicated, because the rate of change means that if an issue is hit, it's hard to see whether it's in the brand-new code found only in linux-next, or whether it's actually in Linus' tree. Determining that usually winds up being a manual process via git-bisect, but sometimes the testing bots can determine the offending commit in an automated fashion.
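That manual process is typically plain git-bisect: mark a known good point and a known bad point, then build and test whatever git checks out at each step (a sketch; the tag is an example):

    git bisect start
    git bisect bad                # current HEAD reproduces the bug
    git bisect good v4.3          # last point known to work
    # build, boot, and test the checkout, then tell git the result
    # and repeat until it converges on the offending commit:
    git bisect good   # or: git bisect bad
    git bisect reset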

If a bug is found, the subsystem maintainer or patch author or whomever must track down which tree the bug is in, whether it's a small enough fix to go into whatever -rcX state Linus' tree is in, and how to get it there. This is very much a manual process, and often involves multiple humans. Given that humans are terrible at multitasking in general, and grow ever more cautious the later the -rcX state is, sometimes fixes are missed or simply queued for the next merge window. That's not to say important bugs are not fixed, because clearly there are several weeks of -rcX releases where fixing is the primary concern. However, with all the moving parts, you're never going to find all the bugs in time.

In addition to the rate of change/forest of trees issue, there's also the diversity and size of the tester pool. Most of the bots test via VMs. VMs are wonderful tools, but they don't test the majority of the drivers in the kernel. The kernel developers themselves tend to have higher end laptops. Traditionally this was of the Thinkpad variety and a fair number of those are still seen, but there is some variance here now which is good. But it isn't good enough to cover all possible firmware, device, memory, and workload combinations. There are other testers to be sure, but they only cover a tiny fraction of the end user machines.

It isn't hard to see how bugs slip through, particularly in drivers or on previous generation hardware. I wouldn't even call it a problem really. No software project is going to cut a release with 0 bugs in it. It simply doesn't happen. The kernel is actually fairly high quality at release time in spite of this. However, as I said earlier, people tend to make assumptions and think it's good enough for daily use on whatever hardware they have. Then they're surprised when it might not be.

To combat this problem, we have the upstream stable trees. These trees backport fixes from the current development kernels that also apply to the already released kernels. Hence, 4.3.1 is Linus' 4.3 release, plus a number of fixes that were found "too late". This, in my opinion, is where the bulk of the work on making a kernel usable happens. It is also somewhat surprising when you look at it.

The first stable release of a kernel, a .1 release, is actually very large. It often comprises 100-200 individual changes that are being backported from the current development kernel. That means there are 100-200 bugs immediately being fixed there. Whew, that's a lot, but OK, maybe expected with everything above taken into account. Except the .2 release is often also 100-200 patches. And .3. And often .4. It isn't until you start getting into .5, .6, .7, etc. that the patch count starts getting smaller. By the .9 release, it's usually time to retire the whole tree (unless it's a long-term stable) and start the fun all over again.
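You can see these numbers for yourself with a clone of the stable tree (the exact counts vary by release, of course):

    # how many fixes were backported into a given stable release
    git rev-list --count v4.2..v4.2.1
    git rev-list --count v4.2.1..v4.2.2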

In dealing with the Fedora kernel, the maintainers take all of this into account. This is why it is very rare to see us push a 4.x.0 kernel to a stable release, and often it isn't until .2 that you see a build. For those thinking that this article is somehow deriding the upstream kernel development process, I hope you now realize the opposite is true. We rely heavily on upstream following through and tagging and fixing the issues it finds, either while under development or via the stable process. We help the stable process as well by reporting back fixes if they aren't already found.

So hopefully next time you're itching for a new kernel just because it's been released upstream, you'll pause and think about this. And if you really want to help, you'll grab a rawhide kernel and pitch in by reporting any issues when you find them. The only way to get the stable kernel releases smaller, and reduce the number of bugs still found in freshly released kernels, is to broaden the tester pool and let the upstream developers know as soon as possible. In this way, we're all part of the upstream kernel community and we can all keep making it awesome and impressive.

(4.3.y will likely be coming to F23 the first week of January. Greg-KH seems to have gone on some kind of walkabout the past few weeks, so 4.3.1 hasn't been released yet. To be honest, it's a break well deserved. Or maybe he found 4.3 to be even more buggy than usual. Who knows!)

[1] https://lwn.net/Articles/661978/ (Of course I was going to link to lwn.net. If you aren't already subscribed to it, you really should be. They have amazing articles and technical content. They make my stuff look like junk even more than it already is. I'm kinda jealous at the energy and expertise they show in their writing.)

Fedora kernel exploded tree part deux: Snakes and assumptions

A while back I wrote about some efforts to move to using an exploded source tree for the Fedora kernel. As that post details, it wasn't the greatest experience. However, I still think an exploded tree has good utility, and I didn't want to give up on the idea of it existing. So after scrapping our "switch", I decided to (slowly) work on a tool that would create such a tree automatically. In the spirit of release-early and pray nobody dies from reading your terrible code, we now have fedkernel. The readme file in the repo contains a high level overview of how the tool works, so I won't duplicate that here. Instead I thought I would talk about some of the process changes and decisions we made to make this possible.

Git all the things


One of the positive fallouts of the previous efforts was that all of the patches we carried in Fedora were nicely formatted with changelogs and authorship information. Being literally the output of git-format-patch instantly improved the patch quality. When it came time to figure out how to generate the patches from pkg-git to apply to the exploded tree, I really wanted to keep that quality. So I thought about how to accomplish this and then I realized there was no need to reinvent the wheel. The git-am and git-format-patch tools existed and were exactly what I wanted.

After discussing things with the rest of the team, we switched to using git-am to apply patches in the Fedora kernel spec. The mechanics of this are pretty simple: the spec unpacks the tarball (plus any -rcX patches) and uses this as the "base" commit. Stable update patches are applied as a separate commit on top of the base if it is a stable kernel. Then it walks through every patch and applies it with git-am. This essentially enforces our patch format guidelines for us. It does have the somewhat negative side effect of slowing down the %prep section quite a bit, but in practice it hasn't been slow enough to be a pain. (Doing a git add and git commit on the full kernel sources isn't exactly speedy, even on an SSD.)
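Stripped of the RPM macro noise, the %prep flow amounts to something like this (an illustrative sketch, not the literal kernel.spec contents; the patch names are made up):

    # the unpacked tarball (plus any -rcX patch) becomes the base commit
    git init . && git add -A
    git commit -q -m "base: upstream release"
    # a stable update, if any, goes in as its own commit
    patch -p1 < patch-4.x.y
    git add -A && git commit -qm "stable update to 4.x.y"
    # every Fedora patch is then applied with git-am, one commit each
    git am 0001-some-fedora-change.patch 0002-another-change.patch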

So after %prep is done, the user is left with a git tree in the working directory that has all the Fedora patches as separate commits. "But wait, isn't the job done then?", you might ask. Well, no. We could call it good enough, but that isn't really what I or other users of an exploded tree were after. I wanted a tree with the full upstream commit history plus our patches. What this produces is just a blob, plus our patches. Not quite there yet but getting closer.

Snakes


This is where fedkernel comes in. I needed tooling that could take the patches from this franken-tree and apply them to a real exploded git tree. My previous scripts were written in bash, and I could have done this in bash again but I wanted to make it automated and I wanted it to talk to the Fedora infrastructure. This means it was time to learn python again. Fortunately, the upstream python community has great documentation and there are modules for pretty much anything I needed. This makes my "bash keyboard in interactive python session" approach to the language pretty manageable and I was able to make decent progress.

To get the patches out of the prepped sources, I needed to know mainly one thing: what was the actual upstream base for this build? That is easy enough to figure out if you can parse certain macros in kernel.spec. The one part that proved to be somewhat difficult was for git snapshot kernels. We name these -gitY kernels, where Y increases until the next -rcX release. E.g. kernel-4.3.0-0.rc3.git1.1, kernel-4.3.0-0.rc3.git2.1, etc. That's great for RPMs, but the only place we actually documented what upstream commit we generated the snapshot from was in an RPM %changelog comment.

Parsing it out of there is possible, but it's a bit cumbersome and somewhat error prone. The sha1sum is always recorded, but it isn't guaranteed to be in the newest changelog entry. Other patches and changelogs can be added before the kernel is actually built. Fortunately, we use a script to generate these snapshots. To make it trivial to figure out the sha1sum, I modified the script to record the full commit hash to a file called gitrev in pkg-git. Now fedkernel can easily read that file and use it as the base upstream revision to apply patches on top of. Yay for cheating and/or being lazy.
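Conceptually, the recording step is just a one-liner in the snapshot script (a sketch):

    # record the upstream commit this snapshot was generated from
    git rev-parse HEAD > gitrev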

The rest of the code deals with prepping the tree, using the git python module to do manipulations and generate patches, and applying them to the other tree. Python actually made much of this very easy to do and again I'm really glad I used that instead of bash.

Assumptions



So now that we modified a few things in pkg-git to make this easier, the assumptions basically fall out to be:

- Patches in the prepped source can be retrieved via 'git format-patch'
- The upstream base revision is determinable from kernel.spec and the gitrev file.

Pretty simple, right? Yes. Except the code isn't complete by any means, and it requires a bit more manual setup. Like having existing pkg-git and linux.git trees for it to modify, with all the branches and proper remotes already set up. That isn't really a huge issue, but it does mean when f24 is branched and rawhide becomes f25, we'll need to do some updates. Perhaps before then we'll have fixed the code (or some nice person will submit a pull request that does so).

I've been using the tool to generate exploded trees for the past week or so. It seems to be working well, and I've published them at https://git.kernel.org/cgit/linux/kernel/git/jwboyer/fedora.git/ once again. There is a history gap there as the tree fell into disrepair for a while, but it should be kept current going forward.

Even if the code is terrible and hacky, writing it was a good learning experience. I hope to keep refining it and improving things in the TODO over time. If you want to pitch in, patches are always welcome. You can email them to me, submit a pagure.io pull request, or mail the Fedora kernel list as usual.

A word on Fedora meetings

A few people have noted that when I chair a Fedora meeting, I seem to move quickly and remain strictly focused on the current set topic. I discourage broader discussion on semi-related topics and tend to finish meetings faster than most. There is a reason for this. It is because THAT IS HOW MEETINGS ARE SUPPOSED TO WORK.

There are many many articles about productivity and meetings. I am not an expert on this at all, but I do strictly follow my own set of rules for meetings, derived partly from said articles but mostly from experience. They can be summed up like so:

1) The meeting better have an agenda. If it doesn't, I'm likely to not pay attention. The agenda should lend itself to having items decided upon and completed during said meeting. If there are hairy topics, they should be last. The vast majority of discussion of agenda items should have already taken place elsewhere, either on lists or in tickets.

2) The meeting should actually STICK to the agenda. A meeting is really not a place to bring up random or tangential topics for discussion. At most such things should be noted for further discussion and decision at future meetings, preferably during open floor.

3) The chair of the meeting is responsible for keeping people on topic and completing the meeting in a timely manner. Be polite, but firm.

4) If it is clear that a decision is not going to be reached on an item during the meeting, defer it for further discussion elsewhere.


That's it.

Meetings should be focused with clear goals. Fedora meetings taking place on IRC necessitates some amount of leeway because of the medium, but that does not mean meetings exist for the sake of meetings or for chat time. IRC is also a terrible place to have in-depth discussions on things, as people feel time pressured and the format of the dialogue can be difficult to follow.

So yes, my FESCo meetings (or any other that I chair) tend to be shorter than most. This is simply because I want the meeting to be productive and I don't want to waste people's time. And if you feel that I am wrong on several points here, that is totally fine with me. Just volunteer to chair the meeting next time ;).

LPC 2015 Day 2

I started day 2 of LPC with the Graphics, mode setting, and Wayland microconference. This was an extremely dense microconference with lots of information on what is missing in various areas and what has been in the works for some time. To be honest, I felt out of my depth several times. However, it was very clear that the participants in the session were making good progress. One side-note: nobody likes to use mics :).

The afternoon session was heavy in hallway track for me. I had some conversations with Josef Bacik around btrfs and Fedora (summary: not yet). I also spoke with him and another engineer from Facebook around how they handle the kernel across their various machines. It was an interesting high level look at what a company is doing at that scale.

I also touched base with Matthew Garrett on the secure boot patchset. This has been something that we've carried in Fedora since around Fedora 18. It hasn't really changed much at all, but the patches have failed to get upstreamed for a number of trivial reasons. There are a few follow-ups that need to be looked into as well though. Matthew plans on submitting them upstream again and I'm going to see what I can do to help them get merged. Hopefully the third (fourth?) time is a charm.

The remainder of the afternoon was spent catching up with some old colleagues and meeting a few new people from various companies. Conversation was wide ranging. We touched on technical topics as well as completely irrelevant things. However, I think the time was very well spent, as one of the major purposes of conferences is to meet up with people face to face. Without that interaction, it is too easy to reduce a person to nothing more than an email address, and that never works out well.

The evening event was held at Rock Bottom Brewery and it was delicious. We had some impromptu Fedora kernel team time over dinner and shared dessert. Once all the ice cream and brownie was gone, I turned in shockingly early. Apparently the jet lag and two late nights in a row were starting to wear on me.

LPC 2015 Day 1

I have the privilege of attending the Linux Plumbers Conference in Seattle this week. This is by far my favorite conference. The talks and microconferences are all of high quality and the events are very well organized.

The first day is shared with LinuxCon, which lends itself to some talks that span both audiences. The first talk I attended was "Everything is a file descriptor" by Josh Triplett. Josh gave a great overview of why file descriptors are a great mechanism and some of the more recent system calls that have been added to give userspace the ability to write less crappy code. Things like timerfd and signalfd allow userspace to avoid some of the awkwardness that comes with using the more traditional interfaces that UNIX has provided. He also described work on clonefd, which is a new system call to allow userspace to get a file descriptor tied to a task in the kernel. This will have some interesting benefits, such as being able to pass the fd to a process that is not in your process hierarchy, and being able to poll for child/task exit without hanging in waitpid. At the end, Josh threw out some possible future additions that haven't been implemented. Overall a very well done talk.

Following that, I sat in on Steven Rostedt's talk on the RT patchset. I've seen this talk, or a version of it, around three times and I'm always highly entertained. Steven likes to pack a ton of information into his talks. The overall success of the RT patchset has been very good, and they are down to some of the final really hard bits. For some of them, they actually need help figuring out the best solution in the subsystem in question (like the VFS). One of the questions from the audience was about lessons learned working on the patchset, the issues they've fixed in mainline because of it, etc. Steven said that is a great idea for another talk, so I hope he writes that up. I would love to hear it.

The afternoon session started off with an overview of ACPI and where some of the ACPI 6 features are coming into play. Things like low power idle and other PM related features look to be headed towards us in hardware, and ACPI is of course being adapted to work with this. Overall a good overview.

Following that I did a bit of hallway track and then went to Daniel Vetter's talk on screwing up (or how to not) ioctls. Working in the DRM layer has provided him with lots of experience over the past few years on how to properly design your code to make it easier to use. One of the main points that was stressed several times was having testcases for everything. This is fairly obvious, but he pointed out that the testcases are better written when you aren't looking at the code. If you just look at the data structures and create boundary cases based on that, with all kinds of crazy input, you will often catch cases that the code itself doesn't cover. He had a good amount to say and it was entertaining.

The traditional kernel panel was the wrap up for the day, but I skipped that to catch up on some work. Then it was off to the evening event at the Experience Music Project museum. This venue was amazing and had a variety of exhibits ranging from a guitar collection to Star Wars costumes. It was very cool and the food was excellent. After spending perhaps a bit more than I expected in the gift shop, it was back to the hotel to try and sleep off some of the weariness that comes from travel delays and cramming information into your head all day.