May 11th, 2016

(no subject)

Often we get bugs reported against the Fedora kernel for issues involving third party drivers. Sometimes those are VirtualBox, sometimes VMware guest tools, but most often it is the nvidia driver. We had another one reported today. I'll pause to let you read it. Go ahead, read the whole thing. I'll wait.

Done? Good. You'll notice a couple of things. First, it's closed as CANTFIX with a rote comment that we typically use for such bugs. That comment, while terse, is not incorrect. Second, the reporter is, frankly, pissed off. You know what?

He has every right to be.

So if the response in the bug is not incorrect, how does that square with the assertion that the reporter's anger isn't wrong either? I thought I'd spend some time breaking this bug down in detail to try to explain it.

The crux of the reporter's argument is that using the nvidia driver on Fedora causes pain to Fedora's users. I'm not going to argue against that. Using the nvidia driver on Fedora is indeed painful. A user can finally get it working, and then we rebase the kernel and it breaks again for them. That isn't a good user experience at all. We've known this for a while, and there are some tentative plans to help users in this situation by defaulting to a known working kernel if they have the nvidia driver installed. That doesn't fix the problem, but it at least reduces the element of surprise.

The reporter then goes on to make some assertions that might seem plausible, but in fact aren't accurate at all. Let's look at these more closely.

The claims

You do not intentionally break hardware compatibility? Oh, wait, you do.

We do not intentionally break anything. As we've written about before, we rebase the kernel to pick up the bugfixes the upstream maintainers include in those newer releases. An additional benefit of those rebases is that they enable new hardware. Yes, there are regressions, and they are particularly prevalent if you are relying on out-of-tree drivers. Those regressions are unfortunate, but they are certainly not done out of malice.

You do not intentionally break API/ABI compatibility? Oh, wait, you do.

Greg-kh has talked and written extensively about the fact that the upstream kernel has no stable internal API. In fact, his document is included in the kernel source itself. Because the kernel has no stable API, there is also no stable ABI. It should be noted that ABI here describes the interface between the kernel and modules, not the one between the kernel and userspace. The kernel/userspace ABI is exposed through system calls, and it is stable and fanatically protected by the upstream kernel maintainers. Modules don't have that luxury, so when a rebase is done the ABI can and does change. Add to that the fact that compiler versions change in Fedora, which can also affect the ABI, and it becomes evident that the reporter's claim is somewhat true.

We could freeze on a kernel version and attempt to keep the API/ABI stable, but that incurs a significant maintenance cost that our small team is not able to handle. Even the RHEL kernel, with its much larger user base and development team, supports only a limited kABI. So yes, the API/ABI changes with a rebase (or, more rarely, with a new stable update), but that is a consequence of doing a rebase. It is not done with the intention of breaking anything.

You do not limit user's choice in regard to running software or drivers? Oh, wait, you do.

Fedora actually does not limit the user's choice in software. The user is free to install the nvidia driver or whatever other software they wish to use. Google Chrome, Lotus Notes, Steam, the nvidia drivers, and other software not provided by Fedora have all been known to be installed and work. There is no problem with a user choosing to do this. It is their own computer!

What the Fedora kernel team cannot do is provide support for such software. As Justin mentions in the bug itself, providing such support is very difficult for us. We have no access to the driver source and cannot fix bugs in the driver. If there is a bug in the kernel itself that the driver happens to trigger, we lack a lot of context about what the driver is doing to cause it. It is simply not a tenable position. Therefore we close all such bugs as CANTFIX.

You make sure your software is bugs free? Oh, wait, you don't.

This one starts to depart from the reality of software in general: no software is ever bug free. We do, however, attempt to ensure we don't ship with known bugs. Also, the kernel Fedora ships is very close to the upstream kernel, and 95% of the bugs reported are present in the upstream kernel as well. So "your" software here is collectively the entire kernel community's.

You make sure you have the most stringent QA/QC process? Oh, wait, you don't.

I will not argue that our QA process is the most stringent. I won't even argue that it is more stringent than some other projects'. It is, however, constantly improving. We've had an automated testsuite in place for more than a year now to test builds as they come out of koji. We continue to run tests on the kernel manually on a variety of machines to make sure nothing is known to be broken on certain configurations. We are constantly looking to add more to this, in as automated a fashion as we can.

However, that only scales so far. Particularly in the case of the kernel, Fedora relies heavily on input from actual users via our updates-testing and bodhi infrastructure. Consider this a continued plea for help testing and catching things.

(paraphrased) The nouveau driver is slower, less stable, and unusable on recent nvidia hardware

Quite simply, there is a lot of truth to this statement, particularly for accelerated graphics (which gnome-shell uses). However, this is not Fedora's fault, or really anyone's fault. The nouveau driver is a reverse-engineered solution that is continually improving despite having very few developers working on it. We've recognized this gap, and there are more people assigned to nouveau now than ever before. Progress will be slow, but it is being made. As Justin mentions, support for newer nvidia cards should continue to improve in the 4.6 and 4.7 kernels now that some of the signed firmware issues are worked out.

(paraphrased) Fedora expects everyone to use Intel GPUs

We have no such expectations. Intel does have more market penetration due to the on-board GPUs it ships with its newer CPUs, and it has a development team working on its open source driver directly upstream. The i915 driver is held up as the ideal standard for open source GPU drivers, but it is not bug free by any means. The radeon driver is similarly open source and typically works well, but it is not bug free either. GPU work is hard. At least those vendors are doing it in the open, and they should be applauded for it. That does not mean usage of their products is required.

(Personally, I use both Intel and ATI GPUs in my two primary machines.)

(paraphrased) Fedora sabotages its kernels to make it incompatible with third party drivers.

This is blatantly false. We do not add any patches to intentionally break anything. That would be untrue to Fedora's Foundations, limiting our users for no reason at all, and I would personally find it immoral.

The only limitation Fedora places on third party modules is when Secure Boot is enabled, in order to fully support that mechanism. In that case modules must be signed with a certificate that is imported into the kernel, and we have provided documentation on how to do this, or how to disable Secure Boot for those who don't wish to bother with it.

So now what

There is no good answer here. While I dislike the more inflammatory claims, personal attacks, and overall tone in the bug, I stand by my assertion that the reporter has every right to be angry. They simply want to use the hardware they have purchased. I think the anger is misdirected somewhat, but hopefully this post illustrates that there are two sides to every story and elaborates on why it isn't as simple as many make it out to be.

Fedora knows people want to use things that give them the best performance or user experience. We aren't actively trying to prevent either of those. We try to balance the needs of as many users as we can. In this specific case, we're also looking at ways to improve the user experience without compromising Fedora's stance on proprietary software. Unfortunately, there are situations where things will break and we simply cannot fix them. Please continue to tell us about them so we're aware, perhaps with more understanding and less vitriol in the future.

This post reflects the author's opinion and is not necessarily representative of the Fedora project or the author's employer.