koji bisecting kernel builds - pointless pontifications of a back seat driver


koji bisecting kernel builds [May. 10th, 2012|03:13 pm]

I've discussed a number of times the fact that we rebase the kernel in Fedora quite often. I'm not going to rehash that whole discussion, but it does have some side effects. One of the worst is that it inevitably introduces regressions. Those can range from "my doohickey no longer beeps when I cat some random string to it from a shell script I wrote in 1994" to "my machine doesn't boot". These are obviously of the same importance.

Now, if you as a user were constantly running rawhide you would notice these regressions immediately after your daily kernel update of the latest git snapshot and reboot. Except you aren't running rawhide, are you? No, you aren't. Instead, you seem to think you have better things to do than be guinea pigs for the upstream kernel developers and you're probably running the latest stable Fedora release. If we're lucky, you're running the Beta of the one under development. That means you don't see these regressions until we do a major rebase, and by then there are hundreds of commits and dozens of kernel builds between the kernel you were using before and the one you just hit a problem on.

When this happens, we often suggest finding the first build in koji that exhibits the issue. Those of you familiar with git (or, you know, math) would recognize this as doing a bisect. Normally we'd look at the two reported versions that represent good and bad kernels, then go digging through the koji website and post a bunch of links to various kernels. Then the user would click, download, install, reboot, test, and repeat until they found the first one that failed in the particular way they are interested in. After that, the kernel team would look through the range and see if we could narrow down which commit might have caused this.
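For those who would rather see the idea than read about it, here is a minimal sketch of that bisect in Python. It is not taken from any real tool: the function name `first_bad_build` and the `is_bad` callback are made up for illustration, and the build list contains only versions mentioned in this post (the real list in koji would be much longer). It assumes the list is sorted oldest to newest, the first entry is known good, the last is known bad, and every build after the first bad one is also bad.

```python
def first_bad_build(builds, is_bad):
    """Return the first build for which is_bad(build) is True.

    Assumptions (not checked): builds is sorted oldest to newest,
    builds[0] is known good, builds[-1] is known bad, and once a
    build is bad every later build is bad too.
    """
    good, bad = 0, len(builds) - 1
    while bad - good > 1:
        mid = (good + bad) // 2  # halve the remaining range each round
        if is_bad(builds[mid]):
            bad = mid
        else:
            good = mid
    return builds[bad]


# Illustrative list only; a real bisect would cover every build in between.
builds = [
    "kernel-3.3.0-8.fc17",
    "kernel-3.3.1-1.fc17",
    "kernel-3.3.1-3.fc17",
    "kernel-3.3.2-5.fc17",
    "kernel-3.3.5-2.fc17",
]
```

In real life `is_bad` is you: install the build, reboot, test, and report back. The payoff is that N builds need only about log2(N) reboots instead of N.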

That whole process can be kind of tedious, and usually relied on the kernel developers taking the time to find the links, etc. So instead of doing it by hand each time and having almost no log of what the user actually did, I decided I'd try and write a tool to help us do that. After hacking around and re-learning (or maybe learning in the first place) python, I've come up with a simple tool to do just that. It is aptly, and unimaginatively, named 'koji-bisect', is released under the GPLv2 license, and lives here:


A few caveats. 1) In the spirit of 'release/fail-early-often' I'm posting about it now so interested parties can laugh at my python coding skillz and give it a try. 2) It doesn't install the kernels for you, but it will download them so you don't have to. 3) Did I mention my python is bad? It is. 70% of the time it took me to write this was staring at docs.python.org. Alas, I guess that's what I can expect having my head firmly in C for so long.

A small example of its usage. Let's say you are semi-diligently updating your kernel in F17 and you hit a problem with kernel-3.3.5-2.fc17. The last working kernel you had was kernel-3.3.0-8.fc17. So you essentially have the entire stable series of releases between the good and bad kernels. You grab the tool from git and do:

[jwboyer@zod koji-bisect]$ ./koji-bisect.py --start
successfully connected to hub
Must have a good or bad index
[jwboyer@zod koji-bisect]$ ./koji-bisect.py --good kernel-3.3.0-8.fc17
Marking kernel-3.3.0-8.fc17 as good
[jwboyer@zod koji-bisect]$ ./koji-bisect.py --bad kernel-3.3.5-2.fc17
Marking kernel-3.3.5-2.fc17 as bad
successfully connected to hub
downloads/kernel-3.3.2-5.fc17/kernel-tools-devel-3.3.2-5 |  97 kB     00:00 ... 
downloads/kernel-3.3.2-5.fc17/kernel-debug-devel-3.3.2-5 |  15 MB     00:16 ... 
downloads/kernel-3.3.2-5.fc17/kernel-headers-3.3.2-5.fc1 | 1.6 MB     00:00 ... 
downloads/kernel-3.3.2-5.fc17/kernel-debug-3.3.2-5.fc17. |  53 MB     00:31 ... 
downloads/kernel-3.3.2-5.fc17/python-perf-3.3.2-5.fc17.x | 136 kB     00:00 ... 
downloads/kernel-3.3.2-5.fc17/kernel-debug-modules-extra | 2.7 MB     00:05 ... 
downloads/kernel-3.3.2-5.fc17/kernel-modules-extra-3.3.2 | 2.6 MB     00:02 ... 
downloads/kernel-3.3.2-5.fc17/kernel-tools-3.3.2-5.fc17. | 217 kB     00:00 ... 
downloads/kernel-3.3.2-5.fc17/perf-3.3.2-5.fc17.x86_64.r | 882 kB     00:00 ... 
downloads/kernel-3.3.2-5.fc17/kernel-3.3.2-5.fc17.x86_64 |  51 MB     00:41 ... 
downloads/kernel-3.3.2-5.fc17/kernel-devel-3.3.2-5.fc17. |  15 MB     00:05 ... 
kernel-3.3.2-5.fc17 is now available for install.
[jwboyer@zod koji-bisect]$ 

Now in the downloads/ directory you have a shiny kernel to install and test. So you do that, but it still didn't work. You can either just call the script with --bad on that new test version, or you can take a look at what's left first (or both!):

[jwboyer@zod koji-bisect]$ ./koji-bisect.py --list
kernel-3.3.0-8.fc17         good build
kernel-3.3.2-5.fc17         current build
kernel-3.3.5-2.fc17         bad build
[jwboyer@zod koji-bisect]$ 

You'll notice that there are f16 builds listed as well. That is actually on purpose, as we grab ALL the builds and sort them via RPM Epoch-Name-Version-Release ordering. In our contrived example, we don't really need the f16 builds, but oftentimes we have to install builds from different releases in order to continue bisecting. E.g. if you hit a bug in a kernel-3.3.0 rebase on F16, you'd need to use the -rcX builds from F17. Anyway, if you don't want to see them, you can use the --dist option to limit what's used based on disttag.
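To give a rough feel for what that ordering means, here is a simplified sketch in Python. It is a stand-in, not the tool's code and not RPM's actual comparison routine: it ignores the epoch (assumed equal for all builds) and special characters such as `~`, and the helper names `compare_versions`, `compare_nvr` and `sort_builds` are hypothetical. The `dist` filter here simply matches the end of the release string, which is an assumption about how a disttag filter might behave. The fc16 build in the sample list is made up for illustration.

```python
import re
from functools import cmp_to_key


def _segments(s):
    # Split into runs of digits or letters; separators like '.' are dropped.
    return re.findall(r"\d+|[A-Za-z]+", s)


def compare_versions(a, b):
    """Simplified RPM-style comparison: returns -1, 0 or 1."""
    sa, sb = _segments(a), _segments(b)
    for x, y in zip(sa, sb):
        if x.isdigit() and y.isdigit():
            x, y = int(x), int(y)  # numeric runs compare as numbers
        elif x.isdigit():
            return 1               # a numeric run beats an alphabetic one
        elif y.isdigit():
            return -1
        if x != y:
            return 1 if x > y else -1
    # All shared segments equal: the string with segments left over is newer.
    return (len(sa) > len(sb)) - (len(sa) < len(sb))


def compare_nvr(a, b):
    # name-version-release; epoch is assumed identical and left out.
    _, va, ra = a.rsplit("-", 2)
    _, vb, rb = b.rsplit("-", 2)
    return compare_versions(va, vb) or compare_versions(ra, rb)


def sort_builds(nvrs, dist=None):
    if dist is not None:
        nvrs = [n for n in nvrs if n.endswith("." + dist)]
    return sorted(nvrs, key=cmp_to_key(compare_nvr))


builds = [
    "kernel-3.3.5-2.fc17",
    "kernel-3.3.2-1.fc16",  # hypothetical build, for illustration only
    "kernel-3.3.0-8.fc17",
    "kernel-3.3.2-5.fc17",
]
print(sort_builds(builds))
print(sort_builds(builds, dist="fc17"))
```

The point of comparing numeric runs as numbers is that 3.3.10 sorts after 3.3.5, which a plain string sort gets wrong.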

So we mark the test one bad (or good), and rinse and repeat until eventually the script tells you:

[jwboyer@zod koji-bisect]$ ./koji-bisect.py --list
kernel-3.3.0-8.fc17         good build
kernel-3.3.1-1.fc17         current build
kernel-3.3.1-3.fc17         bad build
[jwboyer@zod koji-bisect]$ ./koji-bisect.py --bad kernel-3.3.1-1.fc17
Marking kernel-3.3.1-1.fc17 as bad
Build kernel-3.3.1-1.fc17 is the first bad build
[jwboyer@zod koji-bisect]$ 

You then dutifully tell the Fedora kernel team that kernel-3.3.1-1.fc17 caused your regression and we go off scratching our heads.

There is a lot more that can be done with this. Perhaps we can get it so that once it finds the first bad kernel, it does a vanilla build of that to see if it's a Fedora patch that broke something. And if that vanilla build fails too, maybe even dip into a git bisect. However, all of that is stuff that can be added over time.

So if you're interested, grab it from git and give it a whirl. If you want to submit patches, send them to me! If you want to ridicule my python skills, or how simplistic the approach is so far, do that too! I'll either use it as constructive criticism to improve things, or as fuel for my hate-powered motivational engine. I win either way.

Now, excuse me while I go off and try to figure out a bug I found while trying to do the example demo.

From: cdamian
2012-05-10 08:51 pm (UTC)
That is pretty cool.

Any reason why it should be limited to the kernel package? It looks like it would be useful to debug all kinds of regressions.
From: jwboyer
2012-05-10 09:03 pm (UTC)
No reason, other than I spend enough time in koji to know that the kernel package is packageID=8 and I work on the kernel mostly.

I'll add it to the TODO file.