Fedora kernel exploded tree part deux: Snakes and assumptions
[Oct. 6th, 2015 | 01:13 pm]
A while back I wrote about some efforts to move to using an exploded source tree for the Fedora kernel. As that post details, it wasn't the greatest experience. However, I still think an exploded tree has good utility and I didn't want to give up on the idea of it existing. So after scrapping our "switch" I decided to (slowly) work on a tool that would create such a tree automatically. In the spirit of release-early and pray nobody dies from reading your terrible code, we now have fedkernel. The readme file in the repo contains a high-level overview of how the tool works, so I won't duplicate that here. Instead I thought I would talk about some of the process changes and decisions we made to make this possible.
Git all the things
One of the positive fallouts of the previous efforts was that all of the patches we carried in Fedora were nicely formatted with changelogs and authorship information. Being literally the output of git-format-patch instantly improved the patch quality. When it came time to figure out how to generate the patches from pkg-git to apply to the exploded tree, I really wanted to keep that quality. So I thought about how to accomplish this and then I realized there was no need to reinvent the wheel. The git-am and git-format-patch tools existed and were exactly what I wanted.
After discussing things with the rest of the team, we switched to using git-am to apply patches in the Fedora kernel spec. The mechanics of this are pretty simple: the spec unpacks the tarball (plus any -rcX patches) and uses this as the "base" commit. Stable update patches are applied as a separate commit on top of the base if it is a stable kernel. Then it walks through every patch and applies it with git-am. This essentially enforces our patch format guidelines for us. It does have the somewhat negative side effect of slowing down the %prep section quite a bit, but in practice it hasn't been slow enough to be a pain. (Doing a git add and git commit on the full kernel sources isn't exactly speedy, even on an SSD.)
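The flow above can be sketched in a few lines of Python. This is only an illustration, not the actual kernel.spec logic: the function names, the commit messages, and the assumption that the stable update is an uncompressed patch file are all mine, and git commit needs user.name and user.email configured.

```python
import subprocess


def prep_commands(patches, stable_patch=None):
    """Return the git commands, in order, for the %prep flow described above.

    Hypothetical helper: names and commit messages are illustrative only.
    """
    cmds = [
        ["git", "init", "-q"],
        ["git", "add", "-A"],
        # The unpacked tarball (plus any -rcX patch) becomes the "base" commit.
        ["git", "commit", "-q", "-m", "baseline"],
    ]
    if stable_patch is not None:
        # A stable update goes in as its own commit on top of the base.
        cmds += [
            ["git", "apply", "--index", str(stable_patch)],
            ["git", "commit", "-q", "-m", "Stable update"],
        ]
    # git-am refuses anything that is not in git-format-patch form,
    # which is what enforces the patch format guidelines.
    cmds += [["git", "am", str(p)] for p in patches]
    return cmds


def run_prep(tree, patches, stable_patch=None):
    """Run the commands inside the unpacked source directory."""
    for cmd in prep_commands(patches, stable_patch):
        subprocess.run(cmd, cwd=tree, check=True)
```

Keeping the command list separate from the code that runs it makes the ordering easy to inspect without touching a real tree.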
So after %prep is done, the user is left with a git tree in the working directory that has all the Fedora patches as separate commits. "But wait, isn't the job done then?", you might ask. Well, no. We could call it good enough, but that isn't really what I or other users of an exploded tree were after. I wanted a tree with the full upstream commit history plus our patches. What this produces is just a blob, plus our patches. Not quite there yet but getting closer.
This is where fedkernel comes in. I needed tooling that could take the patches from this franken-tree and apply them to a real exploded git tree. My previous scripts were written in bash, and I could have done this in bash again but I wanted to make it automated and I wanted it to talk to the Fedora infrastructure. This meant it was time to learn Python again. Fortunately, the upstream Python community has great documentation and there are modules for pretty much anything I needed. This makes my "bash keyboard in interactive python session" approach to the language pretty manageable and I was able to make decent progress.
To get the patches out of the prepped sources, I mainly needed to know one thing: what was the actual upstream base for this build? That is easy enough to figure out if you can parse certain macros in kernel.spec. The one part that proved to be somewhat difficult was git snapshot kernels. We name these -gitY kernels, where Y increases until the next -rcX release. E.g. kernel-4.3.0-0.rc3.git1.1, kernel-4.3.0-0.rc3.git2.1, etc. That's great for RPMs, but the only place we actually documented which upstream commit we generated the snapshot from was in an RPM %changelog comment.
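To show why the package name alone is not enough, here is a small sketch that pulls the -rc and -git counters out of names shaped like the examples above. It is not how fedkernel works (that parses kernel.spec macros), and the regular expression only covers this one name shape, which is my assumption.

```python
import re
from typing import NamedTuple, Optional


class Snapshot(NamedTuple):
    version: str  # e.g. "4.3.0"
    rc: int       # the X in -rcX
    git: int      # the Y in -gitY


# Only matches pre-release snapshot names like kernel-4.3.0-0.rc3.git1.1
_NVR = re.compile(
    r"^kernel-(?P<version>\d+\.\d+\.\d+)-0\.rc(?P<rc>\d+)\.git(?P<git>\d+)\."
)


def parse_snapshot(nvr: str) -> Optional[Snapshot]:
    """Return the snapshot counters, or None if the name has another shape."""
    m = _NVR.match(nvr)
    if m is None:
        return None
    return Snapshot(m["version"], int(m["rc"]), int(m["git"]))
```

The result tells you it is the first or second snapshot after rc3, but nothing about which upstream commit that snapshot was taken from.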
Parsing it out of there is possible, but it's a bit cumbersome and somewhat error prone. The sha1sum is always recorded, but it isn't guaranteed to be in the newest changelog entry. Other patches and changelogs can be added before the kernel is actually built. Fortunately, we use a script to generate these snapshots. To make it trivial to figure out the sha1sum, I modified the script to record the full commit hash to a file called gitrev in pkg-git. Now fedkernel can easily read that file and use it as the base upstream revision to apply patches on top of. Yay for cheating and/or being lazy.
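Reading that file back is about as simple as it sounds. A minimal sketch, assuming gitrev holds nothing but the full 40-character hash on a single line (the function name and the validation are mine, not fedkernel's):

```python
import re
from pathlib import Path

_SHA1 = re.compile(r"[0-9a-f]{40}")


def read_gitrev(pkg_git_dir):
    """Return the upstream commit hash recorded in <pkg_git_dir>/gitrev."""
    text = (Path(pkg_git_dir) / "gitrev").read_text().strip()
    # Fail loudly rather than hand a bogus revision to git later on.
    if not _SHA1.fullmatch(text):
        raise ValueError(f"gitrev does not look like a full commit hash: {text!r}")
    return text
```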
The rest of the code deals with prepping the tree, using the git python module to do manipulations and generate patches, and applying them to the other tree. Python actually made much of this very easy to do and again I'm really glad I used that instead of bash.
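For a rough idea of what that last step involves, here is a sketch that shells out to the git command line instead of using the git python module. Everything here is hypothetical naming on my part: base is the commit in the prepped tree that the Fedora patches sit on top of, upstream_rev is the revision in the real tree, and branch is whatever branch should receive the result.

```python
import subprocess
from pathlib import Path


def export_cmd(base, patch_dir):
    """Command to write every commit after `base` as a patch file."""
    return ["git", "format-patch", "-q", "-o", str(patch_dir), f"{base}..HEAD"]


def checkout_cmd(branch, upstream_rev):
    """Command to (re)create `branch` at the upstream base revision."""
    return ["git", "checkout", "-q", "-B", branch, upstream_rev]


def transplant(prepped_tree, linux_tree, base, upstream_rev, branch, patch_dir):
    """Move the patches from the prepped tree onto the real exploded tree."""
    patch_dir = Path(patch_dir).resolve()  # both trees see the same path
    subprocess.run(export_cmd(base, patch_dir), cwd=prepped_tree, check=True)
    subprocess.run(checkout_cmd(branch, upstream_rev), cwd=linux_tree, check=True)
    # format-patch numbers its output 0001-, 0002-, ... so sorting keeps order.
    patches = sorted(str(p) for p in patch_dir.glob("*.patch"))
    if patches:
        subprocess.run(["git", "am", *patches], cwd=linux_tree, check=True)
    return patches
```

Note that checkout -B resets the branch if it already exists, which is what you want for a tree that is regenerated per build but not for one you commit to by hand.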
So now that we modified a few things in pkg-git to make this easier, the assumptions basically fall out to be:
- Patches in the prepped source can be retrieved via 'git format-patch'
- The upstream base revision is determinable from kernel.spec and the gitrev file.
Pretty simple, right? Yes. Except the code isn't complete by any means and it requires a bit more manual setup. For example, it needs existing pkg-git and linux.git trees that it can modify, with all the branches and proper remotes already set up. That isn't really a huge issue, but it does mean when f24 is branched and rawhide becomes f25, we'll need to do some updates. Perhaps before then we'll have fixed the code (or some nice person will submit a pull request that does so).
I've been using the tool to generate exploded trees for the past week or so. It seems to be working well, and I've published them at https://git.kernel.org/cgit/linux/kernel/git/jwboyer/fedora.git/ once again. There is a history gap there as the tree fell into disrepair for a while, but it should be kept current going forward.
Even if the code is terrible and hacky, writing it was a good learning experience. I hope to keep refining it and improving things in the TODO over time. If you want to pitch in, patches are always welcome. You can email them to me, submit a pagure.io pull request, or mail the Fedora kernel list as usual.