As mentioned by , I have been spending some time on improving Britney2. Just the other day I submitted a second branch for review that I expect to merge early next week. I also have another set of patches coming up soon. Currently, none of them are really user visible, so unless you are hosting your own version of Britney, these patches are probably not all that interesting to you.
The highlights:
- Reduce the need for backtracking by finding semantically equivalent packages.
- Avoid needing to set up a backtrack point in some cases.
  - This has the side-effect of eliminating some O(e^n) runtime cases.
- Optimise “installability” testing of packages affected by a hinted migration.
  - This has the side-effect of avoiding some O(e^n) runtime cases when the “post-hint” state does not trigger said behaviour.
  - There is a follow-up patch for this one coming in the third series to fix a possible bug in a corner case (causing a valid hint to be incorrectly rejected when it removed an “uninstallable” package).
- Reduce the number of affected packages to test when migrating items by using knowledge about semantically equivalent packages.
  - In some cases, Britney can now do “free” migrations when all binaries being updated replace semantically equivalent packages.
- (Merge pending) Avoid many redundant calls to “sort_actions()”, which exhibits at least O(n^2) runtime in some cases.
  - For the dataset Raphaël submitted, this patch shaves over 30 minutes off the runtime. In that particular case, each call to sort_actions takes 3+ minutes and it was called at least 10 times where it was not needed.
  - That said, sort_actions has a vastly lower runtime in the runs for Debian (and presumably also Ubuntu, since no one has complained from their side so far).
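To give a rough idea of what “semantically equivalent” could mean here, below is a minimal Python sketch. It is not Britney's actual code: the function name `group_equivalent`, the dictionary keys and the equivalence criteria (identical dependencies, identical conflicts and identical set of satisfied dependency clauses) are all assumptions made for illustration, and the real criteria in Britney may well differ.

```python
from collections import defaultdict


def group_equivalent(packages):
    """Group package names whose relations are identical.

    `packages` maps a name to a dict with the (hypothetical) keys
    "depends", "conflicts" and "satisfies", each an iterable of strings.
    """
    groups = defaultdict(list)
    for name, rel in packages.items():
        # Packages with the same signature are treated as interchangeable.
        signature = (
            frozenset(rel["depends"]),
            frozenset(rel["conflicts"]),
            frozenset(rel["satisfies"]),
        )
        groups[signature].append(name)
    # Sort for a deterministic result.
    return sorted(sorted(names) for names in groups.values())


# Made-up example data: two versions of a data package that look
# identical from the point of view of every other package.
packages = {
    "foo-data_1.0": {"depends": [], "conflicts": [], "satisfies": ["foo-data"]},
    "foo-data_1.1": {"depends": [], "conflicts": [], "satisfies": ["foo-data"]},
    "foo_1.1": {"depends": ["foo-data"], "conflicts": [], "satisfies": ["foo"]},
}
equivalence_classes = group_equivalent(packages)
```

Under these assumptions, swapping one member of a class for another cannot change the installability of anything else, so only one representative per class needs to be tested. That is the intuition behind the “free” migrations mentioned above.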
The results so far:
After the first patch series was merged, the Kali dataset (from Raphaël) could be processed in “only” ~2 hours. With the second patch series merged, the runtime for that dataset will drop by another 30-50 minutes (most of which is thanks to the change mentioned in highlight #5).
The third patch series currently does not have any noteworthy performance-related changes. It will probably be limited to bug fixes and some refactoring.
Reflections:
The first 3 highlights only affect the “new” installability tester, meaning that the Britney2 instances at Ubuntu and Tanglu should be mostly unaffected by the O(n^2) runtime. Although those cases will probably just fail with several “AIEEE”s. :) The 5th highlight should be equally interesting to all Britney2 instances though.
For me, the most interesting part is that we have never observed the O(n^2) behaviour in a daily “sid -> testing” run. The dataset from Raphaël was basically a “stable -> testing/sid” run, which is a case I do not think we have ever done before. Despite our current updates, there is still room for improvement on that particular use case.
In particular, I was a bit disheartened at how poorly our auto hinter(s) performed on this dataset. Combined, they only assisted with the migration of something like 28 “items”. For comparison, the “main run” migrated ~7100 “items” and 9220 items were unable to migrate. Furthermore, the “Original” auto hinter spent the better part of 15 minutes computing hints, though at least that resulted in 10 “items” migrating.
Links to the patches:
- First series (already merged)
- Second series (pending, to be merged Monday, the 11th of August)
- Third series (under development. Warning: the branch will be rebased if/as needed)
