While looking at the dpkg Breaks-field[0]…
$ aptitude show dpkg Package: dpkg [...] Version: 1.16.1.2 [...] Breaks: apt (< 0.7.7), aptitude (< 0.4.7-1), dpkg-dev (< 1.15.8), libdpkg-perl (< 1.15.8), pinfo (< 0.6.9-3.1), tkinfo (< 2.8-3.1)
… it occurred to me that most (all?) of these relations were irrelevant to Britney when she migrated dpkg 1.16.1.2 to testing. Right now, at least the APT relations are only relevant if you are doing something like a distribution upgrade from Lenny/Squeeze to Wheezy. Similarly, the version constraints in a lot of dependency relations (e.g. “libc6 (>= 2.11)”) are satisfied in testing and unstable at the same time.
Removing the version constrains on dependencies is a rather minor thing as it is basically just a minor constant time optimization on each dependency check. However, removing an entire clause slightly reduces the “problem size” a bit. Particularly, the Conflicts/Breaks relations tends to be expensive for us.
The first task was to identify the relations that can (or cannot) be simplified. In a Britney run we at most 4 versions of the same package per architecture, though usually only 1 or 2[1]. I devised a small set of rules to simplify the relations. These rules are applied to each package in the relation (atomic proposition, if you will).
- If the relation is versioned and it involves a virtual package in any suite, then do not change the relation in any way. Rationale: A virtual package cannot satisfy any versioned relation (Policy Manual §7.5)
- If the relation is a dependency (i.e. Pre-Depends or Depends) and the package is not available in any suite, then do not change the relation in any way.
- If the relation is versioned and the relation is satisfied in all suites (where the “relationed-on” package is available), then remove the version constrain. Rationale: If all (present) versions satisfy the relation, then version constrain does not change the semantics.
- If the relation is a conflict (i.e. Conflicts or Breaks) and relation is unsatisfiable in all suites, then remove the relation. Rationale: If none of the (present) versions satisfy the conflict-relation, then there is no conflict[2].
The rules are rather conservative in some cases and there is room for improvements. However, one has to remember that removing too few relations costs a bit in runtime, removing too many breaks testing… and possibly a lot. Obviously, I prefer the former to the latter (especially because I will be a part of the “clean up”-crew).
I tested my implementation of those four rules above against the current master branch. In short, it produces the same result as the master branch in all the tests so far. In the hand-made test-suite, the tests generally do not have any superfluous relations. Thus, it is slightly slower, though usually within 0.1 seconds of the master branch.
On the other hand, in the live-data samples I have collected so far it does vastly better. For the longest run (sample from 2011-12-13), it reduces the runtime with ~70 seconds (from ~215 to ~145 seconds). In the other runs, it reduces total runtime with ~35 and ~2 seconds, respectively. In these samples, only the amd64 and i386 packages are considered (and human hints are ignored).
For those interested, the code is available in my branch.
[0] There is perfectly valid reason for doing that.
I might get back to that in a later post.
[1] One in unstable, testing, testing-proposed-updates and proposed-updates. The latter may seem a bit weird, but… [0]
[2] This is the rule that prune relations like the one in the dpkg Breaks-field.
Edit: 2011-01/09, clarified that we have at most 4 versions per architecture.
