Thanks to a heads up from Bastian Blank, I learned that Lintian 2.5.7 and 2.5.8 were horribly slow on the Linux binaries. Bastian had already identified the issue and 2.5.9 fixed the performance regression.
But in light of that, I decided to have a look at a couple of other bottlenecks. First, I added simple benchmark support to Lintian 2.5.10 (enabled with -dd) that prints the approximate run time of a given collection. As an example, when running lintian -dd on lintian 2.5.10, you can see something like:
N: Collecting info: unpacked for source:lintian/2.5.10 ...
[...]
N: Collection script unpacked for source:lintian/2.5.10 done (0.699s)
When done on linux-image, the slowest 3 things with 2.5.10 are (in order of appearance):
[...]
N: Collection script strings for binary:linux-image-3.2.0-2-amd64/3.2.20-1/amd64 done (12.333s)
N: Collection script objdump-info for binary:linux-image-3.2.0-2-amd64/3.2.20-1/amd64 done (15.915s)
[...]
N: Finished check: binaries (5.911s)
[...]
(Your mileage, and the order, will probably vary a bit.)
These 3 things make up about 22 seconds of a total running time of approximately 28-30s on my machine. Now, if you are wondering how 12, 16 and 6 become 22, the answer is “parallelization”. strings and objdump-info are run in parallel, so only the “most expensive” of the two counts in practice (with multiple processing units).
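The arithmetic behind that 22-second figure can be sketched as below. This is only a rough model: it assumes the two collections overlap completely and that the binaries check starts once both have finished. The variable names are made up for illustration; the timings are the ones from the -dd output.

```python
# Timings in seconds, copied from the lintian -dd output.
strings = 12.333
objdump_info = 15.915
binaries_check = 5.911

# Assumption: strings and objdump-info overlap fully, so only the
# slower of the two contributes to wall-clock time.
parallel_part = max(strings, objdump_info)

# Assumption: the binaries check runs after both collections are done.
wall_clock = parallel_part + binaries_check

# For comparison, the cost if everything ran one after another.
serial = strings + objdump_info + binaries_check

print(f"parallel estimate: {wall_clock:.3f}s, serial: {serial:.3f}s")
```

Under those assumptions the estimate lands just under 22 seconds, while running everything serially would cost about 34 seconds.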
The version of linux-image I have been testing (3.2.20-1, amd64) has over 2800 ELF binaries (kernel modules). That makes the runtime of strings and objdump-info far more dominant than in “your average package”. For the fun of it, I have done a small informal benchmark of various Lintian versions on that binary package.
I have used the command line:
# time is the bash shell built-in and not /usr/bin/time
$ time lintian -EvIL +pedantic linux-image-3.2.0-2-amd64_3.2.20-1_amd64.deb >/dev/null
# This was used with only versions that did not accept -L +pedantic
$ time lintian -EvI --pedantic linux-image-3.2.0-2-amd64_3.2.20-1_amd64.deb >/dev/null
With older versions of Lintian (<= 2.5.3) Perl starts to emit warnings; these have been manually filtered out. I used lintian from the git repository (i.e. I didn’t install the packages, but checked out the relevant git tags). I had libperlio-gzip-perl installed (affects the 2.5.10 run).
Most results are only from a single run, though I ran it twice on the first version (hoping my kernel would cache the deb for the next run). The results are:
2.5.10      real 0m28.836s       user 0m36.982s           sys 0m3.280s
2.5.9       real 1m9.378s        user 0m33.702s           sys 0m11.177s
2.5.8       real 4m54.492s       user 4m0.631s            sys 0m30.466s
2.5.7       (not tested, but probably about same as 2.5.8)
2.5.{0..6}  real 1m20s - 1m22s   user 0m19.0s - 0m20.7s   sys 0m5.1s - 0m5.6s
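To put the “real” figures side by side, here is a small sketch that converts the bash time values to seconds and expresses each run relative to 2.5.10. The helper name to_seconds is made up for illustration, and only the three versions with exact single-run figures are included.

```python
import re

def to_seconds(stamp):
    """Convert a bash `time` value such as '4m54.492s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp)
    if match is None:
        raise ValueError(f"unrecognised time value: {stamp!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)

# "real" times copied from the results above.
real = {"2.5.10": "0m28.836s", "2.5.9": "1m9.378s", "2.5.8": "4m54.492s"}

baseline = to_seconds(real["2.5.10"])
ratios = {version: to_seconds(stamp) / baseline for version, stamp in real.items()}

for version, ratio in ratios.items():
    print(f"{version}: {ratio:.1f}x the wall-clock time of 2.5.10")
```

Since these are mostly single runs, the ratios are only a rough indication.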
I think Bastian’s complaint was warranted for 2.5.{7,8}.
While it would have been easy to attribute the performance gain in 2.5.10 to the new parallelization improvements, that is simply not the case. These improvements only apply to running collections when checking multiple packages. On my machine, the parallelization limit for a single package is effectively determined by the dependencies between the collections.
Instead, the improvements come from reducing the number of system(3) (or fork+exec) calls Lintian does, mostly through using xargs more, even if it meant slightly more complex code. But also, libperlio-gzip-perl shaved a couple of seconds off the “binaries” check.
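The general idea of trading many small process launches for a few big ones can be sketched like this. It is not Lintian's code (Lintian is written in Perl and uses xargs); it is a minimal Python illustration in which TOOL is a hypothetical stand-in for an external program such as strings or file, and the function names and file names are invented.

```python
import subprocess
import sys

# Hypothetical stand-in for an external tool: a child process that
# just prints how many arguments it was given.
TOOL = [sys.executable, "-c", "import sys; print(len(sys.argv) - 1)"]

def run_per_file(paths):
    """Spawn one child process per path (one fork+exec each)."""
    spawned = 0
    for path in paths:
        subprocess.run(TOOL + [path], check=True, stdout=subprocess.DEVNULL)
        spawned += 1
    return spawned

def run_batched(paths, batch_size):
    """xargs-style: pass up to batch_size paths to each child process."""
    spawned = 0
    for start in range(0, len(paths), batch_size):
        batch = paths[start:start + batch_size]
        subprocess.run(TOOL + batch, check=True, stdout=subprocess.DEVNULL)
        spawned += 1
    return spawned

# Made-up file names standing in for kernel modules.
paths = [f"module-{i}.ko" for i in range(20)]
per_file_spawns = run_per_file(paths)
batched_spawns = run_batched(paths, batch_size=8)
print(per_file_spawns, batched_spawns)
```

With over 2800 ELF files in the package, the difference between one process per file and a handful of batched calls adds up quickly. The price is that the output of each batch has to be split back up per file, which is where the slightly more complex code comes from.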
But as I said, linux-image is “not your average package”. Most of the improvements mentioned here are hardly visible on other packages. So let’s have a look at some other bottlenecks. In my experience the following are the “worst offenders”:
- unpacked (collection)
- Seen on wesnoth-1.9 source. Here the problem seems to be tar+bzip2, so there is not really a lot to do (on the Lintian side). Though feel free to prove me wrong.
- file-info (collection)
- Seen in eclipse/eclipse-cdt source. file(1) appears to spend a lot of time classifying some source files. For eclipse-cdt, I see an approx. 10 second speed-up (from 40s to 30s) if file is recompiled with -O2. (That would be #659355). However, even if file is compiled with -O2, the file-info collection is still the dominating factor.
- manpages (check)
- Running man on manpages can be a dominating factor in certain doc packages. This is #677874 and suggestions for fixing it are more than welcome.
But enough Lintian for now… time to fix some RC bugs!
