In 2011, I wrote about how small files could consume a lot of space. I meant to do a follow-up on the savings but I forgot about it until now.
In 2.5.7, we started compressing some of the collected data files. Some of these are ridiculously compressible (#664794). Even better, compressing them is sometimes faster than writing them directly to the disk, so in some cases it is a pure win/win. For lintian.d.o, we also see a vast reduction in the overall size of the laboratory.
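To give a feel for why such files compress so well, here is a minimal Python sketch. It is not Lintian's code: the sample data is a made-up stand-in for a collected data file (many near-identical lines), and `compression_ratio` is a hypothetical helper. It only illustrates the size side of the argument, not the speed side.

```python
import gzip


def compression_ratio(data: bytes) -> float:
    """Return compressed size divided by original size (smaller is better)."""
    return len(gzip.compress(data)) / len(data)


# Hypothetical stand-in for a collected data file: 20000 lines that differ
# only in a counter. The real file formats are not reproduced here.
sample = b"".join(
    b"usr/share/doc/pkg/file%05d.txt\n" % i for i in range(20000)
)

ratio = compression_ratio(sample)
print(f"{len(sample)} bytes, compressed/original ratio: {ratio:.3f}")
```

Highly repetitive text like this shrinks to a small fraction of its original size, which also means far fewer bytes have to reach the disk.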
I have taken a few samples from time to time. The samples were done with du(1):
$ du -csh [--apparent-size] laboratory/*
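The two numbers differ because plain du reports the blocks actually allocated on disk, while --apparent-size sums the file lengths. A rough Python sketch of the same distinction follows. It assumes a Unix-like system where `st_blocks` is available and counted in 512-byte units, and `usage` is a hypothetical helper, not part of du or Lintian.

```python
import os
from typing import Tuple


def usage(root: str) -> Tuple[int, int]:
    """Return (disk_bytes, apparent_bytes) for the files under root."""
    disk = 0
    apparent = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            apparent += st.st_size        # file length, like --apparent-size
            disk += st.st_blocks * 512    # allocated blocks, like plain du
    return disk, apparent


if __name__ == "__main__":
    disk, apparent = usage("laboratory")
    print(f"disk: {disk} bytes, apparent: {apparent} bytes")
```

This is a simplification: unlike du, it ignores the space used by the directories themselves and counts hard-linked files once per link, so its totals will not match du exactly. With many small files, the disk figure tends to be noticeably larger than the apparent one because each file occupies at least one whole block on most filesystems.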
Version/date | du -csh | du -csh --apparent-size |
---|---|---|
N/A – around 20 Mar 2012 (#664794) | 16G | 13G |
2.5.6 (Fri Apr 27 2012) | 14G | N/A |
2.5.6 (Mon Jun 04 2012) | N/A | 12G |
2.5.10.2 (Fri Sep 21 2012) | 12G | 8.3G |
2.5.11 (Wed Jan 2 2013) | 10G | 6.1G |
And the most awesome part of this? The comparison is quite biased against the 2.5.11 entry, which is the only entry to also process experimental (approx. 10% extra packages). Some of the early entries (2.5.6 and “older”) might also have suffered from the “too many links” issue[1]. I only wish I had been better at collecting data points, so I could have made a proper graph of it.
It sounds almost too good to be true, but if you look at the size of one of the linux-image packages[2], the space usage dropped from 27M to 15M between 2.5.5 and 2.5.9. Currently it is squeezed down to 14M (tested with head of the git master branch).
[1] I believe about 5-10% fewer binary packages were processed for those runs.
[2] linux-image-3.2.0-2-amd64_3.2.20-1_amd64.deb
