Since debhelper/10.3, there have been a number of performance-related changes. The vast majority primarily improve bulk performance or only have visible effects at larger “input” sizes.
The most visible cases are:
- dh + dh_* now scales a lot better for a large number of binary packages. Even more so with parallel builds.
- Most dh_* tools are now a lot faster when creating many directories or installing files.
- dh_prep and dh_clean now perform removals in bulk.
- dh_install can now bulk some installations. For a concrete corner-case, libssl-doc went from approximately 11 seconds to less than a second. This optimization is implicitly disabled with --exclude (among others).
- dh_installman now scales a lot better with many manpages. Even more so with parallel builds.
- dh_installman has regained its performance under fakeroot (a regression since 10.2.2).
For debhelper, this mostly involved:
- avoiding fork+exec of commands for things doable natively in Perl, especially when each fork+exec only processes one file or dir.
- bulking as many files/dirs into the call as possible, where fork+exec is still used.
- caching / memoizing slow calls (e.g. in parts of pkgfile inside Dh_Lib)
- adding an internal API for dh to do bulk checks for pkgfiles. This is useful for dh when checking if it should optimize out a helper.
- and, of course, doing things in parallel where trivially possible.
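As a rough illustration of the first three points, here is a minimal sketch in Python (not Perl, and not debhelper's actual code). All function names (make_dirs_native, batches, run_batched, find_package_file) are hypothetical, and the lookup is a heavily simplified stand-in for what pkgfile does; the real function handles more cases.

```python
import os
import subprocess
from functools import lru_cache


def make_dirs_native(dirs):
    """Create directories in-process instead of spawning one `mkdir -p` per dir."""
    for d in dirs:
        os.makedirs(d, exist_ok=True)


def batches(items, max_items):
    """Yield successive slices of at most max_items items."""
    for start in range(0, len(items), max_items):
        yield items[start:start + max_items]


def run_batched(command, items, max_items=1000):
    """Run command with as many items per call as allowed.

    Returns the number of processes spawned, so the saving over
    one-process-per-item is easy to see.
    """
    spawned = 0
    for chunk in batches(list(items), max_items):
        # One fork+exec handles the whole chunk instead of a single item.
        subprocess.run([*command, *chunk], check=True)
        spawned += 1
    return spawned


@lru_cache(maxsize=None)
def find_package_file(package, name):
    """Hypothetical, simplified lookup of debian/<package>.<name>.

    The result is memoized, so repeated queries for the same pair
    do not touch the file system again.
    """
    path = os.path.join("debian", f"{package}.{name}")
    return path if os.path.exists(path) else None
```

With max_items=1000, processing 2500 files costs three process launches rather than 2500. The cap exists because command lines have a maximum length; the value 1000 is an arbitrary choice for this sketch.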
How to take advantage of these improvements in tools that use Dh_Lib:
- If you use install_{file,prog,lib,dir}, then you get the improvements out of the box. These functions are available in Debian/stable. On a related note, if you use “doit” to call “install” (or “mkdir”), then please consider migrating to these functions instead.
- If you need to reset owner+mode (chown 0:0 FILE + chmod MODE FILE), consider using reset_perm_and_owner. This is also available in Debian/stable.
- CAVEAT: It is not recursive and YMMV if you do not need the chown call (due to fakeroot).
- If you have a lot of items to be processed by an external tool, consider using xargs(). Since 10.5.1, it is possible to insert the items anywhere in the command rather than just at the end.
- If you need to remove files, consider using the new rm_files function. It removes files and silently ignores any file that does not exist. It is also available since 10.5.1.
- If you need to create symlinks, please consider using make_symlink (available in Debian/stable) or make_symlink_raw_target (since 10.5.1). The former creates policy-compliant symlinks (e.g. it fixes up absolute symlinks that should have been relative). The latter is closer to a “ln -s” call.
- If you need to rename a file, please consider using rename_path (since 10.5). It behaves mostly like “mv -f” but requires dest to be a (non-existing) file.
- Have a look at whether on_pkgs_in_parallel() / on_items_in_parallel() would be suitable for enabling parallelization in your tool.
- The emphasis for these functions is on making parallelization easy to add with minimal code changes. They pre-distribute the items, which can lead to unbalanced workloads where some processes are idle while a few keep working.
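To make the caveat about pre-distribution concrete, here is a small Python sketch (again, not debhelper's code). It assumes the items are split into contiguous slices before any work starts; that split strategy, the function names predistribute and worker_loads, and the package names and costs are all assumptions made up for illustration.

```python
import math


def predistribute(items, workers):
    """Split items into at most `workers` contiguous slices, decided up front."""
    if workers < 1:
        raise ValueError("workers must be at least 1")
    if not items:
        return []
    per_worker = math.ceil(len(items) / workers)
    return [items[i:i + per_worker] for i in range(0, len(items), per_worker)]


def worker_loads(groups, cost):
    """Sum the cost of each group to see how much work each worker received."""
    return [sum(cost(item) for item in group) for group in groups]


# Hypothetical packages with a made-up processing cost in seconds.
packages = [("libfoo-doc", 10), ("libfoo-dev", 9), ("libfoo1", 1), ("foo-utils", 1)]
groups = predistribute(packages, 2)
loads = worker_loads(groups, lambda pkg: pkg[1])
```

Here the first worker ends up with 19 seconds of work and the second with 2, so the second sits idle for most of the run. A shared work queue would balance this better, at the price of more invasive code changes.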
Credits:
I would like to thank the following for reporting performance issues or regressions and/or providing patches. The list is in no particular order:
- Helmut Grohne
- Kurt Roeckx
- Gianfranco Costamagna
- Iain Lane
- Sven Joachim
- Adrian Bunk
- Michael Stapelberg
Should I have missed your contribution, please do not hesitate to let me know.
