Quick introduction to Tails and build reproducibility
Tails is a live operating system, booted from USB or DVD, that aims to preserve its users’ privacy and anonymity. It is Free Software and based on the Debian GNU/Linux distribution. The Tails website contains a more complete overview of the project.
Over the last few years, an increasing number of software developers and security-focused people have been looking into build reproducibility: when a build system is deterministic, building a given component from a given source should always lead to the exact same binary result (byte-for-byte). The Tor project and the Debian distribution were among the first teams to work towards this goal. More information can be found on the reproducible-builds.org website, whose motto is “Provide a verifiable path from source code to binary”.
What does it mean in the Tails context? The main “product” of the Tails project is a bootable ISO image containing the live system, with tools designed and preconfigured to preserve privacy and anonymity. Compromising this image would defeat the whole point of the project and could even endanger the lives of journalists or whistleblowers relying on it. Making the image build process reproducible means developers and even users can rebuild the image on their own hardware and make sure the ISO image published by the Tails project matches the one that was built locally, or that was verified by others.
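As a minimal sketch of what “matches” means here, the byte-for-byte comparison can be reduced to comparing cryptographic digests of the published image and a locally rebuilt one. The function names and file paths below are purely illustrative and are not part of the Tails tooling.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def images_match(published_iso, rebuilt_iso):
    """True when both files have the same digest, i.e. the build was reproduced."""
    return sha256_of(published_iso) == sha256_of(rebuilt_iso)
```

A call such as images_match("published.iso", "rebuilt.iso") (hypothetical file names) returns True only if the two images are identical.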
How Debamax became involved
Flashback: October 2015.
Cyril had already been working on Debian derivatives for various customers and had been identified by some Tails developers as a potential asset to work on the first steps towards reproducibility. A sprint approach was chosen to tackle the “freezable APT repository” topic: meet, discuss, design, code; repeat a few times.
What follows is an overview of the results, with a few pointers to code and documentation. They are presented sequentially but all those topics are closely intertwined, and that had to be taken into account during the design phase.
Keeping track of packages in archives
The first objective was to imagine a workflow which would make it possible to build a given ISO image with the exact same set of Debian packages. An interesting data point is that 4 separate archives are used during a Tails build:
- the regular Debian archive;
- the Debian Security archive;
- the Tor project archive;
- the Tails archive.
Of course, those archives aren’t static: the Debian archive is updated up to 4 times a day, the Debian Security archive is updated whenever a new security update is published, etc. So we needed a way to keep track of all packages used during the build but also of the state of each archive at any point where an image was being built.
It was decided to use reprepro, which is designed to produce custom Debian repositories, while also making it possible to mirror upstream repositories. It also supports creating snapshots, which exactly fits the need to keep packages around! Packages which would normally be deleted or replaced by a new version (when a synchronization happens) are kept as long as there’s at least one snapshot that depends on them.
First results:
- commits in the puppet-tails.git repository, which is used to maintain the Tails infrastructure; in particular: the files/reprepro/snapshots/time_based and manifests/reprepro/snapshots directories;
- time-based.snapshots.deb.tails.boum.org is where the snapshots are published, to be used during the build process instead of the upstream archives.
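The retention rule mentioned above (a package stays in the pool as long as at least one snapshot depends on it) can be illustrated with a toy model. This is only a sketch of the idea, not how reprepro implements it; the data structures and function names are assumptions made for the example.

```python
def packages_to_keep(current_index, snapshots):
    """Union of the packages in the live index and in every remaining snapshot.

    current_index: set of (package, version) pairs currently published.
    snapshots: dict mapping a snapshot name to its set of (package, version) pairs.
    """
    keep = set(current_index)
    for members in snapshots.values():
        keep |= members
    return keep


def expire_snapshot(pool, current_index, snapshots, name):
    """Drop one snapshot and return (new_pool, remaining_snapshots).

    Packages that nothing references any more disappear from the pool.
    """
    remaining = {key: value for key, value in snapshots.items() if key != name}
    keep = packages_to_keep(current_index, remaining)
    return pool & keep, remaining
```

In this model, an old version superseded in the live index survives only until the last snapshot containing it expires.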
Keeping track of packages used during the build
While working for other customers, Cyril already had to keep track of packages used to build Debian images: the idea was to list all packages and versions used for a given build, making it possible to generate changelog-like summaries of changes between two builds.
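To make the changelog-like summary concrete, here is a small sketch that compares the package lists of two builds. The input format (a mapping from package name to version) and the version strings in the usage note are made up for illustration.

```python
def summarize_changes(old, new):
    """Compare two {package: version} mappings.

    Returns three dicts: added packages, removed packages, and packages
    whose version changed (mapped to an (old_version, new_version) pair).
    """
    added = {name: new[name] for name in new.keys() - old.keys()}
    removed = {name: old[name] for name in old.keys() - new.keys()}
    changed = {
        name: (old[name], new[name])
        for name in old.keys() & new.keys()
        if old[name] != new[name]
    }
    return added, removed, changed
```

For example, comparing {"tor": "1.0-1", "vim": "8.0-1"} with {"tor": "1.0-2", "gimp": "2.8-1"} reports gimp as added, vim as removed, and tor as changed.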
A similar approach was used here: (package, version, URI) triplets are gathered for every package. Here’s what the implementation looks like:
- an apt-get wrapper was developed to track all downloaded packages (binary and source);
- a patch was added to the debootstrap script so that it would install that wrapper automatically;
- and debootstrap itself was patched to store the triplets during the bootstrap phase as well.
That means three files are generated with those triplets: one with binary packages from the bootstrap phase, one with binary packages downloaded through apt-get, and one with source packages downloaded through apt-get as well.
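As a rough sketch of what recording such triplets could look like, the helpers below append one tab-separated line per package and read the lines back. The on-disk format and the function names are assumptions made for this example; the actual wrapper and debootstrap patch may store the data differently.

```python
import csv

# Hypothetical column order for one recorded download.
FIELDS = ("package", "version", "uri")


def record_triplet(stream, package, version, uri):
    """Append one (package, version, URI) triplet as a tab-separated line."""
    writer = csv.writer(stream, delimiter="\t", lineterminator="\n")
    writer.writerow([package, version, uri])


def read_triplets(stream):
    """Read triplets back as a list of dicts, skipping empty lines."""
    return [
        dict(zip(FIELDS, row))
        for row in csv.reader(stream, delimiter="\t")
        if row
    ]
```

One such stream per category (bootstrap binaries, apt-get binaries, apt-get sources) would mirror the three files described above.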
Another script was developed to aggregate those results into what we call a build-manifest; it gathers all origins (the archives mentioned in the previous section), their references (the snapshot used during the build), and all packages along with their versions. Example for the 3.2 release: tails-amd64-3.2.build-manifest.
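The aggregation step can be sketched as follows. The key names and the JSON serialization are assumptions chosen for readability; the real build-manifest layout should be checked against the published example file.

```python
import json


def build_manifest(origin_references, binary_triplets, source_triplets):
    """Aggregate origins, their snapshot references, and package lists.

    origin_references: dict mapping an origin name to the snapshot reference used.
    binary_triplets / source_triplets: lists of dicts with at least
    "package" and "version" keys (the URI is not kept in this sketch).
    """

    def unique_sorted(triplets):
        # The same package may have been recorded twice (e.g. during
        # bootstrap and again through apt-get); keep each pair once.
        seen = {(t["package"], t["version"]) for t in triplets}
        return [{"package": p, "version": v} for p, v in sorted(seen)]

    return {
        "origin_references": {
            origin: {"reference": reference}
            for origin, reference in sorted(origin_references.items())
        },
        "packages": {
            "binary": unique_sorted(binary_triplets),
            "source": unique_sorted(source_triplets),
        },
    }


def dump_manifest(manifest):
    """Serialize deterministically so that identical inputs give identical text."""
    return json.dumps(manifest, indent=2, sort_keys=True)
```

Sorting and de-duplicating the entries keeps the output stable from one build to the next, which matters when the manifest itself is compared or archived.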
Keeping track of packages in the long term
At this point we have the following results:
- time-based snapshots of entire archives: for amd64 and i386 at first, and for amd64 only starting with Tails 3.0;
- build-manifest files containing references to those snapshots and lists of packages.
Keeping all packages forever wouldn’t be reasonable, so snapshots are expired after a few days. Since storing packages actually used for releases is the whole point of mastering repositories in the first place, an extra tool on the infrastructure side was developed to generate tagged snapshots from the time-based ones, thanks to references and packages listed in the build-manifest for the release.
This leads to these results:
- tagged.snapshots.deb.tails.boum.org is where the tagged snapshots are published. Those are used when building or rebuilding a stable release.
- These snapshots only include packages needed to rebuild releases, and of course their sources for license compliance reasons. Since they’re rather small compared to the full time-based snapshots, those can be kept forever.
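The selection performed when creating a tagged snapshot can be pictured with the sketch below. It assumes a hypothetical in-memory manifest (origins mapped to a reference, plus binary and source package lists) and a hypothetical index of time-based snapshots; the actual infrastructure tool works on real repositories and may differ substantially.

```python
def select_for_tag(manifest, time_based):
    """Pick, per origin, the packages a release needs from its time-based snapshot.

    manifest: {"origin_references": {origin: {"reference": ref}},
               "packages": {"binary": [...], "source": [...]}}
              where each package entry has "package" and "version" keys.
    time_based: {origin: {ref: set of (package, version) pairs}}.
    Raises LookupError if a needed package is in none of the referenced snapshots.
    """
    wanted = {
        (entry["package"], entry["version"])
        for kind in ("binary", "source")
        for entry in manifest["packages"][kind]
    }
    selected = {}
    for origin, info in manifest["origin_references"].items():
        snapshot = time_based[origin][info["reference"]]
        selected[origin] = sorted(wanted & snapshot)
    found = {item for items in selected.values() for item in items}
    missing = wanted - found
    if missing:
        raise LookupError(f"not found in referenced snapshots: {sorted(missing)}")
    return selected
```

Everything in the time-based snapshots that the manifest does not list is simply left out, which is why the tagged snapshots stay small.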
Putting all the pieces together
Fast forward: November 2017.
Large parts of this initial “freezable APT repository” sprint were spent designing what the new workflow would look like during development phases, and during freeze periods. Of course, adjustments were made during the following releases, and the current status is documented on the APT repository page. Details can be found there about the custom APT repository (for Tails), about the time-based snapshots, and about the tagged snapshots.
This was only preliminary work, as there are many reasons which can trigger differences in the resulting ISO image. Details can be found in the reproducible builds blueprint. Many issues have been tackled by the Tails developers since then, and that’s how the 3.3 release came to be announced as the first reproducible ISO image! (Of course, this is still rather new, and bug #14933 has already been filed, but the current results are amazing!)
Congratulations to the Tails developers for reaching this milestone, and many thanks for this cooperation opportunity!