Tails: early work on reproducibility

Quick introduction about Tails and build reproducibility

Tails is a live operating system, booting from USB or from DVD, that aims to preserve its users’ privacy and anonymity. It is Free Software and based on the Debian GNU/Linux distribution. The Tails website contains a more complete overview of the project.

Over the last few years, a growing number of software developers and security-focused people have been looking into build reproducibility: when a build system is deterministic, building a given component from a given source should always produce the exact same binary result, byte for byte. The Tor project and the Debian distribution were among the first teams to work towards this goal. More information can be found on the Reproducible Builds website, whose motto is “Provide a verifiable path from source code to binary”.

What does it mean in the Tails context? The main “product” of the Tails project is a bootable ISO image containing the live system, along with tools designed and preconfigured to preserve privacy and anonymity. Compromising this image would defeat the whole point of the project and could even endanger the lives of journalists or whistleblowers relying on it. Making the image build process reproducible means developers and even users can rebuild the image on their own hardware and check that the ISO image published by the Tails project matches the one built locally, or one verified by others.
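Checking that two images match boils down to a byte-for-byte comparison, which is usually done by comparing cryptographic digests. Here is a minimal sketch in Python, assuming two ISO files are available locally; the helper names are illustrative and are not part of the Tails tooling:

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in fixed-size chunks so large ISO images never sit fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_image(published: Path, rebuilt: Path) -> bool:
    """True if both images have the same SHA-256 digest."""
    return sha256_of(published) == sha256_of(rebuilt)
```

A call such as same_image(Path("published.iso"), Path("rebuilt.iso")), with hypothetical file names, returns True only when the two files are identical.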

How Debamax became involved

Flashback: October 2015.

Cyril had already been working on Debian derivatives for various customers and had been identified by some Tails developers as a potential asset to work on the first steps towards reproducibility. A sprint approach was chosen to tackle the freezable APT repository topic: meet, discuss, design, code; repeat a few times.

What follows is an overview of the results, with a few pointers to code and documentation. They are presented sequentially but all those topics are closely intertwined, and that had to be taken into account during the design phase.

Keeping track of packages in archives

The first objective was to imagine a workflow which would make it possible to build a given ISO image with the exact same set of Debian packages. An interesting data point is that 4 separate archives are used during a Tails build:

Of course, those archives aren’t static: the Debian archive is updated up to 4 times a day, the Debian Security archive is updated whenever a new security update is published, etc. So we needed a way to keep track of all packages used during the build but also of the state of each archive at any point where an image was being built.

It was decided to use reprepro, which is designed to produce custom Debian repositories and can also mirror upstream repositories. It also supports creating snapshots, which exactly fits the need to keep packages around: packages that would normally be deleted or replaced by a new version when a synchronization happens are kept as long as at least one snapshot references them.
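The retention rule can be illustrated with a small Python model. This is only a sketch of the idea, not a description of how reprepro is implemented; the function name and the data layout (sets of package file names) are assumptions made for the example:

```python
def removable_packages(
    pool: set[str],
    current: set[str],
    snapshots: dict[str, set[str]],
) -> set[str]:
    """Return pool files referenced neither by the live index nor by any snapshot."""
    referenced = set(current)
    for files in snapshots.values():
        # A file stays as long as at least one snapshot still lists it.
        referenced |= files
    return pool - referenced
```

With a pool holding an old and a new version of the same package, the old version only becomes removable once the last snapshot listing it is gone.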

First results:

Keeping track of packages used during the build

While working for other customers, Cyril already had to keep track of packages used to build Debian images: the idea was to list all packages and versions used for a given build, making it possible to generate changelog-like summaries of changes between two builds.
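As an illustration of such a changelog-like summary, here is a minimal Python sketch. It assumes each build is represented as a mapping from package name to version string; the function name and output wording are made up for the example. It reports any version difference as a change without deciding whether it is an upgrade or a downgrade, since comparing Debian versions properly requires the Debian version ordering rules:

```python
def summarize_changes(old: dict[str, str], new: dict[str, str]) -> list[str]:
    """Describe package differences between two builds, sorted by package name."""
    lines = []
    for name in sorted(old.keys() | new.keys()):
        before, after = old.get(name), new.get(name)
        if before == after:
            continue  # same version in both builds, nothing to report
        if before is None:
            lines.append(f"added: {name} {after}")
        elif after is None:
            lines.append(f"removed: {name} {before}")
        else:
            lines.append(f"changed: {name} {before} -> {after}")
    return lines
```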

A similar approach was used here, where triplets are gathered with: package, version, URI. Here’s what the implementation looks like:

That means three files are generated with those triplets: one with binary packages from the bootstrap phase, one with binary packages downloaded through apt-get, and one with source packages downloaded through apt-get as well.
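To make the triplet idea concrete, here is a minimal Python sketch of a reader for such files. The on-disk format used here (one triplet per line, fields separated by whitespace) is an assumption for the sake of the example and may differ from what the Tails build scripts actually write; the Triplet type and parse_triplets function are hypothetical names:

```python
from typing import NamedTuple


class Triplet(NamedTuple):
    package: str
    version: str
    uri: str


def parse_triplets(text: str) -> list[Triplet]:
    """Parse lines of the form '<package> <version> <URI>', skipping blank lines."""
    triplets = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue
        fields = line.split()
        if len(fields) != 3:
            raise ValueError(f"line {number}: expected 3 fields, got {len(fields)}")
        triplets.append(Triplet(*fields))
    return triplets
```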

Another script was developed to aggregate those results into what we call a build-manifest; it gathers all origins (the archives mentioned in the previous section), their references (the snapshot used during the build), and all packages along with their versions. Example for the 3.2 release:

Keeping track of packages in the long term

At this point we have the following results:

Keeping all packages forever wouldn’t be reasonable, so snapshots are expired after a few days. Since storing the packages actually used for releases is the whole point of controlling the repositories in the first place, an extra tool was developed on the infrastructure side to generate tagged snapshots from the time-based ones, using the references and packages listed in the build-manifest for the release.
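The two mechanisms, expiry of time-based snapshots and promotion to tagged snapshots, can be sketched in Python as follows. This is a simplified model and not the actual infrastructure tool: the function names, the data structures, and the idea of representing packages as (package, version) pairs are all assumptions made for the illustration:

```python
from datetime import datetime, timedelta

Package = tuple[str, str]  # (package name, version)


def expired_snapshots(
    snapshots: dict[str, datetime],
    now: datetime,
    max_age: timedelta,
) -> list[str]:
    """Return the names of time-based snapshots older than max_age, sorted."""
    return sorted(name for name, created in snapshots.items() if now - created > max_age)


def select_for_tag(
    references: dict[str, str],
    wanted: set[Package],
    contents: dict[tuple[str, str], set[Package]],
) -> dict[str, set[Package]]:
    """For each origin, keep the manifest packages found in the referenced snapshot.

    references maps an origin to the time-based snapshot used during the build,
    wanted holds the packages listed in the manifest, and contents maps
    (origin, snapshot) to the packages available in that snapshot.
    """
    tagged = {}
    for origin, snapshot in references.items():
        available = contents.get((origin, snapshot))
        if available is None:
            # The time-based snapshot has already expired: nothing left to tag.
            raise KeyError(f"snapshot {snapshot!r} of {origin!r} is gone")
        tagged[origin] = available & wanted
    return tagged
```

The second function has to run before the referenced time-based snapshots expire, which is why the sketch fails loudly when one of them is missing.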

This leads to these results:

Putting all the pieces together

Fast forward: November 2017.

Large parts of this initial freezable APT repository sprint were spent designing what the new workflow would look like during development phases and during freeze periods. Of course, adjustments were made during the following releases, and the current status is documented on the APT repository page. Details can be found there about the custom APT repository (for Tails), about the time-based snapshots, and about the tagged snapshots.

This was only preliminary work, as many things can cause differences in the resulting ISO image. Details can be found in the reproducible builds blueprint. Many issues have been tackled by the Tails developers since then, and that’s how the 3.3 release was announced as the first reproducible ISO image! (Of course, this is still rather new, and bug #14933 has already been filed, but the current results are amazing!)

Congratulations to the Tails developers for reaching this milestone, and many thanks for this cooperation opportunity!

Published: Fri, 08 Dec 2017 10:15:00 +0100