
Packaging CrowdSec for Debian: Bullseye (Debian 11)

Introduction

Let’s start by quoting the CrowdSec website:

CrowdSec is an open-source and collaborative security stack leveraging the crowd power. Analyze behaviors, respond to attacks & share signals across the community. Join the community and let’s make the Internet safer, together.

Put another way, we’re basically talking about a global fail2ban solution rather than a local one.

Ready. Steady. Go!

Debamax started working on the packaging very late in the Bullseye (Debian 11) release cycle, i.e. in December 2020, while the soft freeze was scheduled for early February 2021. This meant a very short window to get many new packages into unstable and then into testing, since new packages can no longer be introduced into testing once the soft freeze starts. Therefore it was decided to concentrate immediate efforts on the Security Engine itself, to be packaged as crowdsec, and to delay work on Remediation Components (bouncers) until the beginning of the Bookworm (Debian 12) release cycle.

Thankfully, the Debian Go Packaging Team was very welcoming, offered some guidance when needed, and also promptly validated the various plans that were proposed when it came to updating existing packages. The team’s documentation is rather extensive and made it easy to get started with Go packages.

Build dependencies

The dh-make-golang command really helps with bootstrapping new source packages. The remaining hard parts are mainly checking the copyright and license information (debian/copyright), and building the whole dependency tree: that means keeping track of dependencies that are spotted by inspecting the upstream go.mod files and that might not be in the archive yet.
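
To give an idea of the bookkeeping involved, here is a minimal Python sketch that lists the modules required by a go.mod file and guesses the matching Debian source package names. The go.mod excerpt and its version numbers are made up for illustration, and the name mapping is a simplified assumption that only handles github.com modules; dh-make-golang itself knows about many more hosts and corner cases.

```python
# Hypothetical go.mod excerpt; modules and versions are for illustration only.
GO_MOD = """\
module github.com/crowdsecurity/crowdsec

go 1.13

require (
    github.com/gin-gonic/gin v1.6.3
    github.com/oschwald/geoip2-golang v1.4.0
    github.com/oschwald/maxminddb-golang v1.8.0 // indirect
)
"""


def required_modules(go_mod_text):
    """Return the module paths listed in the require directives."""
    modules = []
    in_block = False
    for raw in go_mod_text.splitlines():
        line = raw.split("//", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        if line == "require (":
            in_block = True
        elif in_block and line == ")":
            in_block = False
        elif in_block:
            modules.append(line.split()[0])
        elif line.startswith("require "):
            modules.append(line.split()[1])  # single-line form
    return modules


def debian_source_name(module_path):
    """Guess the Debian source package name (assumption: github.com only)."""
    host, _, rest = module_path.partition("/")
    if host != "github.com":
        raise ValueError("only github.com modules are handled in this sketch")
    return "golang-github-" + rest.lower().replace("/", "-").replace("_", "-")


for module in required_modules(GO_MOD):
    print(module, "->", debian_source_name(module))
```

Each guessed name would then have to be checked against the archive by hand or with the usual tools; the sketch doesn't try to do that.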

This is how the first part of the plan, Packaging crowdsec and some other packages, ended up being mostly about introducing many new golang-* packages into the archive, to be used as Build-Depends for the upcoming crowdsec package.

Introducing new packages seemed relatively easy, with the main question being when they would be processed from NEW, i.e. reviewed by the FTP team. Since most Go packages are very similar, and since the MIT/X license is used all over the place, it seems they are reviewed very quickly, and packages were accepted in record time.

The second part of the plan was a little more complicated, as it was about updating an existing package that has many reverse dependencies, and that update also involved adding new packages. Let’s take a step back and sum up the relevant characteristics of the Go ecosystem.

Library packages in Go are very different from the historical shared libraries and their development packages: Go library packages ship the source code, period. End-user programs like crowdsec list library packages in their Build-Depends and the build system aggregates their source code inside the build directory. This means the usual API/ABI stability concepts don’t apply, and this is compensated by having some tooling to determine whether an update is considered safe.

This is where ratt (which stands for “rebuild all the things!”) comes into play: once an updated package has been prepared, using metadata found in Sources files, ratt determines the reverse dependencies and builds them all in turn, using the updated package. It leverages sbuild, enriching the build environment with the updated package automatically. Since most Go packages come with a test suite, just building them gives a pretty good indication as to whether the update should be regarded as safe.
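
The reverse dependency lookup can be pictured with the following Python sketch. It is not ratt’s actual code, only an illustration of the idea under simplifying assumptions: the Sources excerpt is invented (example-tool and unrelated are hypothetical packages), and only the Build-Depends field is considered.

```python
import re

# Hypothetical excerpt in the style of a Debian Sources index.
SOURCES = """\
Package: crowdsec
Build-Depends: debhelper-compat (= 13), dh-golang,
 golang-github-gin-gonic-gin-dev (>= 1.6.3),
 golang-github-oschwald-geoip2-golang-dev

Package: example-tool
Build-Depends: debhelper-compat (= 13), golang-github-gin-gonic-gin-dev

Package: unrelated
Build-Depends: debhelper-compat (= 13)
"""


def parse_stanzas(text):
    """Split deb822-style text into dicts, joining continuation lines."""
    stanzas = []
    for block in re.split(r"\n\s*\n", text.strip()):
        fields, key = {}, None
        for line in block.splitlines():
            if line[:1] in (" ", "\t") and key:
                fields[key] += " " + line.strip()
            else:
                key, _, value = line.partition(":")
                fields[key] = value.strip()
        stanzas.append(fields)
    return stanzas


def reverse_build_deps(sources_text, binary_package):
    """Return source packages whose Build-Depends name binary_package."""
    result = []
    for stanza in parse_stanzas(sources_text):
        names = set()
        for clause in stanza.get("Build-Depends", "").split(","):
            for alternative in clause.split("|"):
                tokens = alternative.split()
                if tokens:
                    # Strip an architecture qualifier such as ":any".
                    names.add(tokens[0].split(":")[0])
        if binary_package in names:
            result.append(stanza["Package"])
    return sorted(result)


print(reverse_build_deps(SOURCES, "golang-github-gin-gonic-gin-dev"))
```

Every package returned by such a lookup is a candidate for a rebuild against the updated library.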

If failures (FTBFS) come up during such a systematic rebuild of all reverse dependencies, it makes sense to check whether those failures also happen with the version of the package currently in the archive:

  • If that’s the case, check whether the Debian BTS already knows about that failure, and file a new bug report if needed.
  • If that’s a new regression, further planning must happen. That might mean trying to package a new version, backporting a compatibility fix, etc.
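
This triage boils down to comparing two sets of failures, as in the following Python sketch. The package names are hypothetical, and the two input lists are assumed to come from rebuilding the reverse dependencies once against the updated package and once against the archive version.

```python
def triage(failed_with_update, failed_with_archive):
    """Split build failures into new regressions and pre-existing failures."""
    updated = set(failed_with_update)
    baseline = set(failed_with_archive)
    return {
        # Fails only with the updated package: needs further planning.
        "regressions": sorted(updated - baseline),
        # Fails either way: check the BTS, file a bug report if needed.
        "preexisting": sorted(updated & baseline),
    }


# Hypothetical package names, for illustration only.
report = triage(
    failed_with_update=["pkg-a", "pkg-b", "pkg-c"],
    failed_with_archive=["pkg-b", "pkg-c"],
)
print(report)
```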

Two external factors come in:

  • There are regular rebuilds of all packages in the archive, so an existing failure to build would usually be documented in the BTS already.
  • The severity of such bug reports is serious, which is considered RC and is ground for removal from testing.

This means that reverse dependencies found only in unstable, typically because such a bug got them removed from testing, are usually not considered blockers when preparing an updated version of an existing package.

Going back to the issue at hand, the Updating golang-github-gin-gonic-gin to 1.6.3 plan listed 3 failing packages out of the 35 detected reverse dependencies. Thankfully, one of them was a false positive (due to ratt’s lack of support for multiple source versions), and the other two were known issues. A green light was given to update this package.

Finally, the third part was about updating two existing packages that provide GeoIP support: Updating golang-github-oschwald-maxminddb-golang and golang-github-oschwald-geoip2-golang. The maintainer gave a green light for both uploads, which were relatively low risk anyway since it only involved three packages: the two getting updated, and a single reverse dependency.

Plugging everything together, here is a complete graph of all relevant packages: 21 new packages (green) and 3 updated packages (red), in addition to crowdsec itself (magenta).

New and updated packages in Bullseye

Patching upstream

Having a bunch of build dependencies is only a part of the packaging work. Another one is tailoring the software to the distribution’s needs or best practices, which is usually implemented via a series of patches. Here is a quick summary of the patches found in the Bullseye version:

  • 0001: Implements the machineid feature, which spares a dependency on yet another new package.
  • 0002: Ensures compatibility with the version of the SQLite driver found in Bullseye.
  • 0003: Adjusts the crowdsec.service systemd unit to match Debian’s needs and best practices.
  • 0004: Drops support for geoip-enrich which relies on non-free GeoIP data.
  • 0005: Tweaks paths in the config file to point at /var/lib/crowdsec instead of /etc/crowdsec for the hub.
  • 0006: Tweaks information messages to mention systemctl restart crowdsec instead of just reload, which was buggy in the 1.0.x versions.
  • 0007: Enables the online hub transparently when cscli hub update is called.
  • 0008: Removes a broken scenario (backported from upstream).
  • 0009: Avoids false positives by using regular expressions with word boundaries (backported from upstream).
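
To illustrate the idea behind the last patch, here is a small Python sketch showing how word boundaries cut down on false positives. The pattern and the log lines are invented for the example; they are not taken from the actual patch.

```python
import re

# Hypothetical patterns: the second one only matches "admin" as a whole word.
NAIVE = re.compile(r"admin")
BOUNDED = re.compile(r"\badmin\b")


def is_flagged(pattern, line):
    """Return True if the pattern matches anywhere in the line."""
    return pattern.search(line) is not None


benign = "GET /badminton/schedule"
suspicious = "GET /admin/login"

print(is_flagged(NAIVE, benign), is_flagged(BOUNDED, benign))
print(is_flagged(NAIVE, suspicious), is_flagged(BOUNDED, suspicious))
```

The naive pattern flags both requests, while the bounded one leaves the benign request alone.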

Packaging separate resources

Two different things live outside the main CrowdSec repository and need special consideration.

Data

The data part is mostly static, and scattered across other repositories. Let’s look at the files shipped alongside crowdsec as of version 1.0.9, which should give an idea what they are all about (the filenames are quite self-explanatory):

  • backdoors.txt
  • bad_user_agents.txt
  • cloudflare_ips.txt
  • http_path_traversal.txt
  • ip_seo_bots.txt
  • rdns_seo_bots.regex
  • rdns_seo_bots.txt
  • sensitive_data.txt
  • sqli_probe_patterns.txt
  • xss_probe_patterns.txt

Hub

The hub part is more challenging: crowdsec itself is a Security Engine that needs a number of collections to define what it should do. Those collections can point to parsers, scenarios, postoverflows, and to other collections as well.
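
Since collections can reference other collections, working out what a given collection really pulls in is a recursive walk. The Python sketch below shows the principle; the index structure and all item names are hypothetical and don’t reflect the actual hub format.

```python
# Hypothetical hub index; names and layout are for illustration only.
HUB = {
    "collections": {
        "example/linux": {
            "parsers": ["example/syslog-logs"],
            "collections": ["example/sshd"],
        },
        "example/sshd": {
            "parsers": ["example/sshd-logs"],
            "scenarios": ["example/ssh-bf"],
        },
    }
}

ITEM_TYPES = ("parsers", "scenarios", "postoverflows")


def resolve(collection, hub, seen=None):
    """Return every item a collection pulls in, following nested collections."""
    seen = set() if seen is None else seen
    resolved = {kind: set() for kind in ITEM_TYPES}
    if collection in seen:  # guard against collections referencing each other
        return resolved
    seen.add(collection)
    entry = hub["collections"][collection]
    for kind in ITEM_TYPES:
        resolved[kind].update(entry.get(kind, []))
    for child in entry.get("collections", []):
        for kind, items in resolve(child, hub, seen).items():
            resolved[kind] |= items
    return resolved


print(resolve("example/linux", HUB))
```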

The hub is accessible online but it felt a little weird to force all users to start by having to download things right after installing the package. It was agreed with upstream to ship a copy of the hub data alongside the main crowdsec source tarball, and to install those files directly in the crowdsec binary package. Those files would be used by default upon installation, but users wishing to switch to the online hub can do so by using the cscli hub update command.

Implementation

To deal with both the extra data and hub parts, the crowdsec source package — which uses the 3.0 (quilt) source format — leverages what the dpkg-source manpage calls additional original tarballs. Let’s see what the file list of the crowdsec source package looks like:

  • crowdsec_1.0.9-2.dsc: entry point, containing metadata and listing other files.
  • crowdsec_1.0.9-2.debian.tar.xz: Debian packaging (debian/ directory).
  • crowdsec_1.0.9.orig.tar.gz: original tarball, “main” one (from https://github.com/crowdsecurity/crowdsec).
  • crowdsec_1.0.9.orig-data1.tar.gz: additional original tarball, for data files.
  • crowdsec_1.0.9.orig-hub1.tar.gz: additional original tarball, for the hub.

When deploying the source package, the first additional original tarball is automatically extracted under the data1 directory, below the top-level crowdsec-1.0.9 directory, while the second one is extracted under the hub1 directory.

The 1 suffix is there just in case we needed or wanted to update either additional original tarball while keeping the same crowdsec upstream version (i.e. 1.0.9 in this case).
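
The naming convention can be summed up with a short Python sketch that derives the extraction directory from a tarball filename. The regular expression is a simplified assumption about the naming scheme, not a transcription of what dpkg-source implements.

```python
import re

# Simplified pattern: <source>_<version>.orig[-<component>].tar.<compression>
ORIG_RE = re.compile(
    r"^(?P<source>[a-z0-9][a-z0-9+.-]+)_(?P<version>[^_]+)\.orig"
    r"(?:-(?P<component>[A-Za-z0-9-]+))?\.tar\.(?:gz|bz2|lzma|xz)$"
)


def extraction_dir(filename):
    """Return the directory an original tarball ends up in after unpacking."""
    match = ORIG_RE.match(filename)
    if match is None:
        raise ValueError(f"not an original tarball: {filename}")
    top = f"{match['source']}-{match['version']}"
    component = match["component"]
    # The main tarball provides the top-level directory; each additional
    # tarball lands in a subdirectory named after its component.
    return top if component is None else f"{top}/{component}"


for name in (
    "crowdsec_1.0.9.orig.tar.gz",
    "crowdsec_1.0.9.orig-data1.tar.gz",
    "crowdsec_1.0.9.orig-hub1.tar.gz",
):
    print(name, "->", extraction_dir(name))
```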

Another well-known use of such additional original tarballs is language packs. In Bullseye, firefox-esr comes with 97 of them!

Plugging everything together: maintainer scripts

Let’s start by mentioning two core components of the CrowdSec ecosystem: the Central API and the Local API.

Since CrowdSec is really about leveraging efforts from the community, it seemed best to have the machine automatically register itself against the Central API upon installation, by default. Since there are many ways to set up CrowdSec, advanced users can disable automatic registration by creating a file in advance, as documented in the README.Debian. This registration happens via cscli api register.

Regarding the Local API, it seemed best to stick to the loopback by default, letting admins adjust the network configuration if desired. The machine registers itself on the Local API via cscli machines add.

The final topic is not the easiest: collections!

Since replacing the offline hub with the online one should be possible, files are deployed under /var/lib/crowdsec (making sure they aren’t treated as conffiles, which would otherwise lead to many prompts during upgrades). Unfortunately, at the moment, crowdsec doesn’t come with a way to manage local collections. For the 1.0.x versions packaged in Bullseye, the hub was small enough that enabling everything seemed reasonable and easy, and that’s what got implemented: for each item found in the offline hub, create a symlink under /etc/crowdsec/, pointing at it.
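
The “enable everything” approach can be sketched in Python as follows. This is only an illustration: the real logic lives in a maintainer script, and the directory layout used here (one subdirectory per item type, mirrored as-is on the configuration side) is an assumption made for the example.

```python
import os
import tempfile


def enable_all(hub_root, config_root,
               kinds=("collections", "parsers", "scenarios", "postoverflows")):
    """Symlink every file found under hub_root/<kind> into config_root/<kind>."""
    created = []
    for kind in kinds:
        src_dir = os.path.join(hub_root, kind)
        if not os.path.isdir(src_dir):
            continue
        for dirpath, _dirs, files in os.walk(src_dir):
            for name in files:
                target = os.path.join(dirpath, name)
                rel = os.path.relpath(target, hub_root)
                link = os.path.join(config_root, rel)
                os.makedirs(os.path.dirname(link), exist_ok=True)
                if not os.path.lexists(link):  # leave existing entries alone
                    os.symlink(target, link)
                    created.append(rel)
    return sorted(created)


def demo():
    """Run enable_all twice on a throwaway tree standing in for the real paths."""
    with tempfile.TemporaryDirectory() as tmp:
        hub = os.path.join(tmp, "var", "lib", "crowdsec", "hub")
        etc = os.path.join(tmp, "etc", "crowdsec")
        os.makedirs(os.path.join(hub, "scenarios"))
        open(os.path.join(hub, "scenarios", "example.yaml"), "w").close()
        first = enable_all(hub, etc)
        is_link = os.path.islink(os.path.join(etc, "scenarios", "example.yaml"))
        second = enable_all(hub, etc)  # second run finds nothing left to do
        return first, is_link, second


print(demo())
```

Skipping links that already exist keeps the operation safe to repeat, which matters for anything run on every upgrade.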

Everything mentioned in this section is implemented in the crowdsec.postinst maintainer script which is called by dpkg after installation, in particular with the configure action.

Conclusion

Debian 11 shipped with the Security Engine packaged as crowdsec, which made it possible to join the crowd and report suspicious behaviours. Unfortunately, there wasn’t enough time to also package Remediation Components (bouncers), which are responsible for actually taking action.

Package    Version
crowdsec   1.0.9-2+b4

Version table for Debian 11 “Bullseye”

See what happened after that, in the next blog post: Packaging CrowdSec for Debian: Bookworm (Debian 12).