- There are no active components to maintain.
- It’s super cheap. Like, really.
/etc/apt/sources.list:
deb http://deb.debian.org/debian/ bookworm main non-free-firmware
Let’s anatomize this line and see what each part means:
- deb: nothing special here, this just indicates that this line specifies a remote repository.
- http: the protocol to use to communicate with the repository. By the way, HTTPS is not necessary (although supported nowadays), as packages are later verified with GPG signatures. See more below.
- ://deb.debian.org/debian/: URL where the repository lives.
- bookworm: this is the distribution. In the Debian world, each version’s name is called the “distribution”.
- main, non-free-firmware: those are components. In the Debian world, this is how packages are grouped. While we don’t particularly care for it, it allows Debian to separate the non-free packages, as well as probably other legacy reasons. (Feel free to enlighten us if you know more!)
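To make the breakdown concrete, here is a minimal Python sketch that splits such a line into the parts described above. The `SourceEntry` and `parse_source_line` names are made up for this post, and the sketch assumes the simple one-line format without bracketed options; it is not how apt itself parses the file.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass
class SourceEntry:
    kind: str              # "deb" in the example line
    protocol: str          # e.g. "http"
    url: str               # full repository URL
    distribution: str      # e.g. "bookworm"
    components: list[str]  # e.g. ["main", "non-free-firmware"]


def parse_source_line(line: str) -> SourceEntry:
    """Split a simple one-line sources.list entry into its parts.

    Assumption: no bracketed options (such as [arch=amd64]) are present.
    """
    kind, url, distribution, *components = line.split()
    return SourceEntry(kind, urlsplit(url).scheme, url, distribution, components)


entry = parse_source_line(
    "deb http://deb.debian.org/debian/ bookworm main non-free-firmware"
)
print(entry.protocol, entry.distribution, entry.components)
```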
- It’s an http repository and this dumb Debian system only knows curl.
- We’re taking the easy path. Debian systems provide many features, we only care about the most straightforward ones.
$URL, $distribution, $component are the 3 parameters mentioned above. $architecture is the CPU architecture, such as amd64.
- It does a curl $URL/dists/$distribution/InRelease and curl $URL/dists/$distribution/Release.gpg, where:
  - The InRelease file has the list of Packages files (among others) and their hash sums.
  - The Release.gpg file has a detached signature of the Release file, which carries the same content as InRelease (InRelease has its signature inline).
- It then does a curl $URL/dists/$distribution/$component/binary-$architecture/Packages.gz, which has the list of packages.
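The paths fetched in these steps can be derived mechanically from the four parameters. Here is a small sketch of that derivation; `index_urls` is a hypothetical helper written for illustration, and it only builds the URLs without downloading anything.

```python
def index_urls(url: str, distribution: str, component: str, architecture: str) -> dict[str, str]:
    """Build the index URLs a client would fetch for one component and architecture."""
    base = f"{url.rstrip('/')}/dists/{distribution}"
    return {
        "InRelease": f"{base}/InRelease",
        "Release.gpg": f"{base}/Release.gpg",
        "Packages.gz": f"{base}/{component}/binary-{architecture}/Packages.gz",
    }


urls = index_urls("http://deb.debian.org/debian/", "bookworm", "main", "amd64")
for name, value in urls.items():
    print(f"{name}: {value}")
```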
A Packages file looks like this:
- With the hash sums, you can actually verify that the package you end up downloading is valid thanks to the top-level signature that was previously downloaded. Transitive trust matters!
- There is a Filename property: it points to the actual .deb file that you can download and run dpkg -i on.
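Here is a rough sketch of how those two properties could be used together: parse one entry, read its Filename, and check downloaded bytes against the listed size and hash. The package name, version, and payload are invented for illustration, and `parse_stanza` and `matches_index` are hypothetical helpers that handle a single entry only.

```python
import hashlib


def parse_stanza(text: str) -> dict[str, str]:
    """Parse one "Key: value" entry; continuation lines start with whitespace."""
    fields: dict[str, str] = {}
    key = None
    for line in text.strip().splitlines():
        if line[:1] in (" ", "\t") and key is not None:
            fields[key] += "\n" + line.strip()
        else:
            key, _, value = line.partition(":")
            fields[key] = value.strip()
    return fields


def matches_index(deb_bytes: bytes, fields: dict[str, str]) -> bool:
    """Check downloaded bytes against the Size and SHA256 fields of the entry."""
    return (
        len(deb_bytes) == int(fields["Size"])
        and hashlib.sha256(deb_bytes).hexdigest() == fields["SHA256"]
    )


# Stand-in bytes for a downloaded .deb (hypothetical, not a real package).
payload = b"pretend this is a .deb archive"

stanza = f"""
Package: hello-example
Version: 1.0-1
Architecture: amd64
Filename: pool/main/h/hello-example/hello-example_1.0-1_amd64.deb
Size: {len(payload)}
SHA256: {hashlib.sha256(payload).hexdigest()}
Description: hypothetical package used for illustration
 A second description line, indented as a continuation.
"""

fields = parse_stanza(stanza)
print(fields["Filename"])
print(matches_index(payload, fields))                # untouched download
print(matches_index(payload + b"tampered", fields))  # modified download
```

The check is only as trustworthy as the Packages file itself, which is why its own hash has to be checked against the signed top-level file first.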
I think you can see where I’m going with that. And yes, this is all static. It is easy to replicate that whole hierarchy to AWS S3, and poof, it works. That’s what our CI jobs do: they replicate this folder hierarchy on local disk, then run aws s3 sync . s3://... (essentially an “rsync”) and magic! We’re done! 🪄
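For a feel of what "replicating the hierarchy on local disk" can involve, here is a sketch that writes a Packages.gz at the expected path and computes the hash line that a Release file would list for it. This is not our actual CI job: `write_packages_index` is a hypothetical helper, the package entry is invented, and the signing step is left out entirely.

```python
import gzip
import hashlib
import tempfile
from pathlib import Path


def write_packages_index(root: Path, distribution: str, component: str,
                         architecture: str, packages_text: str) -> str:
    """Write Packages.gz under dists/ and return its entry line for a Release file."""
    relative = Path(component) / f"binary-{architecture}" / "Packages.gz"
    target = root / "dists" / distribution / relative
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(gzip.compress(packages_text.encode("utf-8")))
    data = target.read_bytes()
    # Entry format: <hash> <size> <path relative to dists/<distribution>>
    return f" {hashlib.sha256(data).hexdigest()} {len(data)} {relative.as_posix()}"


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    release_line = write_packages_index(
        root, "bookworm", "main", "amd64", "Package: hello-example\n"
    )
    created = sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()
    )

print(created)
print(release_line)
```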
… almost.
Well, yeah, these are internal Debian repositories. We don’t really want them to be open to the world. How do we do
authentication, then? We cannot just have public S3 buckets, can we?
On a related note, you might have found it a bit weird that http was singled out when we broke down the
sources.list line. This was intentional: Debian systems support more than one protocol for the remote.
Of course, http and https are installed by default.
But you know what’s even better? You can add support for new protocols.
By default, all our Debian systems install apt-transport-s3. This package allows us to have this sources.list:
deb s3://internal-debian-repository.platform.sh bookworm main
Custom protocol, with our S3 bucket URL, and the package allows for authentication details to be provided as part of the
URL. (As in, s3://<access key ID>:<secret key>@<bucket name>.)
And that’s it! With these simple tricks, or really, knowledge of how Debian repositories work, it was straightforward to host our own repositories on S3, allowing us to be very flexible with how we organize our repositories. As an example, we define a new Debian component per tag of a given repository, allowing us to upgrade our systems very deterministically. I hope that was helpful and that you learned something!