curl | rpm -i from a random URL.
If you’ve ever configured yum or dnf, you’ve probably edited a .repo file. That’s the starting point.
## What’s in a .repo file
A typical entry in /etc/yum.repos.d/ looks like this:
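A minimal sketch of such an entry (the repo id, name, and URL here are made-up placeholders):

```ini
[internal]
name=Internal packages
baseurl=https://packages.example.com/el9/x86_64/
enabled=1
gpgcheck=1
gpgkey=https://packages.example.com/RPM-GPG-KEY-internal
```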
`baseurl` points to the root of the repository, and everything (package lists, metadata, signatures) lives under that path in a predictable layout. No nested concepts like distributions or components to worry about.
## How dnf fetches packages
When dnf (or yum, same idea) processes a repository, it does roughly this:
- It fetches `$baseurl/repodata/repomd.xml`. This is the entry point: it contains references to all the metadata files and their checksums.
- From `repomd.xml`, it learns the paths to `primary.xml.gz`, `filelists.xml.gz`, and `other.xml.gz`. The important one is `primary.xml.gz`, which contains the actual list of packages.
- If GPG checking is enabled, it verifies `repomd.xml` against `repomd.xml.asc`.
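The first step above can be sketched in a few lines: given a `repomd.xml`, pull out where each metadata file lives relative to `baseurl`. The XML below is a hypothetical, heavily abridged `repomd.xml` (real files carry timestamps, sizes, and checksum-prefixed filenames); the namespace URI is the one real repos use.

```python
import xml.etree.ElementTree as ET

# Abridged, hypothetical repomd.xml, as fetched from
# $baseurl/repodata/repomd.xml. Checksums truncated for readability.
REPOMD = """\
<repomd xmlns="http://linux.duke.edu/metadata/repo">
  <data type="primary">
    <checksum type="sha256">0f2c...</checksum>
    <location href="repodata/primary.xml.gz"/>
  </data>
  <data type="filelists">
    <checksum type="sha256">9ab1...</checksum>
    <location href="repodata/filelists.xml.gz"/>
  </data>
</repomd>
"""

NS = {"repo": "http://linux.duke.edu/metadata/repo"}

def metadata_locations(repomd_xml: str) -> dict:
    """Map each metadata type to its path relative to baseurl."""
    root = ET.fromstring(repomd_xml)
    return {
        data.get("type"): data.find("repo:location", NS).get("href")
        for data in root.findall("repo:data", NS)
    }

print(metadata_locations(REPOMD)["primary"])  # repodata/primary.xml.gz
```

A real client also checks each file's checksum against the value in `repomd.xml` before trusting it; that verification step is omitted here.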
The `primary.xml.gz` file contains entries like this:
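A sketch of a single package entry, with fields abridged and the package name invented for illustration:

```xml
<package type="rpm">
  <name>mytool</name>
  <version epoch="0" ver="1.4.2" rel="1"/>
  <checksum type="sha256" pkgid="YES">3b7e...</checksum>
  <location href="packages/mytool-1.4.2-1.x86_64.rpm"/>
</package>
```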
Every checksum in that chain traces back to `repomd.xml`, and the `location` field points to the actual `.rpm` file relative to `baseurl`. The result is a signed index that references checksummed packages.
## Generating the metadata
The standard tool for this is `createrepo_c`. You point it at a directory full of `.rpm` files:
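Something like the following, with an example path standing in for wherever your packages live:

```shell
# Generate (or regenerate) repository metadata for everything under this path
createrepo_c /srv/repo/el9/x86_64
```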
It produces a `repodata/` directory with all the XML metadata, checksums, and (if configured) GPG signatures. One command, everything generated.
The resulting folder structure:
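Roughly like this (in real repos the metadata filenames are prefixed with their checksums, and the package path is whatever layout you chose; `mytool` is a made-up example):

```
el9/x86_64/
├── repodata/
│   ├── repomd.xml
│   ├── <sha256>-primary.xml.gz
│   ├── <sha256>-filelists.xml.gz
│   └── <sha256>-other.xml.gz
└── packages/
    └── mytool-1.4.2-1.x86_64.rpm
```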
Then upload it with `aws s3 sync . s3://...`. Done.
(Side note: we ended up building repogen, a CLI tool that handles this metadata generation for RPM, Debian, Pacman, etc. repositories all at once. But createrepo_c works perfectly fine on its own if RPM is all you need.)
… well, almost.
## Authentication
These are internal repositories, so public S3 buckets are not an option. The yum/dnf ecosystem handles this with plugins. Which one you use depends on your package manager:
- `yum-s3-iam` for yum-based systems (RHEL 7 and older). It’s yum-only and doesn’t work with dnf.
- `dnf-plugin-s3transport` for dnf-based systems (RHEL 8+, Fedora). If you’re setting this up today, this is most likely the one you want.
With the plugin installed, the `.repo` file becomes:
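A sketch of what that might look like. The bucket URL is a placeholder, and the exact option names vary between plugins (`s3_enabled` follows the yum-s3-iam convention; check your plugin's README for the options it actually reads):

```ini
[internal]
name=Internal packages
baseurl=https://my-bucket.s3.us-east-1.amazonaws.com/rpm/stable/
# Tells the plugin to sign requests for this repo with IAM credentials.
# Option name assumed from yum-s3-iam; verify against your plugin's docs.
s3_enabled=1
enabled=1
gpgcheck=1
```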
## Versioning with separate repositories
If you want different streams of packages (per Git tag, per release channel), you create separate repositories. Each one is its own `baseurl`, its own `repodata/`, its own S3 prefix.
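For example, a bucket might carry one fully independent repository per tag or channel (bucket name and tags invented here):

```
s3://my-bucket/rpm/v1.2.0/repodata/repomd.xml
s3://my-bucket/rpm/v1.3.0/repodata/repomd.xml
s3://my-bucket/rpm/stable/repodata/repomd.xml
```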
Point the `.repo` file at the one you want, run `dnf update`, and you get exactly the packages from that tag. Upgrades are deterministic, and rollbacks are a config change.
If you’re running a fleet of RHEL-based systems and you’ve been thinking about hosting your own RPM repository, consider skipping the server entirely. S3 (or blob storage in general) does the job,
createrepo_c generates the metadata, and an IAM plugin handles auth. The whole thing is a CI pipeline and a bucket.