Reverse Proxies

In the last post, we explored the idea of grouping and isolating related containers into separate networks.

Isolation of services is inherently part of working with Docker, and a major reason why it’s so successful. Savvy admins can use this to their advantage, reducing the attack surface of mission-critical applications. If you can control how traffic gets into your containers, you reduce the opportunities for compromise.

If you are operating an Internet-facing server, odds are good that you’re exposing more than one service.

Take my personal server, for example. It runs the following web services:

The trouble with these services is that they all operate on ports 80 & 443. I will cover networking in a future post (TO-DO REMINDER), but for now please remember that port 80 is HTTP (unsecured) and port 443 is HTTPS (secured). Both protocols and ports are standard across the Internet.

For a multi-service application like this, you have two solutions to the port conflict problem:

  • Use non-standard ports for all services and append a :port to the end of each service’s URL. For example, a second server on example.com might be accessed at example.com:81, and the third at example.com:82. Not a sustainable solution, and some poorly-programmed applications don’t work on ports other than 80 and 443!
  • Use a reverse proxy

This post is about reverse proxies, so you probably know my preference.
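
For comparison, here’s a rough sketch of what the first option looks like in a docker-compose.yml. The service names and host ports are made up for illustration; the point is that every service needs its own host port, and every visitor needs to know it.

```yaml
# Sketch only: service names and host ports are hypothetical.
version: "2"

services:
  blog:
    image: nginx:alpine
    ports:
      - "80:80/tcp"    # reachable at example.com
  second-site:
    image: nginx:alpine
    ports:
      - "81:80/tcp"    # reachable at example.com:81
  third-site:
    image: nginx:alpine
    ports:
      - "82:80/tcp"    # reachable at example.com:82
```

Each container still listens on port 80 internally; only the host side of the mapping changes, and that is the part your visitors have to remember.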

What is a Reverse Proxy?

To understand reverse proxies, first we must understand what a proxy is. A proxy server acts as a middleman for communication to and from a network. If you’ve joined any sufficiently large Wi-Fi network at a university, airport, or hotel, you likely accessed the Internet via a proxy server. Instead of allowing you to communicate directly with the Internet, the proxy server receives your traffic and (if allowed) forwards it on to the Internet. When the response comes back, the proxy server receives it and passes it back to you.

If a proxy is used to control traffic from users inside the network, perhaps a reverse proxy is used to control traffic from users outside the network?

That’s a bingo!

A reverse proxy sits in front of your web-accessible applications, filters incoming requests, rejects the ones that aren’t allowed, and forwards the rest to the appropriate service running on your machine.

This simplifies several frustrating aspects of running multiple services:

  • The reverse proxy can listen on standard ports (80 & 443), eliminating the need to juggle non-standard ports for your applications.
  • The reverse proxy can manage the SSL certificates associated with HTTPS (secured) traffic on port 443. Not all applications can handle SSL traffic natively, so offloading this task to a dedicated reverse proxy will reduce your administration overhead.
  • Using a reverse proxy allows you to manage subdomains in a single place. A subdomain is a label prepended to a domain name. Example: for nextcloud.example.com, nextcloud is a subdomain of example.com.

Reverse Proxy Options

There are many reverse proxies to choose from. One of the most commonly used standalone reverse proxies is nginx, the very same web server we used in the Docker introduction post. It’s a very robust tool! Using nginx, however, requires a bit of hand-editing to configure initially. Not terrible, but I prefer to automate everything wherever possible. Additionally, nginx does not renew SSL certificates on its own; you need a separate tool such as Certbot for that.

Another popular choice is HAProxy.

However, there is a superior choice if you’re already using Docker to a significant degree. Of the reverse proxies I’ve used, Traefik has the tightest integration with Docker, and it is shipped as a Docker container itself.

It’s a bit mind-bending to think about, but here is how Traefik works:

  • A Docker service is created with several labels (remember these from last post?) applied to it. These labels are read by the Traefik container and routing rules are automatically created.
  • Traefik is created inside Docker, where it starts and reads the labels from all other containers. For every container with appropriately defined labels, Traefik listens for matching requests and forwards them to that container.
  • Traefik does not need the Docker container to publish a port in order to send requests to it. It only needs to be part of that container’s network.
  • Traefik can terminate HTTPS requests using SSL certificates that you provide, or SSL certificates that it manages and renews automatically.
  • As a cherry on top, for a fully Docker-native stack Traefik can be fully configured within your docker-compose.yml file. No need to attach a separate configuration file or hand-configure anything once your docker-compose.yml stack is made. Copy it to another computer and start it up, no fuss no muss. When we get to version control and infrastructure as code (TO-DO REMINDER) this will be a huge timesaver.

Damn, good stuff!
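
To make that list concrete, here’s a minimal sketch of a complete stack: Traefik plus one service it routes to. I’m assuming the traefik/whoami test image and a made-up hostname (whoami.example.com), and I’ve left out HTTPS to keep it short, so treat it as an illustration rather than something to deploy.

```yaml
# Minimal sketch, HTTP only. The hostname and the whoami service are
# placeholders for illustration.
version: "2"

services:
  traefik:
    image: traefik:v2.4
    command:
      - --providers.docker=true                    # read labels from Docker
      - --providers.docker.exposedbydefault=false  # ignore unlabeled containers
      - --entrypoints.web.address=:80
    ports:
      - "80:80/tcp"                                # the only published port
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro

  whoami:
    image: traefik/whoami                          # small test web server on port 80
    labels:
      - traefik.enable=true
      - traefik.http.routers.whoami.entrypoints=web
      - traefik.http.routers.whoami.rule=Host(`whoami.example.com`)
      - traefik.http.services.whoami.loadbalancer.server.port=80
```

With this running, a request to port 80 carrying the Host header whoami.example.com (for example, curl -H "Host: whoami.example.com" http://localhost/) should be answered by the whoami container, even though that container publishes no ports of its own.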

Docker Traefik Example

In my previous post I included an example for the stack that runs this blog (nginx, Matomo, MariaDB). I ignored the labels section, but you can clearly see many labels with the traefik prefix.

Let’s explore! Here it is again, for reference:

version: "2"

services:
  web:
    image: nginx:alpine
    restart: unless-stopped
    volumes:
      - /blog/bowtieddevil/public:/usr/share/nginx/html/
    labels:
      - traefik.enable=true
      - traefik.http.routers.bowtieddevil-web.entrypoints=websecure
      - traefik.http.routers.bowtieddevil-web.rule=Host(`bowtieddevil.com`) || Host(`www.bowtieddevil.com`)
      - traefik.http.services.bowtieddevil-web.loadbalancer.server.port=80

  web-staging:
    image: nginx:alpine
    restart: unless-stopped
    volumes:
      - /blog/staging/public:/usr/share/nginx/html/
    labels:
      - traefik.enable=true
      - traefik.http.routers.bowtieddevil-web-staging.entrypoints=websecure
      - traefik.http.routers.bowtieddevil-web-staging.rule=Host(`staging.bowtieddevil.com`)
      - traefik.http.services.bowtieddevil-web-staging.loadbalancer.server.port=80

  stats:
    image: matomo:4
    restart: unless-stopped
    volumes:
      - stats:/var/www/html
    env_file:
      - stats.env
    depends_on:
      - stats-db
    labels:
      - traefik.enable=true
      - traefik.http.routers.bowtieddevil-stats.entrypoints=websecure
      - traefik.http.routers.bowtieddevil-stats.rule=Host(`stats.bowtieddevil.com`)
      - traefik.http.services.bowtieddevil-stats.loadbalancer.server.port=80

  stats-db:
    image: mariadb:10
    command: --max-allowed-packet=64MB
    restart: unless-stopped
    volumes:
      - stats-db:/var/lib/mysql
    env_file:
      - stats-db.env

volumes:
  stats:
  stats-db:

networks:
  default:
    name: bowtieddevil

I also have a smaller docker-compose.yml in /docker/traefik that defines the Traefik container.

It looks like this:

version: "2"

services:
  traefik:
    image: traefik:v2.4
    container_name: traefik
    restart: unless-stopped
    env_file:
      - ./traefik.env
    command:
      - --providers.docker=true
      - --providers.docker.exposedbydefault=false
      - --certificatesresolvers.letsencrypt.acme.dnschallenge=true
      - --certificatesresolvers.letsencrypt.acme.dnschallenge.provider=linode
      - --certificatesresolvers.letsencrypt.acme.email=[redacted]
      - --certificatesresolvers.letsencrypt.acme.storage=/letsencrypt/acme.json
      - --entrypoints.web.address=:80
      - --entrypoints.web.http.redirections.entrypoint.to=websecure
      - --entrypoints.web.http.redirections.entrypoint.scheme=https
      - --entrypoints.websecure.address=:443
      - --entrypoints.websecure.forwardedHeaders.insecure=true
      - --entrypoints.websecure.http.tls=true
      - --entrypoints.websecure.http.tls.certResolver=letsencrypt
      - --entrypoints.websecure.http.tls.domains[0].main=bowtieddevil.com
      - --entrypoints.websecure.http.tls.domains[0].sans=*.bowtieddevil.com
    ports:
      - "80:80/tcp"
      - "443:443/tcp"
    networks:
      - bowtieddevil
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - certs:/letsencrypt

volumes:
  certs:

networks:
  bowtieddevil:
    external: true

Since I am hosting this blog on Linode, I am using the linode provider for the acme.dnschallenge. This setup requests and renews a wildcard SSL certificate covering the bowtieddevil.com domain name as well as *.bowtieddevil.com. The use of a wildcard SSL certificate means that I can request a certificate once and use it for any subdomain. Let’s Encrypt only issues wildcard certificates through the DNS challenge, so this is only possible if I have full control over my DNS. We will review DNS at a later date.

In the volumes section, you can see that I have attached a special file called /var/run/docker.sock in read-only mode (:ro). This allows Traefik to read status and events from the Docker socket. A socket is a special file that allows processes to communicate using read-write on a file instead of over a network connection. Not very important to understand now, but I mention it for completeness.

Within the bowtieddevil docker-compose.yml, you will see four labels repeated on each web-facing service. These are:

  • traefik.enable
  • traefik.http.routers.[name].entrypoints
  • traefik.http.routers.[name].rule
  • traefik.http.services.[name].loadbalancer.server.port

In order, these four labels enable Traefik routing for the container, name the entrypoint to accept requests on (web on port 80, websecure on port 443), define a rule that selects matching requests (usually by Host), and finally set the port inside the container to forward those requests to.
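
Here’s how those four labels might look on a service that doesn’t listen on port 80. The image name, router name, hostname, and port 3000 are all hypothetical; swap in your own.

```yaml
# Sketch: everything named "myapp" here is a placeholder.
services:
  myapp:
    image: example/myapp:latest   # hypothetical image listening on port 3000
    restart: unless-stopped
    labels:
      # 1. Opt in. Required because Traefik runs with exposedbydefault=false.
      - traefik.enable=true
      # 2. Accept requests arriving on the websecure entrypoint (port 443).
      - traefik.http.routers.myapp.entrypoints=websecure
      # 3. Only match requests for this hostname.
      - traefik.http.routers.myapp.rule=Host(`myapp.example.com`)
      # 4. Forward matches to port 3000 inside the container.
      - traefik.http.services.myapp.loadbalancer.server.port=3000
```

Notice there’s no ports section. The container would also need to share a network with Traefik, the same way my blog services do through the bowtieddevil network.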

The traefik.env file contains a special API key to access the DNS records through Linode. I will cover this once we get into the lesson on VPS (Virtual Private Servers).

I’ve created a volume called certs to allow the SSL certificates to be saved between container restarts and upgrades.

Traefik listens on ports 80 and 443 for requests, and that’s it! None of the downstream containers publish ports on the host, so none of them are exposed to Internet traffic directly. This narrows what a non-root user who manages to get into my system can easily reach.

Finally, I attach the Traefik container to the bowtieddevil network. This ensures it can reach my other service containers.

All Done!

The advantages of a reverse proxy in general are clear, and the advantages of a Traefik proxy in particular are shown above. I utilize Traefik heavily and will often include docker-compose.yml files with Traefik labels.

I have barely scratched the surface of what Traefik can do, so feel free to read up on the documentation if you want to learn some of its more intricate features.
