The Architectural Scenario
I manage a docker swarm setup for a moderate number of customer environments (fewer than a thousand). Each environment consists of 6-7 services, typically with a single replica task for each service. The makeup of each environment is a fairly typical LAMP-style architecture: Apache for the front end and API, worker services for continuous queue processing, cron for periodic tasks, Redis for caching, and a database service for persistent storage. The build process for all of this could be a conversation on its own.
Docker Swarm Challenges
Deploying a dozen stacks to docker swarm is a pretty straightforward process. Most tutorials and documentation I’ve seen online are geared towards that scenario. Larger environments present challenges that most docs don’t account for. Namely:
- Native docker routing
- Distribution of docker volumes
- Limitations of docker networks
Docker Swarm Routing Challenges
Out of the box, you can point web traffic at a manager of a docker swarm, and the ingress routing mesh will route requests to the appropriate service. There are a handful of ways to set this up. The end result is always the same: very slow.
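For reference, here's a minimal sketch of that default approach, publishing a service through the ingress routing mesh (the image and ports are illustrative):

```yaml
version: "3.9"
services:
  web:
    image: ECR/web:tag
    ports:
      - target: 80
        published: 8080
        protocol: tcp
        mode: ingress  # the default: every node listens on 8080 and forwards to a task
```

Point DNS at any node and the mesh does the rest. It works; it's just slow at this scale, as noted above.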
The best solution I've found is Traefik. Traefik has been around for a while and is mature enough for production systems. It routes requests to stacks automatically based on hostname, which is a tremendous benefit, and the routing is extremely fast and effective.
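The core idea, as a minimal sketch with illustrative names (the full stack file I use appears in the Code Examples section below): a service opts in with labels, and Traefik matches requests on the Host header.

```yaml
services:
  web:
    image: ECR/web:tag
    deploy:
      labels:
        - "traefik.enable=true"
        - "traefik.http.routers.myapp.rule=Host(`myapp.example.com`)"
        - "traefik.http.services.myapp.loadbalancer.server.port=80"
```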
Docker Volume Challenges
Docker allows volume definitions to be shared across services, which is greatly useful for files that multiple services need to read and write. If you've read this far, I presume you can think of countless examples of why this would be useful.
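As a minimal sketch with illustrative names, here are two services in one stack mounting the same named volume:

```yaml
version: "3.9"
services:
  web:
    image: ECR/web:tag
    volumes:
      - shared-uploads:/var/www/uploads  # both services read and write here
  worker:
    image: ECR/worker:tag
    volumes:
      - shared-uploads:/var/www/uploads
volumes:
  shared-uploads:
```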
The problem with native docker volumes, though, is that volumes are not shared across nodes. A volume holding important data on node A doesn't share that data with the same-named volume on node B.
This makes volumes in docker swarm almost useless on their own. The only real benefit is data persistence across restarts of stacks/services, and that's of limited value.
The solution most often offered is NFS; vendors and specifics may vary. I've used a hybrid approach.
Because database services put heavy IO on their volumes, I use docker volumes for them only to persist data across restarts. Heavy IO traffic over NFS is notoriously unreliable, and that risk isn't worth the effort. Database services are deploy-constrained to specific workers, which allows restarts without data loss.
This approach has risks, though. A docker system prune can easily wipe out all database data if a stack is down. Frequent offsite backups are recommended.
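As a hedged sketch of what such a backup might look like, run on the node that hosts the database task (the service name, credentials, and bucket are illustrative, and mysqldump and the AWS CLI are assumed to be available):

```sh
# Dump the database from the running container and stream it offsite to S3.
docker exec "$(docker ps -q -f name=cory_db)" \
  mysqldump -u backup -p"$DB_BACKUP_PASSWORD" --all-databases --single-transaction \
| gzip \
| aws s3 cp - "s3://example-backups/cory/$(date +%F).sql.gz"
```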
For data that must be shared across services, I use NFS with bind mounts defined in docker. This scenario is less IO-intensive and much less prone to error.
The NFS volumes I use are backed by AWS EFS provisioned file systems. It's not cheap, so this isn't a solution for local dev setups or learning. However, it definitely works.
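The pattern, as a sketch: the EFS file system is NFS-mounted on every worker (here at /mnt/shared, an illustrative path), and docker volumes bind-mount directories from it.

```yaml
volumes:
  shared-data:
    driver: local
    driver_opts:
      type: none
      o: bind
      device: /mnt/shared/cory  # a directory on the host's EFS mount
```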
Docker Swarm Network Challenges
This has easily been the most difficult challenge of all. By default, docker's ingress and overlay networks get a /24 ("class C") subnet: 256 addresses, of which 254 are usable for hosts. Take away the few IPs consumed by docker management needs, and you don't have a lot to work with.
The solution here has been a mix of externally managed overlay networks and Traefik. This GitHub post about docker is probably going to live in my memory for life.
This solution requires some manual management of docker networks, as well as manual configuration of Traefik to account for each of these networks.
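Creating the networks up front looks something like this (the subnets are illustrative; pick ranges that don't collide with anything else in your infrastructure):

```sh
docker network create --driver overlay --subnet 10.11.1.0/24 traefik-public-1
docker network create --driver overlay --subnet 10.11.2.0/24 traefik-public-2
# ... and so on, through traefik-public-20
```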
In the scenario I described in the introduction, you can be pretty liberal about the number of networks you create. I currently use 20, and that's enough to handle a few hundred stacks with a half dozen or so services in each of them.
Docker Instrumentation
Finally, keeping track of all of these environments is too much work to do manually. Systems must be in place to provide autonomous monitoring. For this, I use Prometheus and Grafana. There is a wealth of write-ups on how to do this well; no need to rewrite what's already been well discussed.
The point to stress is that something must be set up to monitor things. There are a lot of unknowable unknowns in a setup like this, and a prompt alert when something is amiss helps the administrator fix issues before customers realize they're happening.
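As a minimal sketch of the Prometheus side, here's a scrape job for the Traefik metrics entrypoint exposed on port 8082 in the stack file below (the host names are illustrative):

```yaml
scrape_configs:
  - job_name: traefik
    static_configs:
      - targets:
          - manager-1.example.internal.aws:8082
          - manager-2.example.internal.aws:8082
          - manager-3.example.internal.aws:8082
```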
Code Examples
All of the preceding discussion can be summarized with a few code examples. First, here's an example of the Traefik docker stack file:
```yaml
version: '3'

services:
  reverse-proxy:
    image: traefik:v2.9
    env_file: .env
    command:
      - "--providers.docker.endpoint=unix:///var/run/docker.sock"
      - "--providers.docker.swarmMode=true"
      - "--providers.docker.exposedbydefault=true"
      - "--providers.docker.network=traefik-public"
      - "--providers.docker.network=traefik-traefik-public-1"
      - "--providers.docker.network=traefik-traefik-public-2"
      - "--providers.docker.network=traefik-traefik-public-20"
      - "--api.dashboard=true"
      - "--api.insecure=true"
      - "--entrypoints.web.address=:80"
      - "--metrics.prometheus=true"
      - "--metrics.prometheus.addEntryPointsLabels=true"
      - "--metrics.prometheus.addrouterslabels=true"
      - "--entryPoints.metrics.address=:8082"
      - "--metrics.prometheus.entryPoint=metrics"
    environment:
      - TZ=US/Chicago
    ports:
      - 80:80
      - 8083:8080
      - 8082:8082
    volumes:
      # So that Traefik can listen to the Docker events
      - /var/run/docker.sock:/var/run/docker.sock:ro
    networks:
      - traefik-public
      - traefik-public-1
      - traefik-public-2
      - traefik-public-20
    logging:
      driver: "json-file"
      options:
        max-size: "50k"
    deploy:
      mode: replicated
      replicas: 3
      placement:
        constraints: [node.role == manager]
      labels:
        - "traefik.enable=true"
        - "traefik.http.services.traefik.loadbalancer.server.port=888" # required by swarm but not used
        - "traefik.http.routers.traefik.rule=Host(`traefik.example.com`)"
        - "traefik.http.routers.traefik.entrypoints=traefik"
        - "traefik.http.routers.traefik.service=api@internal"
        - "traefik.http.routers.traefik.middlewares=traefik-auth"
        - "traefik.http.middlewares.traefik-auth.basicauth.users=public-admin:the-hash"

networks:
  traefik-public:
    external: true
  traefik-public-1:
    external: true
  traefik-public-2:
    external: true
  traefik-public-20:
    external: true
```
I didn't include all of the networks explicitly here (1, 2, …, 20), but you get the idea.
Here is an example of what a typical environment stack file would look like:
version: "3.9" services: # redis, for short term key/value storing redis: image: redis networks: - default # The database instance db: env_file: .env image: ECR/db:tag ports: - target: 3306 published: 10102 protocol: tcp mode: host volumes: - dbtmp:/tmp - dbdata:/var/lib/mysql networks: - default deploy: placement: # because volumes don't share across nodes, assign the database to a specific worker node constraints: [node.hostname == worker-N.example.internal.aws] # The cron instance, it holds all of the cronjobs cron: env_file: .env image: ECR/cron:tag command: ["cron", "-f"] volumes: - dbtmp:/tmp networks: - default deploy: placement: constraints: [node.role == worker] # The worker instance, for supervisor worker: env_file: .env image: ECR/worker:tag networks: - default volumes: - dbtmp:/tmp deploy: placement: constraints: [node.role == worker] # The API api: env_file: .env image: ECR/api:tag networks: - traefik-public-12 - default deploy: placement: constraints: [node.role == worker] labels: - "traefik.enable=true" - "traefik.docker.network=traefik-public-12" - "traefik.http.routers.cory_api.rule=Host(`cory-api.example.com`)" - "traefik.http.routers.cory_api.entrypoints=web" - "traefik.http.services.cory_api.loadbalancer.server.port=80" # The web app web: env_file: .env image: ECR/web:tag networks: - traefik-public-12 - default deploy: placement: constraints: [node.role == worker] labels: - "traefik.enable=true" - "traefik.docker.network=traefik-public-12" - "traefik.http.routers.cory_web.rule=Host(`cory.example.com`)" - "traefik.http.routers.cory_web.entrypoints=web" - "traefik.http.services.cory_web.loadbalancer.server.port=80" # define the volumes volumes: dbtmp: driver: local driver_opts: o: bind device: /mnt/dbtmp/cory type: none dbdata: networks: default: traefik-public-12: external: true
Notice that only the publicly available services, namely the web and api services, attach to Traefik; the others do not. So for any given external network (e.g., traefik-public-6), only two IP addresses should be allocated per environment, which allows dozens of environments to share the same public network.
Docker Host Networking
You might have noticed in the environment stack file that the db service specifies both a deployment constraint and a very specific port configuration. A couple of things are going on here.
As previously discussed, persistent data matters for database servers. Pinning the database service to a specific worker node lets its data survive restarts.
Another concern is being addressed as well. Many modern web applications need to feed metrics and data-warehousing systems run by external parties, and those systems require a consistent connection endpoint. By publishing the database port in host mode, on a specific known port, we allow things like SAP, Redshift, OpenSearch, etc. to connect and pull the data they require.
We could have solved this a few other ways, but this way avoids adding more services that we'd need to maintain.
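The payoff is that an external consumer can connect straight to the node running the database task, on the published port from the stack file above (the credentials here are illustrative):

```sh
mysql -h worker-N.example.internal.aws -P 10102 -u reporting -p
```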
To Briefly Summarize
There are a lot of issues with docker, out of the box, when trying to use it in lieu of a more fully developed orchestration tool like Kubernetes. Those issues can be addressed, and there are solutions beyond the ones discussed here for the common problems administrators face when using swarm as their orchestrator of choice.
However, these are the techniques that worked for me. I haven't written much in a long time, but the struggles I faced would have been a lot easier if there had been a single place that collected all of these concerns. I hope this post helps you. Good luck out there.