The Architectural Scenario
I manage a docker swarm setup for a moderate number of customer environments (fewer than a thousand). Each environment consists of 6-7 services, typically with a single replica task for each service. The makeup of each environment is a fairly typical LAMP-style architecture: Apache for the front end and API, worker services for continuous queue processing, cron for periodic tasks, Redis for caching, and a database service for persistent storage. The build process for all of this could be a conversation on its own.
Docker Swarm Challenges
Deploying a dozen stacks to docker swarm is a pretty straightforward process. Most tutorials and documentation I’ve seen online are geared towards that scenario. Larger environments present challenges that most docs don’t account for. Namely:
- Native docker routing
- Distribution of docker volumes
- Limitations of docker networks
Docker Swarm Routing Challenges
Out of the box, you can point web traffic at a manager of a docker swarm, and the ingress routing mesh will route requests to the appropriate service. There are a handful of ways to set this up. The end result is always the same: very slow.
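For reference, here's a minimal sketch of that default approach, publishing a service through the ingress routing mesh (the image and ports are illustrative):

```yaml
version: "3.9"
services:
  web:
    image: ECR/web:tag
    ports:
      - target: 80
        published: 8080
        protocol: tcp
        mode: ingress  # the default: every node listens on 8080 and forwards to a task
```

Point DNS at any node and the mesh does the rest. It works; it's just slow at this scale, as noted above.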
The best solution I've found is Traefik. Traefik has been around for a while and is mature enough for production systems. It routes requests to stacks automatically based on hostname, which is a tremendous benefit, and the routing is extremely fast and effective.
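The core idea, as a minimal sketch with illustrative names (the full stack file I use appears in the Code Examples section below): a service opts in with labels, and Traefik matches requests on the Host header.

```yaml
services:
  web:
    image: ECR/web:tag
    deploy:
      labels:
        - "traefik.enable=true"
        - "traefik.http.routers.myapp.rule=Host(`myapp.example.com`)"
        - "traefik.http.services.myapp.loadbalancer.server.port=80"
```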
Docker Volume Challenges
Docker allows volume definitions to be shared across services, which is greatly useful for files that multiple services need to read and write. If you've read this far, I presume you can think of countless examples of why this would be useful.
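As a minimal sketch with illustrative names, here are two services in one stack mounting the same named volume:

```yaml
version: "3.9"
services:
  web:
    image: ECR/web:tag
    volumes:
      - shared-uploads:/var/www/uploads  # both services read and write here
  worker:
    image: ECR/worker:tag
    volumes:
      - shared-uploads:/var/www/uploads
volumes:
  shared-uploads:
```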
The problem with native docker volumes, though, is that volumes are not shared across nodes. A volume holding important data on node A doesn't share that data with the same-named volume on node B.
This makes volumes in docker swarm almost useless on their own. The only real benefit is data persistence across restarts of stacks/services, and that's of limited value.
The solution most often offered is NFS; vendors and specifics may vary. I've used a hybrid approach.
Because database services put heavy IO on their volumes, I use docker volumes for them only to persist data across restarts. Heavy IO traffic over NFS is notoriously unreliable, and that risk isn't worth the effort. Database services are deploy-constrained to specific workers, which allows restarts without data loss.
This approach has risks, though. A docker system prune can easily wipe out all database data if a stack is down. Frequent offsite backups are recommended.
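As a hedged sketch of what such a backup might look like, run on the node that hosts the database task (the service name, credentials, and bucket are illustrative, and mysqldump and the AWS CLI are assumed to be available):

```sh
# Dump the database from the running container and stream it offsite to S3.
docker exec "$(docker ps -q -f name=cory_db)" \
  mysqldump -u backup -p"$DB_BACKUP_PASSWORD" --all-databases --single-transaction \
| gzip \
| aws s3 cp - "s3://example-backups/cory/$(date +%F).sql.gz"
```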
For data that must be shared across services, I use NFS with bind mounts defined in docker. This scenario is less IO-intensive and much less prone to error.
The NFS volumes I use are backed by AWS EFS provisioned file systems. It's not cheap, so this isn't a solution for local dev setups or learning. However, it definitely works.
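The pattern, as a sketch: the EFS file system is NFS-mounted on every worker (here at /mnt/shared, an illustrative path), and docker volumes bind-mount directories from it.

```yaml
volumes:
  shared-data:
    driver: local
    driver_opts:
      type: none
      o: bind
      device: /mnt/shared/cory  # a directory on the host's EFS mount
```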
Docker Swarm Network Challenges
This has easily been the most difficult challenge of all. By default, docker's ingress and overlay networks get a /24 ("class C") subnet: 256 addresses, of which 254 are usable for hosts. Take away the few IPs consumed by docker management needs, and you don't have a lot to work with.
The solution here has been a mix of externally managed overlay networks and Traefik. This GitHub post about docker is probably going to live in my memory for life.
This solution requires some manual management of docker networks, as well as manual configuration of Traefik to account for each of these networks.
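Creating the networks up front looks something like this (the subnets are illustrative; pick ranges that don't collide with anything else in your infrastructure):

```sh
docker network create --driver overlay --subnet 10.11.1.0/24 traefik-public-1
docker network create --driver overlay --subnet 10.11.2.0/24 traefik-public-2
# ... and so on, through traefik-public-20
```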
In the scenario I described in the introduction, you can be pretty liberal about the number of networks you create. I currently use 20, and that's enough to handle a few hundred stacks with a half dozen or so services in each of them.
Docker Instrumentation
Finally, keeping track of all of these environments is too much work to do manually. Systems must be in place to provide autonomous monitoring. For this, I use Prometheus and Grafana. There is a wealth of write-ups on how to do this well; no need to rewrite what's already been well discussed.
The point to stress is that something must be set up to monitor things. There are a lot of unknowable unknowns in a setup like this, and a prompt alert when something is amiss helps the administrator fix issues before customers realize they're happening.
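As a minimal sketch of the Prometheus side, here's a scrape job for the Traefik metrics entrypoint exposed on port 8082 in the stack file below (the host names are illustrative):

```yaml
scrape_configs:
  - job_name: traefik
    static_configs:
      - targets:
          - manager-1.example.internal.aws:8082
          - manager-2.example.internal.aws:8082
          - manager-3.example.internal.aws:8082
```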
Code Examples
All of the preceding discussion can be summarized with a few code examples. First, here's an example of the Traefik docker stack file:
```yaml
version: '3'

services:
  reverse-proxy:
    image: traefik:v2.9
    env_file: .env
    command:
      - "--providers.docker.endpoint=unix:///var/run/docker.sock"
      - "--providers.docker.swarmMode=true"
      - "--providers.docker.exposedbydefault=true"
      - "--providers.docker.network=traefik-public"
      - "--providers.docker.network=traefik-traefik-public-1"
      - "--providers.docker.network=traefik-traefik-public-2"
      - "--providers.docker.network=traefik-traefik-public-20"
      - "--api.dashboard=true"
      - "--api.insecure=true"
      - "--entrypoints.web.address=:80"
      - "--metrics.prometheus=true"
      - "--metrics.prometheus.addEntryPointsLabels=true"
      - "--metrics.prometheus.addrouterslabels=true"
      - "--entryPoints.metrics.address=:8082"
      - "--metrics.prometheus.entryPoint=metrics"
    environment:
      - TZ=US/Chicago
    ports:
      - 80:80
      - 8083:8080
      - 8082:8082
    volumes:
      # So that Traefik can listen to the Docker events
      - /var/run/docker.sock:/var/run/docker.sock:ro
    networks:
      - traefik-public
      - traefik-public-1
      - traefik-public-2
      - traefik-public-20
    logging:
      driver: "json-file"
      options:
        max-size: "50k"
    deploy:
      mode: replicated
      replicas: 3
      placement:
        constraints: [node.role == manager]
      labels:
        - "traefik.enable=true"
        - "traefik.http.services.traefik.loadbalancer.server.port=888" # required by swarm but not used
        - "traefik.http.routers.traefik.rule=Host(`traefik.example.com`)"
        - "traefik.http.routers.traefik.entrypoints=traefik"
        - "traefik.http.routers.traefik.service=api@internal"
        - "traefik.http.routers.traefik.middlewares=traefik-auth"
        - "traefik.http.middlewares.traefik-auth.basicauth.users=public-admin:the-hash"

networks:
  traefik-public:
    external: true
  traefik-public-1:
    external: true
  traefik-public-2:
    external: true
  traefik-public-20:
    external: true
```
I didn't include all of the networks explicitly here (1, 2, …, 20), but you get the idea.
Here is an example of what a typical environment stack file would look like:
version: "3.9" services: # redis, for short term key/value storing redis: image: redis networks: - default # The database instance db: env_file: .env image: ECR/db:tag ports: - target: 3306 published: 10102 protocol: tcp mode: host volumes: - dbtmp:/tmp - dbdata:/var/lib/mysql networks: - default deploy: placement: # because volumes don't share across nodes, assign the database to a specific worker node constraints: [node.hostname == worker-N.example.internal.aws] # The cron instance, it holds all of the cronjobs cron: env_file: .env image: ECR/cron:tag command: ["cron", "-f"] volumes: - dbtmp:/tmp networks: - default deploy: placement: constraints: [node.role == worker] # The worker instance, for supervisor worker: env_file: .env image: ECR/worker:tag networks: - default volumes: - dbtmp:/tmp deploy: placement: constraints: [node.role == worker] # The API api: env_file: .env image: ECR/api:tag networks: - traefik-public-12 - default deploy: placement: constraints: [node.role == worker] labels: - "traefik.enable=true" - "traefik.docker.network=traefik-public-12" - "traefik.http.routers.cory_api.rule=Host(`cory-api.example.com`)" - "traefik.http.routers.cory_api.entrypoints=web" - "traefik.http.services.cory_api.loadbalancer.server.port=80" # The web app web: env_file: .env image: ECR/web:tag networks: - traefik-public-12 - default deploy: placement: constraints: [node.role == worker] labels: - "traefik.enable=true" - "traefik.docker.network=traefik-public-12" - "traefik.http.routers.cory_web.rule=Host(`cory.example.com`)" - "traefik.http.routers.cory_web.entrypoints=web" - "traefik.http.services.cory_web.loadbalancer.server.port=80" # define the volumes volumes: dbtmp: driver: local driver_opts: o: bind device: /mnt/dbtmp/cory type: none dbdata: networks: default: traefik-public-12: external: true
Notice that only the publicly available services, namely the web and api services, attach to Traefik; the others do not. So for any given external network (e.g., traefik-public-6), only two IP addresses should be allocated per environment, which allows dozens of environments to share the same public network.
Docker Host Networking
You might have noticed in the environment stack file that the db service specifies both a deployment constraint and a very specific port configuration. A couple of things are going on here.
As previously discussed, persistent data matters for database servers. Pinning the database service to a specific worker node lets its data survive restarts.
Another concern is being addressed as well. Many modern web applications need to feed metrics and data-warehousing systems run by external parties, and those systems require a consistent connection endpoint. By publishing the database port in host mode, on a specific known port, we allow things like SAP, Redshift, OpenSearch, etc. to connect and pull the data they require.
We could have solved this a few other ways, but this way avoids adding more services that we'd need to maintain.
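The payoff is that an external consumer can connect straight to the node running the database task, on the published port from the stack file above (the credentials here are illustrative):

```sh
mysql -h worker-N.example.internal.aws -P 10102 -u reporting -p
```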
To Briefly Summarize
There are a lot of issues with docker, out of the box, when trying to use it in lieu of a more fully developed orchestration tool like Kubernetes. Those issues can be addressed, and there are solutions beyond the ones discussed here for the common problems administrators face when using swarm as their orchestrator of choice.
However, these are the techniques that worked for me. I haven't written much in a long time, but the struggles I faced would have been a lot easier if there had been a single place that collected all of these concerns. I hope this post helps you. Good luck out there.