We do use some `:latest` images by default for the following services:
- matrix-dimension
- Goofys (in the matrix-synapse role)
- matrix-bridge-appservice-irc
- matrix-bridge-appservice-discord
- matrix-bridge-mautrix-facebook
- matrix-bridge-mautrix-whatsapp
It's terribly unfortunate that those software projects don't release
anything other than `:latest`, but that's how it is for now.
Updating that software requires that users manually do `docker pull`
on the server. The playbook didn't force-repull images that it already
had.
With this patch, it starts doing so. Any image tagged `:latest` will be
force re-pulled by the playbook every time it's executed.
It should be noted that even though we ask the `docker_image` module to
force-pull, it only reports "changed" when it actually pulls something
new. This is nice, because it lets people know exactly when something
gets updated, as opposed to giving the indication that it's always
updating the images (even though it isn't).
Previously, it only mentioned exposing for psql-usage purposes.
Realistically, it can be used for much more. Especially given that
psql can be easily accessed via our matrix-postgres-cli script,
without exposing the container port.
Reasoning is the same as for matrix-org/synapse#5023.
For us, the journal used to contain `docker` for all services, which
is not very helpful when looking at them all together (`journalctl -f`).
Using `docker_container` with a `cap_drop` argument requires
Ansible >=2.7.
We want to support older versions too (2.4), so we either need to
stop invoking it with `cap_drop` (insecure), or just stop using
the module altogether.
Since it was suffering from other bugs too (not deleting containers
on failure), we've decided to remove `docker_container` usage completely.
We run containers as a non-root user (no effective capabilities).
Still, if a setuid binary is available in a container image, it could
potentially be used to give the user the default capabilities that the
container was started with. For Docker, the default set currently is:
- "CAP_CHOWN"
- "CAP_DAC_OVERRIDE"
- "CAP_FSETID"
- "CAP_FOWNER"
- "CAP_MKNOD"
- "CAP_NET_RAW"
- "CAP_SETGID"
- "CAP_SETUID"
- "CAP_SETFCAP"
- "CAP_SETPCAP"
- "CAP_NET_BIND_SERVICE"
- "CAP_SYS_CHROOT"
- "CAP_KILL"
- "CAP_AUDIT_WRITE"
We'd rather prevent such a potential escalation by dropping ALL
capabilities.
The problem is nicely explained here: https://github.com/projectatomic/atomic-site/issues/203
This makes all containers (except mautrix-telegram and
mautrix-whatsapp), start as a non-root user.
We do this, because we don't trust some of the images.
In any case, we'd rather not trust ALL images and avoid giving
`root` access at all. We can't be sure they would drop privileges
or what they might do before they do it.
Because Postfix doesn't support running as non-root,
it had to be replaced by an Exim mail server.
The matrix-nginx-proxy nginx container image is patched up
(by replacing its main configuration) so that it can work as non-root.
It seems like there's no other good image that we can use and that is up-to-date
(https://hub.docker.com/r/nginxinc/nginx-unprivileged is outdated).
Likewise for riot-web (https://hub.docker.com/r/bubuntux/riot-web/),
we patch it up ourselves when starting (replacing the main nginx
configuration).
Ideally, it would be fixed upstream so we can simplify.
If this is a brand new server and Postgres had never started,
detecting it before we even start it is not possible.
This moves the logic, so that it happens later on, when Postgres
would have had the chance to start and possibly initialize
a new empty database.
Fixes#82 (Github issue)
With this change, the following roles are now only dependent
on the minimal `matrix-base` role:
- `matrix-corporal`
- `matrix-coturn`
- `matrix-mailer`
- `matrix-mxisd`
- `matrix-postgres`
- `matrix-riot-web`
- `matrix-synapse`
The `matrix-nginx-proxy` role still does too much and remains
dependent on the others.
Wiring up the various (now-independent) roles happens
via a glue variables file (`group_vars/matrix-servers`).
It's triggered for all hosts in the `matrix-servers` group.
According to Ansible's rules of priority, we have the following
chain of inclusion/overriding now:
- role defaults (mostly empty or good for independent usage)
- playbook glue variables (`group_vars/matrix-servers`)
- inventory host variables (`inventory/host_vars/matrix.<your-domain>`)
All roles default to enabling their main component
(e.g. `matrix_mxisd_enabled: true`, `matrix_riot_web_enabled: true`).
Reasoning: if a role is included in a playbook (especially separately,
in another playbook), it should "work" by default.
Our playbook disables some of those if they are not generally useful
(e.g. `matrix_corporal_enabled: false`).