32 Commits

Author SHA1 Message Date
Slavi Pantaleev
0364c6c634 Suppress old container cleanup (kill/rm) failures
People often report and ask about these "failures".
More-so previously, when the `docker kill/rm` output was collected,
but it still happens now when people do `systemctl status
matrix-something` and notice that it says "FAILURE".

Suppressing to avoid further time being wasted on saying "this is
expected".
2022-04-11 09:05:33 +03:00
Slavi Pantaleev
86c36523df Replace ExecStopPost with ExecStop
Reverts b1b4ba501fdfaa, 90c9801c560b6, a3c84f78ca9c65a, ..

I haven't really traced it (yet), but on some servers, I'm observing
`ansible-playbook ... --tags=start` completing very slowly, waiting
to stop services. I can't reproduce this on all Matrix servers I manage.
I suspect that either the systemd version is to blame or that some
specific service is not responding well to some `docker kill/rm` command.

`ExecStop` seems to work great in all cases and it's what we've been
using for a very long time, so I'm reverting to that.
2022-02-05 12:13:36 +02:00
Slavi Pantaleev
b1b4ba501f Replace ExecStop with ExecStopPost
ExecStopPost should allow us to clean up (docker kill + docker rm)
even if the ExecStart (docker run ..) command failed, and not just after
a graceful service stop was initiated.

Source: https://www.freedesktop.org/software/systemd/man/systemd.service.html#ExecStopPost=
2022-01-04 17:27:25 +02:00
Dan Arnfield
b2ca1f2829 Add capability required by new image 2021-04-19 10:16:26 -05:00
Slavi Pantaleev
fcb9e9618a Make Coturn TLSv1/v1.1 configurable
Related to https://github.com/spantaleev/matrix-docker-ansible-deploy/pull/999
2021-04-16 09:29:32 +03:00
sakkiii
540416e32d
Disable support for TLS 1.0 and TLS 1.1
These old versions of TLS rely on MD5 and SHA-1, both now broken, and contain other flaws. TLS 1.0 is no longer PCI-DSS compliant and the TLS working group has adopted a document to deprecate TLS 1.0 and TLS 1.1.
2021-04-15 19:25:23 +05:30
Hardy Erlinger
f4930d789e Run Let's Encrypt renewal checks daily instead of weekly.
This ensures more timely updates of certifcates.
2021-02-27 21:11:22 +01:00
Slavi Pantaleev
512f42aa76 Do not report docker kill/rm attempts as errors
These are just defensive cleanup tasks that we run.
In the good case, there's nothing to kill or remove, so they trigger an
error like this:

> Error response from daemon: Cannot kill container: something: No such container: something

and:

> Error: No such container: something

People often ask us if this is a problem, so instead of always having to
answer with "no, this is to be expected", we'd rather eliminate it now
and make logs cleaner.

In the event that:
- a container is really stuck and needs cleanup using kill/rm
- and cleanup fails, and we fail to report it because of error
suppression (`2>/dev/null`)

.. we'd still get an error when launching ("container name already in use .."),
so it shouldn't be too hard to investigate.
2021-01-27 10:22:46 +02:00
Slavi Pantaleev
1692a28fe4 Work around annoying Docker warning about undefined $HOME
> WARNING: Error loading config file: .dockercfg: $HOME is not defined

.. which appeared in Docker 20.10.
2021-01-15 00:23:01 +02:00
Slavi Pantaleev
e1690722f7 Replace cronjobs with systemd timers
Fixes https://github.com/spantaleev/matrix-docker-ansible-deploy/issues/756

Related to https://github.com/spantaleev/matrix-docker-ansible-deploy/issues/737

I feel like timers are somewhat more complicated and dirty (compared to
cronjobs), but they come with these benefits:

- log output goes to journald
- on newer systemd distros, you can see when the timer fired, when it
will fire, etc.
- we don't need to rely on cron (reducing our dependencies to just
systemd + Docker)

Cronjobs work well, but it's one more dependency that needs to be
installed. We were even asking people to install it manually
(in `docs/prerequisites.md`), which could have gone unnoticed.

Once in a while someone says "my SSL certificates didn't renew"
and it's likely because they forgot to install a cron daemon.

Switching to systemd timers means that installation is simpler
and more unified.
2021-01-14 23:35:50 +02:00
Slavi Pantaleev
d08b27784f Fix systemd services autostart problem with Docker 20.10
The Docker 19.04 -> 20.10 upgrade contains the following change
in `/usr/lib/systemd/system/docker.service`:

```
-BindsTo=containerd.service
-After=network-online.target firewalld.service containerd.service
+After=network-online.target firewalld.service containerd.service multi-user.target
-Requires=docker.socket
+Requires=docker.socket containerd.service
Wants=network-online.target
```

The `multi-user.target` requirement in `After` seems to be in conflict
with our `WantedBy=multi-user.target` and `After=docker.service` /
`Requires=docker.service` definitions, causing the following error on
startup for all of our systemd services:

> Job matrix-synapse.service/start deleted to break ordering cycle starting with multi-user.target/start

A workaround which appears to work is to add `DefaultDependencies=no`
to all of our services.
2020-12-10 11:43:20 +02:00
Slavi Pantaleev
75f9fde7a4 Remove some more -v usage
Continuation of 1fca917ad13103.

Fixes https://github.com/spantaleev/matrix-docker-ansible-deploy/issues/722
2020-11-25 10:49:59 +02:00
Slavi Pantaleev
2a1ec38e3a Stop using Ansible's cron module
This is mainly to address SSL renewal not working for us due to:
- https://github.com/ansible/ansible/issues/71213
- https://github.com/ansible/ansible/pull/71207

Using the cron module was hacky anyway. We shouldn't need an extra
level of buggy abstraction to manage a cronjob file.
2020-09-06 10:49:19 +03:00
Chris van Dijk
6334f6c1ea Remove hardcoded command paths in systemd unit files
Depending on the distro, common commands like sleep and chown may either
be located in /bin or /usr/bin.

Systemd added path lookup to ExecStart in v239, allowing only the
command name to be put in unit files and not the full path as
historically required. At least Ubuntu 18.04 LTS is however still on
v237 so we should maintain portability for a while longer.
2020-05-27 23:14:54 +02:00
Slavi Pantaleev
9a33e5c7ad Make it possible to control Coturn ports and listen interfaces
Related to #330 (Github Issue).
2019-12-20 12:21:43 +02:00
Slavi Pantaleev
ae7c8d1524 Use SyslogIdentifier to improve logging
Reasoning is the same as for matrix-org/synapse#5023.

For us, the journal used to contain `docker` for all services, which
is not very helpful when looking at them all together (`journalctl -f`).
2019-05-16 09:43:46 +09:00
Hugues De Keyzer
c451025134 Fix indentation in templates
Use Jinja2 lstrip_blocks option in templates to ensure consistent
indentation in generated files.
2019-05-07 21:23:35 +02:00
Sylvia van Os
75b1528d13 Add the possibility to pass extra flags to the docker container 2019-04-30 16:35:18 +02:00
Slavi Pantaleev
59e37105e8 Add TLS support to Coturn 2019-03-19 10:24:39 +02:00
Slavi Pantaleev
018aeed5e9 Add support for mounting additional volumes to matrix-coturn 2019-03-19 09:16:30 +02:00
Slavi Pantaleev
24cf27c60c Isolate Coturn from services in the default Docker network
Most (all?) of our Matrix services are running in the `matrix` network,
so they were safe -- not accessible from Coturn to begin with.

Isolating Coturn into its own network is a security improvement
for people who were starting other services in the default
Docker network. Those services were potentially reachable over the
private Docker network from Coturn.

Discussed in #120 (Github Pull Request)
2019-03-18 17:41:14 +02:00
Stuart Mumford
e367a2d0de
Add nulls for quotas as well 2019-03-18 11:58:52 +00:00
Stuart Mumford
9d236c5466
Add defaults for ips 2019-03-18 11:44:40 +00:00
Stuart Mumford
c0dc56324a
Add config options to turnserver.conf 2019-03-18 11:18:30 +00:00
Slavi Pantaleev
a43bcd81fe Rename some variables 2019-02-28 11:51:09 +02:00
Slavi Pantaleev
08635666df Do not attempt to start coturn TLS listeners
We don't provide certificates, so it fails anyway,
but we'd rather suppress the warnings about it too.
2019-02-07 13:20:30 +02:00
Slavi Pantaleev
0be7b25c64 Make (most) containers run with a read-only filesystem 2019-01-29 18:52:02 +02:00
Slavi Pantaleev
b77b967171 Merge branch 'master' into non-root-containers 2019-01-29 18:00:11 +02:00
Slavi Pantaleev
cbc1cdbbf0 Do not try to load certificates
Seems like we unintentionally removed the mounting of certificates
(the `/matrix-config` mount) as part of splitting the playbook into
roles in 51312b8250d0c394083.

It appears that those certificates weren't necessary for coturn to
funciton though, so we might just get rid of the configuration as well.
2019-01-29 17:56:40 +02:00
Slavi Pantaleev
316d653d3e Drop capabilities in containers
We run containers as a non-root user (no effective capabilities).

Still, if a setuid binary is available in a container image, it could
potentially be used to give the user the default capabilities that the
container was started with. For Docker, the default set currently is:
- "CAP_CHOWN"
- "CAP_DAC_OVERRIDE"
- "CAP_FSETID"
- "CAP_FOWNER"
- "CAP_MKNOD"
- "CAP_NET_RAW"
- "CAP_SETGID"
- "CAP_SETUID"
- "CAP_SETFCAP"
- "CAP_SETPCAP"
- "CAP_NET_BIND_SERVICE"
- "CAP_SYS_CHROOT"
- "CAP_KILL"
- "CAP_AUDIT_WRITE"

We'd rather prevent such a potential escalation by dropping ALL
capabilities.

The problem is nicely explained here: https://github.com/projectatomic/atomic-site/issues/203
2019-01-28 11:22:54 +02:00
Slavi Pantaleev
c10182e5a6 Make roles more independent of one another
With this change, the following roles are now only dependent
on the minimal `matrix-base` role:
- `matrix-corporal`
- `matrix-coturn`
- `matrix-mailer`
- `matrix-mxisd`
- `matrix-postgres`
- `matrix-riot-web`
- `matrix-synapse`

The `matrix-nginx-proxy` role still does too much and remains
dependent on the others.

Wiring up the various (now-independent) roles happens
via a glue variables file (`group_vars/matrix-servers`).
It's triggered for all hosts in the `matrix-servers` group.

According to Ansible's rules of priority, we have the following
chain of inclusion/overriding now:
- role defaults (mostly empty or good for independent usage)
- playbook glue variables (`group_vars/matrix-servers`)
- inventory host variables (`inventory/host_vars/matrix.<your-domain>`)

All roles default to enabling their main component
(e.g. `matrix_mxisd_enabled: true`, `matrix_riot_web_enabled: true`).
Reasoning: if a role is included in a playbook (especially separately,
in another playbook), it should "work" by default.

Our playbook disables some of those if they are not generally useful
(e.g. `matrix_corporal_enabled: false`).
2019-01-16 18:05:48 +02:00
Slavi Pantaleev
51312b8250 Split playbook into multiple roles
As suggested in #63 (Github issue), splitting the
playbook's logic into multiple roles will be beneficial for
maintainability.

This patch realizes this split. Still, some components
affect others, so the roles are not really independent of one
another. For example:
- disabling mxisd (`matrix_mxisd_enabled: false`), causes Synapse
and riot-web to reconfigure themselves with other (public)
Identity servers.

- enabling matrix-corporal (`matrix_corporal_enabled: true`) affects
how reverse-proxying (by `matrix-nginx-proxy`) is done, in order to
put matrix-corporal's gateway server in front of Synapse

We may be able to move away from such dependencies in the future,
at the expense of a more complicated manual configuration, but
it's probably not worth sacrificing the convenience we have now.

As part of this work, the way we do "start components" has been
redone now to use a loop, as suggested in #65 (Github issue).
This should make restarting faster and more reliable.
2019-01-12 18:01:10 +02:00