Reverts b1b4ba501f, 90c9801c56, a3c84f78ca, ..
I haven't really traced it (yet), but on some servers, I'm observing
`ansible-playbook ... --tags=start` completing very slowly, waiting
to stop services. I can't reproduce this on all Matrix servers I manage.
I suspect that either the systemd version is to blame or that some
specific service is not responding well to some `docker kill/rm` command.
`ExecStop` seems to work great in all cases and it's what we've been
using for a very long time, so I'm reverting to that.
Until now, we were leaving services "enabled"
(symlinks in /etc/systemd/system/multi-user.target.wants/).
We clean these up now. Broken symlinks may still exist in older
installations that enabled/disabled services. We're not taking care
to fix these up. It's just a cosmetic defect anyway.
These are just defensive cleanup tasks that we run.
In the good case, there's nothing to kill or remove, so they trigger an
error like this:
> Error response from daemon: Cannot kill container: something: No such container: something
and:
> Error: No such container: something
People often ask us if this is a problem, so instead of always having to
answer with "no, this is to be expected", we'd rather eliminate it now
and make logs cleaner.
In the event that:
- a container is really stuck and needs cleanup using kill/rm
- and cleanup fails, and we fail to report it because of error
suppression (`2>/dev/null`)
.. we'd still get an error when launching ("container name already in use .."),
so it shouldn't be too hard to investigate.
Revert "Correct inabillity for appservice-discord to connect"
This reverts commit 673e19f830.
While certain things do work even with such a local URL, sending
messages leads to an error like this:
> [DiscordBot] verbose: DiscordAPIError: Invalid Form Body
> avatar_url: Not a well formed URL.
Fixes https://github.com/Half-Shot/matrix-appservice-discord/issues/649
The sample configuration file for appservice-discord
c29cfc72f5/config/config.sample.yaml (L8)
explicitly says that we need a public URL.
I was thinking that it makes sense to be more specific,
and using `_postgres_` also separated these variables
from the `_database_` variables that ended up in bridge configuration.
However, @jdreichmann makes a good point
(https://github.com/spantaleev/matrix-docker-ansible-deploy/pull/740#discussion_r542281102)
that we don't need to be so specific and can allow for other engines (like MySQL) to use these variables.
Instead of passing the connection string, we can now pass a name of a
variable, which contains a connection string.
Both are supported for having extra flexibility.
People can toggle between them now. The playbook also defaults
to using SQLite if an external Postgres server is used.
Ideally, we'd be able to create databases/users in external Postgres
servers as well, but our initialization logic (and `docker run` command,
etc.) hardcode too many things right now.
If a service is enabled, a database for it is created in postgres with a uniqque password. The service can then use this database for data storage instead of relying on sqlite.
The Docker 19.04 -> 20.10 upgrade contains the following change
in `/usr/lib/systemd/system/docker.service`:
```
-BindsTo=containerd.service
-After=network-online.target firewalld.service containerd.service
+After=network-online.target firewalld.service containerd.service multi-user.target
-Requires=docker.socket
+Requires=docker.socket containerd.service
Wants=network-online.target
```
The `multi-user.target` requirement in `After` seems to be in conflict
with our `WantedBy=multi-user.target` and `After=docker.service` /
`Requires=docker.service` definitions, causing the following error on
startup for all of our systemd services:
> Job matrix-synapse.service/start deleted to break ordering cycle starting with multi-user.target/start
A workaround which appears to work is to add `DefaultDependencies=no`
to all of our services.
After recently updating my matrix-docker-ansible-deploy installation, matrix-appservice-discord would refuse to start, logging ECONNREFUSED to https://matrix.[mydomain]:443, which was resolving to 172.18.0.2 due to the `--hostname` in mailer grabbing that hostname.
Curious why the IRC bridge didn't have this issue, I looked into it, and it was connecting to `http://matrix-synapse:8008`. Correcting this one to that URL resolved the issue.
This may be a bit premature, because the bridge didn't work for me
the last time I tried it (RC3).
Some bugs have been fixed to make our config compatible with v1.0.0
though, so it may work for some people (especially those starting
fresh).
I'm not for shipping potentially broken things, but given that we were
using `docker.io/halfshot/matrix-appservice-discord:latest` and that
points to v1.0.0 already (with no other tag we can use), our setup was
already broken in any case.
Now, at least it has some chance of running.
Depending on the distro, common commands like sleep and chown may either
be located in /bin or /usr/bin.
Systemd added path lookup to ExecStart in v239, allowing only the
command name to be put in unit files and not the full path as
historically required. At least Ubuntu 18.04 LTS is however still on
v237 so we should maintain portability for a while longer.
Discord client IDs are numeric (e.g. 12345).
Passing them as integers however, causes the Discord bridge's YAML parser
to parse them as integers and its config schema validation will fail.
Fixes#240 (Github Issue)
Looks like these client ids are actually integers,
but unless we pass them as a string, the bridge would complain with
an error like:
{"field":"data.auth.clientID","message":"is the wrong type","value":123456789012345678,"type":"string","schemaPath":["properties","auth","properties","clientID"]}
Explicitly-casting to a string should fix the problem.
The Discord bridge should probably be improved to handle both ints and
strings though.
Well, `config.yaml` has been playbook-managed for a long time.
It's now extended to match the default sample config of the Discord
bridge.
With this patch, we also make `registration.yaml` playbook-managed,
which leads us to consistency with all other bridges.
Along with that, we introduce `./config` and `./data` separation,
like we do for the other bridges.
Until now, if `--tags=setup-synapse` was used, bridge tasks would not
run and bridges would fail to register with the `matrix-synapse` role.
This means that Synapse's configuration would be generated with an empty
list of appservices (`app_service_config_files: []`).
.. and then bridges would fail, because Synapse would not be aware of
there being any bridges.
From now on, bridges always run their init tasks and always register
with Synapse.
For the Telegram bridge, the same applies to registering with
matrix-nginx-proxy. Previously, running `--tags=setup-nginx-proxy` would
get rid of the Telegram endpoint configuration for the same reason.
Not anymore.
We do use some `:latest` images by default for the following services:
- matrix-dimension
- Goofys (in the matrix-synapse role)
- matrix-bridge-appservice-irc
- matrix-bridge-appservice-discord
- matrix-bridge-mautrix-facebook
- matrix-bridge-mautrix-whatsapp
It's terribly unfortunate that those software projects don't release
anything other than `:latest`, but that's how it is for now.
Updating that software requires that users manually do `docker pull`
on the server. The playbook didn't force-repull images that it already
had.
With this patch, it starts doing so. Any image tagged `:latest` will be
force re-pulled by the playbook every time it's executed.
It should be noted that even though we ask the `docker_image` module to
force-pull, it only reports "changed" when it actually pulls something
new. This is nice, because it lets people know exactly when something
gets updated, as opposed to giving the indication that it's always
updating the images (even though it isn't).
Bridges start matrix-synapse.service as a dependency, but
Synapse is sometimes slow to start, while bridges are quick to
hit it and die (if unavailable).
They'll auto-restart later, but .. this still breaks `--tags=start`,
which doesn't wait long enough for such a restart to happen.
This attempts to slow down bridge startup enough to ensure Synapse
is up and no failures happen at all.