Commit Graph

32 Commits

Author SHA1 Message Date
ddf18eadc7 More ansible-lint fixes 2022-07-18 13:01:17 +03:00
34cdaade08 Use fully-qualified module names for builtin Ansible modules
Related to https://github.com/spantaleev/matrix-docker-ansible-deploy/pull/1939
2022-07-18 12:58:41 +03:00
e149f33140 add/unify 'Project source code URL' link across all roles 2022-07-16 23:59:21 +03:00
569b52f0c1 Document how the systemd node-exporter collector can be made to work 2022-06-24 08:33:17 +03:00
37d7e75e9b Add support for passing extra arguments to prometheus-node-exporter 2022-06-23 20:37:56 +03:00
ba51997f7b (BC Break) Redo how metrics are exposed to external Prometheus servers 2022-06-23 17:55:07 +03:00
0364c6c634 Suppress old container cleanup (kill/rm) failures
People often report and ask about these "failures".
More-so previously, when the `docker kill/rm` output was collected,
but it still happens now when people do `systemctl status
matrix-something` and notice that it says "FAILURE".

Suppressing to avoid further time being wasted on saying "this is
expected".
2022-04-11 09:05:33 +03:00
2da3768b20 Added retries to the docker pulls (#1701) 2022-03-17 17:37:11 +02:00
819574b8ba Merge branch 'spantaleev:master' into master 2022-02-05 21:37:53 +01:00
7e5b88c3b7 fix: all praise the allmighty yamllinter 2022-02-05 21:32:54 +01:00
86c36523df Replace ExecStopPost with ExecStop
Reverts b1b4ba501f, 90c9801c56, a3c84f78ca, ..

I haven't really traced it (yet), but on some servers, I'm observing
`ansible-playbook ... --tags=start` completing very slowly, waiting
to stop services. I can't reproduce this on all Matrix servers I manage.
I suspect that either the systemd version is to blame or that some
specific service is not responding well to some `docker kill/rm` command.

`ExecStop` seems to work great in all cases and it's what we've been
using for a very long time, so I'm reverting to that.
2022-02-05 12:13:36 +02:00
f2f4d5ba21 Updated: node-exporter to v1.3.1 2022-01-15 18:49:30 +01:00
b1b4ba501f Replace ExecStop with ExecStopPost
ExecStopPost should allow us to clean up (docker kill + docker rm)
even if the ExecStart (docker run ..) command failed, and not just after
a graceful service stop was initiated.

Source: https://www.freedesktop.org/software/systemd/man/systemd.service.html#ExecStopPost=
2022-01-04 17:27:25 +02:00
735c966ab6 Disable systemd services when stopping to uninstall them
Until now, we were leaving services "enabled"
(symlinks in /etc/systemd/system/multi-user.target.wants/).

We clean these up now. Broken symlinks may still exist in older
installations that enabled/disabled services. We're not taking care
to fix these up. It's just a cosmetic defect anyway.
2021-11-10 17:39:21 +02:00
00d1804dd9 prometheus & its exporter updates 2021-08-24 10:24:54 +05:30
f4a9c4dff2 Update prometheus node exporter (1.1.2 -> 1.2.0) 2021-07-22 23:29:43 +05:30
adcecaffaf Fix connectivity between prometheus and prometheus-node-exporter
Expected to have regressed after https://github.com/spantaleev/matrix-docker-ansible-deploy/pull/1008

This patch comes with its own downsides (as described in the comments
for matrix_prometheus_node_exporter_container_http_host_bind_port),
but at least there's:
- no security issue
- metrics remain readable from matrix-prometheus (even if the network metrics are inaccurate)

A better patch is certainly welcome.
2021-04-19 18:29:03 +03:00
sak
88a30fb5ed security** node-exporter data & port publicly exposed 2021-04-19 15:35:23 +05:30
sak
0f9a455719 Revert "security** node-exporter data & port publicly exposed"
This reverts commit d0cd709c08.
2021-04-19 15:24:36 +05:30
sak
d0cd709c08 security** node-exporter data & port publicly exposed 2021-04-19 15:15:59 +05:30
83cc5c9e6a Update prometheus node exporter (1.1.0 -> 1.1.2) 2021-04-16 09:17:04 -05:00
e335f3fc77 rename matrix_global_registry to matrix_container_global_registry_prefix related to #990
Signed-off-by: Ahmad Haghighi <haghighi@fedoraproject.org>
2021-04-12 17:23:55 +04:30
f52a8b6484 use custom docker registry 2021-04-12 17:23:55 +04:30
0585a3ed9f Merge pull request #896 from rakshazi/add_version_to_each_role
added "matrix_%SERVICE%_version" variable to all roles
2021-02-21 12:26:17 +02:00
77ab0d3e98 Do not delete Prometheus/Grafana Docker images
Same reasoning as in 1cd251ed78
2021-02-21 11:14:40 +02:00
2f887f292c added "matrix_%SERVICE%_version" variable to all roles, use it in "matrix_%SERVICE%_docker_image" var (preserving backward-compatibility) 2021-02-20 19:08:28 +02:00
66d5b0e5b9 Do not fail on unrelated validation tasks when Prometheus not enabled
These validation tasks should only run when Prometheus is enabled.
2021-02-12 15:41:15 +02:00
df3dd1c824 Use --read-only FS for metrics-related containers
It seems like it doesn't cause any issues for any of these services.
2021-02-12 11:59:24 +02:00
f0cd294628 Fix matrix-prometheus-node-exporter failure to start
The quotes around "host" for both `--pid` and `--net` were
causing trouble for me:

> docker: --pid: invalid PID mode.

and:

> docker: Error response from daemon: network "host" not found.

I've also changed the `-v` call to `--mount` for consistency with the
rest of the playbook.
2021-02-12 11:59:24 +02:00
fde222a041 Update Prometheus Node Exporter 1.0.1 => 1.1.0 2021-02-10 23:11:17 +01:00
76d7e84be5 Make prometheus-node-exporter a bit more capable
By running it in a more privileged container with access to the host network stack and such
2021-02-10 22:54:14 +01:00
e525970b39 Prometheus Node Exporter
Basic system stats, to show stuff the synapse metrics
can't show such as resource usage by bridges, etc

Seems to work fine as well.

This too has only been tested on debian amd64 so far
2021-02-10 22:54:14 +01:00