Commit Graph

10857 Commits

Author SHA1 Message Date
Slavi Pantaleev
9accc848c4 Wire conditional restart for Traefik and update setup-all to force restarts
- Traefik's service list entry now uses the `traefik_restart_necessary`
  variable (computed by the Traefik role) instead of hardcoded `true`,
  so it is only restarted when its config, systemd unit, or image changed.

- `just setup-all` now passes
  `devture_systemd_service_manager_conditional_restart_enabled=false`
  to force unconditional restarts, matching its "full setup" semantics.

- Document the conditional restart behavior in docs/just.md.

Some benchmarks follow for `just install-service traefik -l matrix.example.com`
when Traefik settings did not change and a restart is not really necessary:

- Before:
  - total time: 56 seconds 🐌
  - Traefik restarted: yes 
  - Services that depend on Traefik restarted: yes; all of them restarted 

- After:
  - total time: 27 seconds 
  - Traefik restarted: no 
  - Services that depend on Traefik restarted: no; none restarted 

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 12:32:02 +02:00
Slavi Pantaleev
af193043ab Upgrade Traefik (v2.0.0-2 -> v3.0.0-0) - adding support for conditional restarting 2026-02-13 12:32:02 +02:00
Slavi Pantaleev
452d54b53f Upgrade Traefik (v3.6.8-2 -> v3.6.8-3) - adding support for conditional restarting 2026-02-13 12:32:02 +02:00
renovate[bot]
f954df4707 chore(deps): update dependency python to 3.14 2026-02-13 11:41:35 +02:00
Suguru Hirahara
eea7d15158 Add GitHub Action "Update translations" (#3907) 2026-02-13 11:29:36 +02:00
renovate[bot]
17894ef70b chore(deps): update dependency postgres to v18.2-0 2026-02-13 11:24:52 +02:00
renovate[bot]
7b41de4eb1 chore(deps): update matrixconduit/matrix-conduit docker tag to v0.10.12 2026-02-13 07:10:03 +02:00
renovate[bot]
409c7393a0 chore(deps): update ghcr.io/element-hq/synapse docker tag to v1.147.1 2026-02-12 20:12:35 +02:00
Suguru Hirahara
a4c40979d2 Remove Dimension (#4916)
* Remove roles/custom/matrix-dimension

Signed-off-by: Suguru Hirahara <did:key:z6MkvVZk1A3KBApWJXv2Ju4H14ErDfRGxh8zxdXSZ4vACDg5>

* Remove mentions to Dimension

Signed-off-by: Suguru Hirahara <did:key:z6MkvVZk1A3KBApWJXv2Ju4H14ErDfRGxh8zxdXSZ4vACDg5>

* Update configuring-playbook-dimension.md

Reuse 0f5015a33c/docs/configuring-playbook-bridge-mx-puppet-twitter.md

Signed-off-by: Suguru Hirahara <did:key:z6MkvVZk1A3KBApWJXv2Ju4H14ErDfRGxh8zxdXSZ4vACDg5>

* Update validate_config.yml

Signed-off-by: Suguru Hirahara <did:key:z6MkvVZk1A3KBApWJXv2Ju4H14ErDfRGxh8zxdXSZ4vACDg5>

* Update CHANGELOG.md

Signed-off-by: Suguru Hirahara <did:key:z6MkvVZk1A3KBApWJXv2Ju4H14ErDfRGxh8zxdXSZ4vACDg5>

---------

Signed-off-by: Suguru Hirahara <did:key:z6MkvVZk1A3KBApWJXv2Ju4H14ErDfRGxh8zxdXSZ4vACDg5>
Co-authored-by: Suguru Hirahara <did:key:z6MkvVZk1A3KBApWJXv2Ju4H14ErDfRGxh8zxdXSZ4vACDg5>
2026-02-12 20:05:51 +02:00
Suguru Hirahara
0f5015a33c Merge pull request #4915 from luixxiul/hydrogen
Relocate Hydrogen to MASH organization
2026-02-12 15:00:21 +02:00
Slavi Pantaleev
47bf99af7a Merge pull request #4914 from krejcar25/fix/matrix_synapse_wait_seconds_type
Fix regression introduced in a77a875
2026-02-12 12:31:03 +02:00
Slavi Pantaleev
0b5ef18d1c Upgrade systemd_service_manager (v2.0.0-1 -> v2.0.0-2) 2026-02-12 09:41:19 +02:00
Amélie-Laura Lilith Krejčí
81b90a7089 Fix regression introduced in a77a875
matrix_synapse_systemd_service_post_start_delay_seconds is assigned a string value, and setup fails while creating the service file. It is impossible to compare str and int.
2026-02-12 02:26:44 +01:00
Slavi Pantaleev
014380eecd Upgrade Traefik (v3.6.8-1 -> v3.6.8-2) 2026-02-12 01:04:06 +02:00
Slavi Pantaleev
a77a8753d9 Derive Synapse post-start delay from Traefik's providersThrottleDuration
After Synapse's systemd health check passes, Traefik still needs
providers.providersThrottleDuration to register routes. Derive the
post-start delay from this setting (+1s for healthcheck polling gap)
instead of using a hardcoded value. Defaults to 0 when no Traefik
reverse proxy is used.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-12 00:54:46 +02:00
Slavi Pantaleev
9569633164 Upgrade Traefik (v3.6.8-0 -> v3.6.8-1) 2026-02-12 00:48:13 +02:00
Slavi Pantaleev
9d9e9e9177 Use docker inspect for Synapse systemd health check and lower health interval
Switch the systemd ExecStartPost health check from docker exec + curl
to polling docker inspect for container health status. This piggybacks
on the container image's built-in HEALTHCHECK instead of duplicating it.

Also add a configurable container health interval (5s for Traefik setups,
15s otherwise) to speed up startup readiness detection without affecting
non-Traefik deployments.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-12 00:13:02 +02:00
Slavi Pantaleev
bcddeda5df Make traefik-certs-dumper require the Traefik service to avoid race condition
When both services restart simultaneously (e.g. in all-at-once mode),
Traefik may momentarily truncate or reinitialize acme.json, causing
the certs dumper to read an empty file and panic. By adding
Requires/After on the Traefik service, the certs dumper only starts
after Traefik is fully ready and acme.json is stable.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-12 00:11:28 +02:00
Slavi Pantaleev
59e70b8ca9 Add systemd-healthcheck to Synapse systemd service in an effort to increase reliability (of Synapse-dependant services)
Previously, we had a 10-second magical delay.

Now we first do a healthcheck to figure out when it really is up.
Then, we do the same 10-second magical delay to account for the time it
may take for a reverse-proxy (like Traefik) to pick up Synapse's routes.
2026-02-11 23:32:33 +02:00
Slavi Pantaleev
f8815c0bb9 Upgrade systemd_service_manager (v2.0.0-0 -> v2.0.0-1) 2026-02-11 23:31:13 +02:00
Slavi Pantaleev
2fad873b42 Make addon systemd services depend on the homeserver systemd service as well, not just on Traefik
Addons typically access the homeserver via Traefik, but requests
ultimately lead to the homeserver and it'd better be up or Traefik would
serve a "404 Not Found" error.

This is an attempt (one of many pieces) to make services more reliable,
especially when `devture_systemd_service_manager_service_restart_mode: all-at-once` is used
(which is the default).
2026-02-11 23:27:09 +02:00
Slavi Pantaleev
294cd109fd Upgrade Traefik (v3.6.7-1 -> v3.6.8-0) 2026-02-11 23:26:13 +02:00
Slavi Pantaleev
9d6c8eabcb Fix swapped Requires=/Wants= directives in Draupnir and Mjolnir systemd service templates
Commit 593b3157b ("Fix systemd service Wants for mjolnir and draupnir")
accidentally swapped the variable loops: `systemd_wanted_services_list`
ended up generating `Requires=`/`After=` directives and
`systemd_required_services_list` ended up generating `Wants=` directives —
the opposite of what the variable names mean and how every other
bot/bridge service template in the playbook works.

This caused these bots to only `Wants=` (not `Requires=`/`After=`) their
dependencies like matrix-traefik.service, so systemd didn't guarantee
ordering. During all-at-once restarts, the bots would start before traefik
was ready, fail with DNS resolution errors, and crash.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-11 18:54:41 +02:00
Slavi Pantaleev
dd26f8a12a Add systemd dependencies to s3-storage-provider-migrate service
The migrate service now declares Requires/After on matrix-synapse.service,
ensuring Synapse (and its transitive dependencies like Postgres and Docker)
are running before the migration triggers.
2026-02-11 16:50:29 +02:00
Suguru Hirahara
7b7b6feb5b Relocate coturn to MASH project (#4906)
* Fetch ansible-role-coturn from MASH project

Signed-off-by: Suguru Hirahara <did:key:z6MkvVZk1A3KBApWJXv2Ju4H14ErDfRGxh8zxdXSZ4vACDg5>

* Replace "matrix_coturn" with "coturn"

Signed-off-by: Suguru Hirahara <did:key:z6MkvVZk1A3KBApWJXv2Ju4H14ErDfRGxh8zxdXSZ4vACDg5>

* Replace "custom/matrix-coturn" with "galaxy/coturn"

Signed-off-by: Suguru Hirahara <did:key:z6MkvVZk1A3KBApWJXv2Ju4H14ErDfRGxh8zxdXSZ4vACDg5>

* Set `coturn_identifier`

Signed-off-by: Suguru Hirahara <did:key:z6MkvVZk1A3KBApWJXv2Ju4H14ErDfRGxh8zxdXSZ4vACDg5>

* Move `coturn_base_path` to matrix_servers for the playbook

Signed-off-by: Suguru Hirahara <did:key:z6MkvVZk1A3KBApWJXv2Ju4H14ErDfRGxh8zxdXSZ4vACDg5>

* Set `coturn_uid` and `coturn_gid`

Signed-off-by: Suguru Hirahara <did:key:z6MkvVZk1A3KBApWJXv2Ju4H14ErDfRGxh8zxdXSZ4vACDg5>

* Set empty value to `coturn_turn_external_ip_address_auto_detection_echoip_service_url` on main.yml

Signed-off-by: Suguru Hirahara <did:key:z6MkvVZk1A3KBApWJXv2Ju4H14ErDfRGxh8zxdXSZ4vACDg5>

* Replace `coturn_docker_image_*`

Signed-off-by: Suguru Hirahara <did:key:z6MkvVZk1A3KBApWJXv2Ju4H14ErDfRGxh8zxdXSZ4vACDg5>

* Move `coturn_container_image_registry_prefix` to matrix_servers

Signed-off-by: Suguru Hirahara <did:key:z6MkvVZk1A3KBApWJXv2Ju4H14ErDfRGxh8zxdXSZ4vACDg5>

* Replace "matrix-coturn" with "coturn" on matrix_servers

Signed-off-by: Suguru Hirahara <did:key:z6MkvVZk1A3KBApWJXv2Ju4H14ErDfRGxh8zxdXSZ4vACDg5>

* Replace "matrix-coturn" with "coturn"

Keep "matrix-coturn" on documentation as-is, since it is specified so with `coturn_identifier`.

Signed-off-by: Suguru Hirahara <did:key:z6MkvVZk1A3KBApWJXv2Ju4H14ErDfRGxh8zxdXSZ4vACDg5>

* Remove roles/custom/matrix-coturn

Signed-off-by: Suguru Hirahara <did:key:z6MkvVZk1A3KBApWJXv2Ju4H14ErDfRGxh8zxdXSZ4vACDg5>

* Update CHANGELOG.md

Signed-off-by: Suguru Hirahara <did:key:z6MkvVZk1A3KBApWJXv2Ju4H14ErDfRGxh8zxdXSZ4vACDg5>

---------

Signed-off-by: Suguru Hirahara <did:key:z6MkvVZk1A3KBApWJXv2Ju4H14ErDfRGxh8zxdXSZ4vACDg5>
Co-authored-by: Suguru Hirahara <did:key:z6MkvVZk1A3KBApWJXv2Ju4H14ErDfRGxh8zxdXSZ4vACDg5>
Co-authored-by: Slavi Pantaleev <slavi@devture.com>
2026-02-11 15:06:27 +02:00
Suguru Hirahara
fa7b784c5b Remove conduwuit (#4913) 2026-02-11 15:03:56 +02:00
renovate[bot]
15ba65f235 chore(deps): update docker.io/metio/matrix-alertmanager-receiver docker tag to v2026.2.11 2026-02-11 11:07:09 +02:00
Aine
4ec41c0b42 Merge pull request #4909 from spantaleev/renovate/ghcr.io-element-hq-element-web-1.x
chore(deps): update ghcr.io/element-hq/element-web docker tag to v1.12.10
2026-02-10 18:11:36 +00:00
renovate[bot]
0a08126324 chore(deps): update ghcr.io/element-hq/element-web docker tag to v1.12.10 2026-02-10 17:49:29 +00:00
renovate[bot]
482ef0fdf5 chore(deps): update ghcr.io/element-hq/synapse docker tag to v1.147.0 2026-02-10 16:51:46 +02:00
renovate[bot]
ca356c52e2 chore(deps): update ghcr.io/element-hq/matrix-authentication-service docker tag to v1.11.0 2026-02-10 16:51:09 +02:00
Slavi Pantaleev
ecf9befc32 Adapt to the all-at-once restart mode default in systemd_service_manager v2.0.0-0
- `install-service` no longer forces `one-by-one` restart mode

- the coturn priority condition is flipped: only `one-by-one` mode
  needs the delayed priority (1500); all other modes (including
  the new `all-at-once` default) use the normal priority (900)

Ref:

- d42cd92045
- f3e658cca3/docs/restart-mode-comparison.md
- 36445fb419
- 750cb7e29e
2026-02-10 16:41:41 +02:00
Slavi Pantaleev
750cb7e29e Upgrade systemd_service_manager (v1.1.0-0 -> v2.0.0-0) 2026-02-10 16:21:57 +02:00
Suguru Hirahara
815b9baec6 Update notes about self-hosting services with the MASH playbook
Signed-off-by: Suguru Hirahara <did:key:z6MkvVZk1A3KBApWJXv2Ju4H14ErDfRGxh8zxdXSZ4vACDg5>
2026-02-10 22:31:11 +09:00
Suguru Hirahara
1dcd4636ff Add a note about self-hosting echoip with the MASH playbook
Reuse 3653f9f89b/docs/configuring-playbook-ssl-certificates.md

Signed-off-by: Suguru Hirahara <did:key:z6MkvVZk1A3KBApWJXv2Ju4H14ErDfRGxh8zxdXSZ4vACDg5>
2026-02-10 22:28:08 +09:00
renovate[bot]
7f04231904 chore(deps): update ghcr.io/etkecc/baibot docker tag to v1.14.1 2026-02-10 15:18:02 +02:00
renovate[bot]
b0828528df chore(deps): update dependency ntfy to v2.17.0-0 2026-02-10 11:53:33 +02:00
Suguru Hirahara
96029bf916 Replace "EchoIP" with "echoip"
cf. https://github.com/mpolden/echoip

Signed-off-by: Suguru Hirahara <did:key:z6MkvVZk1A3KBApWJXv2Ju4H14ErDfRGxh8zxdXSZ4vACDg5>
2026-02-10 17:41:52 +09:00
Slavi Pantaleev
ace086056f Upgrade Postgres (v18.1-4 -> v18.1-5) 2026-02-09 21:24:48 +02:00
Slavi Pantaleev
0e8ef8ef10 Add retry logic for Synapse user registration on Connection refused
When DB credentials change (derived from matrix_synapse_macaroon_secret_key),
a running Synapse container may fail to connect to its database and stop
serving requests. This causes register_new_matrix_user to fail with
"Connection refused" when the matrix-user-creator role tries to register users.

This extends the retry logic from 44b43a51b (which handled HMAC failures)
to also handle Connection refused errors: restart Synapse (picking up the
new config with updated credentials), wait for it to start, and retry.

Caused by c21a80d232

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-09 17:36:59 +02:00
Slavi Pantaleev
2c2738a48f Remove passlib dependency by making matrix-media-repo datastore IDs user-provided
These IDs were incorrectly auto-derived from matrix_homeserver_generic_secret_key,
which is meant for secrets that are OK to change. Datastore IDs are static
identifiers that must never change after first use.

The playbook now requires users to explicitly set matrix_media_repo_datastore_file_id
(and matrix_media_repo_datastore_s3_id when S3 is enabled) in vars.yml, with
validation that fails early if they are missing.

This was the last usage of passlib, which is now removed from prerequisites.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-09 16:56:51 +02:00
Suguru Hirahara
09914bf338 Set ddclient_uid and ddclient_gid
Signed-off-by: Suguru Hirahara <did:key:z6MkvVZk1A3KBApWJXv2Ju4H14ErDfRGxh8zxdXSZ4vACDg5>
2026-02-09 19:49:59 +09:00
Slavi Pantaleev
44b43a51b9 Add retry logic for Synapse user registration on HMAC failure
When the registration_shared_secret changes (derived from
matrix_synapse_macaroon_secret_key), a running Synapse container still
has the old secret in its config. This causes register_new_matrix_user
to fail with "HMAC incorrect" when the matrix-user-creator role tries
to register users.

This mirrors the approach from 2a581cce (which added similar retry
logic for the Matrix Authentication Service on database auth failure):
if the initial registration attempt fails with an HMAC error, restart
Synapse (picking up the new config with the updated secret), wait for
it to start, and retry.

Caused by c21a80d232

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-09 06:29:14 +02:00
Aine
5f8235f44a remove Zulip bridge 2026-02-08 20:34:56 +02:00
renovate[bot]
eb393b4eb8 chore(deps): update dependency setuptools to v82 2026-02-08 19:46:32 +02:00
Slavi Pantaleev
92c204394a Upgrade Postgres (v18.1-3 -> v18.1-4) 2026-02-08 18:46:36 +02:00
Slavi Pantaleev
a1015b6df2 Change salt for Whatsapp token secrets to make pre-commit happy 2026-02-08 18:43:10 +02:00
Slavi Pantaleev
2a581cce62 Add retry logic for MAS user registration on database auth failure
When the Postgres role updates database passwords (e.g., due to a
change in the secret derivation method), the Matrix Authentication
Service container may still be running with old configuration that
references the previous password. This causes mas-cli to fail with
"password authentication failed" when the matrix-user-creator role
tries to register users.

Rather than adding config-change detection or eager restarts to the
MAS role, this adds targeted retry logic: if the initial registration
attempt fails with a database authentication error, restart the MAS
service (which picks up the new config with the updated password),
wait for it to start, and retry. The restart usually only triggers
once per run since subsequent user registrations succeed after the restart.

Related to c21a80d232

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-08 18:32:20 +02:00
Slavi Pantaleev
c21a80d232 Switch to fast single-round hashing for derived secrets
Replace password_hash('sha512', rounds=655555) with hash('sha512')
for all 114 secret derivations in group_vars/matrix_servers.

The old method (655k rounds of SHA-512) was designed for protecting
low-entropy human passwords in /etc/shadow. For deriving secrets
from a high-entropy secret key, a single hash round is equally
secure - the security comes from the key's entropy, not the
computational cost. SHA-512 remains preimage-resistant regardless
of rounds.

This yields a major performance improvement: evaluating
postgres_managed_databases (which references multiple derived
database passwords) dropped from ~10.7s to ~0.6s on a fast mini
PC. The Postgres role evaluates this variable multiple times, and
other roles reference derived passwords too, so the cumulative
savings across a full playbook run are substantial.

All derived service passwords (database passwords, appservice
tokens, etc.) will change on the next run. The main/superuser
database password is not affected (it's hardcoded in inventory
variables). All services receive their new passwords in the same
run, so this should be seamless.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-08 18:15:02 +02:00
Suguru Hirahara
baa740fcda Relocate ddclient role to MASH organization (#4902)
* Fetch ansible-role-ddclient from MASH project

Signed-off-by: Suguru Hirahara <did:key:z6MkvVZk1A3KBApWJXv2Ju4H14ErDfRGxh8zxdXSZ4vACDg5>

* Replace `matrix_dynamic_dns` with `ddclient`

Signed-off-by: Suguru Hirahara <did:key:z6MkvVZk1A3KBApWJXv2Ju4H14ErDfRGxh8zxdXSZ4vACDg5>

* Set `matrix-dynamic-dns` to `ddclient_identifier`

Signed-off-by: Suguru Hirahara <did:key:z6MkvVZk1A3KBApWJXv2Ju4H14ErDfRGxh8zxdXSZ4vACDg5>

* Remove `ddclient_container_network` in favor of the role's configuration

On the role the value of `ddclient_container_network` is set to `ddclient_identifier`, which is set to `matrix-dynamic-dns` on the playbook.

Signed-off-by: Suguru Hirahara <did:key:z6MkvVZk1A3KBApWJXv2Ju4H14ErDfRGxh8zxdXSZ4vACDg5>

* Replace `matrix-dynamic-dns` with `ddclient` on matrix_servers

Signed-off-by: Suguru Hirahara <did:key:z6MkvVZk1A3KBApWJXv2Ju4H14ErDfRGxh8zxdXSZ4vACDg5>

* Replace `ddclient_docker_image_*` with `ddclient_container_image_*`

Signed-off-by: Suguru Hirahara <did:key:z6MkvVZk1A3KBApWJXv2Ju4H14ErDfRGxh8zxdXSZ4vACDg5>

* Update `ddclient_container_image_*`

Signed-off-by: Suguru Hirahara <did:key:z6MkvVZk1A3KBApWJXv2Ju4H14ErDfRGxh8zxdXSZ4vACDg5>

* Move `ddclient_base_path` to matrix_servers

Signed-off-by: Suguru Hirahara <did:key:z6MkvVZk1A3KBApWJXv2Ju4H14ErDfRGxh8zxdXSZ4vACDg5>

* Move `ddclient_web_*` to matrix_servers

Signed-off-by: Suguru Hirahara <did:key:z6MkvVZk1A3KBApWJXv2Ju4H14ErDfRGxh8zxdXSZ4vACDg5>

* Remove `matrix-dynamic-dns` directory

Signed-off-by: Suguru Hirahara <did:key:z6MkvVZk1A3KBApWJXv2Ju4H14ErDfRGxh8zxdXSZ4vACDg5>

* Update configuring-playbook-dynamic-dns.md

Reuse 75e264f538/docs/services/ddclient.md

Signed-off-by: Suguru Hirahara <did:key:z6MkvVZk1A3KBApWJXv2Ju4H14ErDfRGxh8zxdXSZ4vACDg5>

* Fix a typo

Signed-off-by: Suguru Hirahara <did:key:z6MkvVZk1A3KBApWJXv2Ju4H14ErDfRGxh8zxdXSZ4vACDg5>

---------

Signed-off-by: Suguru Hirahara <did:key:z6MkvVZk1A3KBApWJXv2Ju4H14ErDfRGxh8zxdXSZ4vACDg5>
Co-authored-by: Suguru Hirahara <did:key:z6MkvVZk1A3KBApWJXv2Ju4H14ErDfRGxh8zxdXSZ4vACDg5>
2026-02-08 16:34:35 +02:00