Add change-tracking and restart_necessary computation for:
- matrix-authentication-service (custom role in this repo)
- container-socket-proxy, traefik-certs-dumper, postgres, exim-relay,
cinny, livekit-server (external roles, bumped in requirements.yml)
Wire all 7 services in group_vars to use their _restart_necessary variable
instead of hardcoded true.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Track config/image/systemd changes via register: directives and compute
a _restart_necessary variable for each service role, allowing the
systemd_service_manager to skip unnecessary restarts during install-* runs.
Covers 22 service roles: alertmanager-receiver, appservice-draupnir-for-all,
bridge-mautrix-wsproxy (+ syncproxy), cactus-comments, cactus-comments-client,
corporal, element-admin, ldap-registration-proxy, livekit-jwt-service, matrixto,
pantalaimon, prometheus-nginxlog-exporter, rageshake, registration, static-files,
sygnal, synapse-admin, synapse-auto-compressor, synapse-reverse-proxy-companion,
synapse-usage-exporter, and user-verification-service.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
For each of the 34 roles (3 clients, 9 bots, 22 bridges), this commit:
- Adds `_restart_necessary: false` default variable
- Adds `register:` directives to config/image/systemd tasks
- Computes `_restart_necessary` via set_fact (OR of all .changed results)
- Wires `(_restart_necessary | bool)` in group_vars/matrix_servers
This allows the systemd service manager to skip unnecessary restarts
when running install-* tags and nothing actually changed.
Service roles and complex multi-service roles will follow separately.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
These variables track whether a database migration necessitates a service
restart. The new name avoids confusion with the conditional restart
feature introduced in af193043/9accc848/4a8df138, where
devture_systemd_service_manager handles restarting services whose
configuration or image changed. The old _requires_restart name was
ambiguous — it could be mistaken for the systemd_service_manager
mechanism — so _migration_requires_restart makes the purpose explicit.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Override devture_systemd_service_manager_conditional_restart_enabled in
group_vars based on ansible_run_tags: disabled when setup-* tags are used,
enabled otherwise. This replaces the --extra-vars hack in the justfile and
ensures consistent behavior for both `just` and raw `ansible-playbook` users.
- Revert justfile setup-all to its original form (no --extra-vars needed).
- Update docs/just.md to reflect tag-agnostic behavior.
- Add CHANGELOG.md entry documenting the conditional restart feature.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Traefik's service list entry now uses the `traefik_restart_necessary`
variable (computed by the Traefik role) instead of hardcoded `true`,
so it is only restarted when its config, systemd unit, or image changed.
- `just setup-all` now passes
`devture_systemd_service_manager_conditional_restart_enabled=false`
to force unconditional restarts, matching its "full setup" semantics.
- Document the conditional restart behavior in docs/just.md.
Some benchmarks follow for `just install-service traefik -l matrix.example.com`
when Traefik settings did not change and a restart is not really necessary:
- Before:
- total time: 56 seconds 🐌
- Traefik restarted: yes ❌
- Services that depend on Traefik restarted: yes; all of them restarted ❌
- After:
- total time: 27 seconds ⚡
- Traefik restarted: no ✅
- Services that depend on Traefik restarted: no; none restarted ✅
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
matrix_synapse_systemd_service_post_start_delay_seconds is assigned a string value, and setup fails while creating the service file. It is impossible to compare str and int.
After Synapse's systemd health check passes, Traefik still needs
providers.providersThrottleDuration to register routes. Derive the
post-start delay from this setting (+1s for healthcheck polling gap)
instead of using a hardcoded value. Defaults to 0 when no Traefik
reverse proxy is used.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Switch the systemd ExecStartPost health check from docker exec + curl
to polling docker inspect for container health status. This piggybacks
on the container image's built-in HEALTHCHECK instead of duplicating it.
Also add a configurable container health interval (5s for Traefik setups,
15s otherwise) to speed up startup readiness detection without affecting
non-Traefik deployments.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When both services restart simultaneously (e.g. in all-at-once mode),
Traefik may momentarily truncate or reinitialize acme.json, causing
the certs dumper to read an empty file and panic. By adding
Requires/After on the Traefik service, the certs dumper only starts
after Traefik is fully ready and acme.json is stable.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Previously, we had a 10-second magical delay.
Now we first do a healthcheck to figure out when it really is up.
Then, we do the same 10-second magical delay to account for the time it
may take for a reverse-proxy (like Traefik) to pick up Synapse's routes.
Addons typically access the homeserver via Traefik, but requests
ultimately lead to the homeserver and it'd better be up or Traefik would
serve a "404 Not Found" error.
This is an attempt (one of many pieces) to make services more reliable,
especially when `devture_systemd_service_manager_service_restart_mode: all-at-once` is used
(which is the default).
Commit 593b3157b ("Fix systemd service Wants for mjolnir and draupnir")
accidentally swapped the variable loops: `systemd_wanted_services_list`
ended up generating `Requires=`/`After=` directives and
`systemd_required_services_list` ended up generating `Wants=` directives —
the opposite of what the variable names mean and how every other
bot/bridge service template in the playbook works.
This caused these bots to only `Wants=` (not `Requires=`/`After=`) their
dependencies like matrix-traefik.service, so systemd didn't guarantee
ordering. During all-at-once restarts, the bots would start before traefik
was ready, fail with DNS resolution errors, and crash.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The migrate service now declares Requires/After on matrix-synapse.service,
ensuring Synapse (and its transitive dependencies like Postgres and Docker)
are running before the migration triggers.
- `install-service` no longer forces `one-by-one` restart mode
- the coturn priority condition is flipped: only `one-by-one` mode
needs the delayed priority (1500); all other modes (including
the new `all-at-once` default) use the normal priority (900)
Ref:
- d42cd92045
- f3e658cca3/docs/restart-mode-comparison.md
- 36445fb419
- 750cb7e29e
When DB credentials change (derived from matrix_synapse_macaroon_secret_key),
a running Synapse container may fail to connect to its database and stop
serving requests. This causes register_new_matrix_user to fail with
"Connection refused" when the matrix-user-creator role tries to register users.
This extends the retry logic from 44b43a51b (which handled HMAC failures)
to also handle Connection refused errors: restart Synapse (picking up the
new config with updated credentials), wait for it to start, and retry.
Caused by c21a80d232
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
These IDs were incorrectly auto-derived from matrix_homeserver_generic_secret_key,
which is meant for secrets that are OK to change. Datastore IDs are static
identifiers that must never change after first use.
The playbook now requires users to explicitly set matrix_media_repo_datastore_file_id
(and matrix_media_repo_datastore_s3_id when S3 is enabled) in vars.yml, with
validation that fails early if they are missing.
This was the last usage of passlib, which is now removed from prerequisites.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>