Fixup for https://github.com/spantaleev/matrix-docker-ansible-deploy/pull/3017
This reverts 1cd82cf068 and also multiplies results by `1024`
so as to pass bytes to Synapse, not KB (as done before).
1cd82cf068 was correctly documenting what we were doing (passing KB values),
but that's incorrect.
Synapse's Config Conventions
(https://matrix-org.github.io/synapse/latest/usage/configuration/config_documentation.html#config-conventions)
are supposed to clear it up, but they don't currently state what happens when you pass a plain number (without a unit suffix).
Thankfully, the source code tells us:
bc1db16086/synapse/config/_base.py (L181-L206)
> If an integer is provided it is treated as bytes and is unchanged.
>
> String byte sizes can have a suffix of ...
> No suffix is understood as a plain byte count.
We were previously passing strings, but that has been improved in 3d73ec887a.
Regardless, non-suffixed values seem to be treated as bytes by Synapse,
so this patch changes the variables to use bytes.
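The quoted rules can be sketched roughly as below. This is an illustration, not Synapse's code: `parse_size`, `kb_to_bytes` and the two-entry suffix table are hypothetical names and values chosen for this example, since the quote above elides the actual suffix list.

```python
# Illustrative suffix table; an assumption for this sketch, not copied from Synapse.
_SUFFIXES = {"K": 1024, "M": 1024 * 1024}


def parse_size(value):
    """Return a byte count for an int, or for a string with an optional suffix."""
    if isinstance(value, int):
        # Integers are treated as bytes and returned unchanged.
        return value
    text = value.strip()
    suffix = text[-1:].upper()
    if suffix in _SUFFIXES:
        return int(text[:-1]) * _SUFFIXES[suffix]
    # No suffix: the string is a plain byte count.
    return int(text)


def kb_to_bytes(kilobytes):
    """Convert a KB figure to bytes, as this patch does by multiplying by 1024."""
    return kilobytes * 1024
```

Under these assumptions, a value that used to be passed as `2048` (meant as KB) would have been read as 2048 bytes, which is why the results are now multiplied by `1024` before being handed over.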
Moreover, we're moving from `matrix_synapse_memtotal_kb` to
`matrix_synapse_cache_size_calculations_memtotal_bytes` as working with
the base unit everywhere is preferable.
Here, we also introduce 2 new variables to allow for the caps to be
tweaked:
- `matrix_synapse_cache_size_calculations_max_cache_memory_usage_cap_bytes`
- `matrix_synapse_cache_size_calculations_target_cache_memory_usage_cap_bytes`
We're casting everything to `int`, but since Jinja templates are
involved, these values end up as strings anyway.
Doing `| int | to_json` is good, but we should only cast numbers to
integer, not empty strings, as that (0) may be interpreted differently
by Synapse.
To turn off auto-tuning, one is possibly supposed to pass empty strings:
> This option defaults to off, enable it by providing values for the sub-options listed below.
It could be that `0` is also considered "no value provided", but I
haven't verified that.
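The distinction between "cast numbers" and "leave empty values alone" can be sketched as below. `render_size` is a hypothetical helper written for this illustration; it only mimics the intent of `| int | to_json` and is not the template code itself.

```python
import json


def render_size(value):
    """Render a config value as JSON, casting to int unless it is empty."""
    if value == "" or value is None:
        # Keep "no value provided" as an empty string instead of coercing it to 0.
        return json.dumps("")
    return json.dumps(int(value))
```

With this shape, `render_size("1024")` yields a bare number, while `render_size("")` stays an empty string rather than turning into `0`.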
Related to https://github.com/spantaleev/matrix-docker-ansible-deploy/pull/3017
* Modify Synapse Cache Factor to use Auto Tune
Synapse has the ability to auto-tune caches (as its config calls it).
This ability lets us set very high cache factors and limit our resource use instead.
Defaults for this commit are 1/10th of what Element apparently runs for EMS and matrix.org for the cache factor, and the upstream documentation defaults for auto-tuning.
* Add vars to Synapse main.yml to control cache related config
This commit adds various cache related vars to main.yml for Synapse.
Some are auto tune and some are just adding explicit ways to control upstream vars.
* Updated Auto Tune figures
Auto-tune figures have been bumped to a level considered reasonable, in consultation with other community members. Please note that these defaults lean more toward a one-of-each workers setup than toward a monolith.
* Fix YML Error
The playbook is not happy with the previous state of this patch so this commit hopefully fixes it
* Add to_json to various Synapse tuning related configs
* Fix incorrect indentation in homeserver.yaml.j2
* Minor cleanups
* Synapse Cache Autotuning Documentation
* Upgrade Synapse Cache Autotune to auto configure memory use
* Update Synapse Tuning docs to reflect automatic memory use configuration
* Fix linting errors in Synapse's main.yml
* Rename variables for consistency (matrix_synapse_caches_autotuning_* -> matrix_synapse_cache_autotuning_*)
* Remove FIX ME comment about Synapse's `cache_autotuning`
`docs/maintenance-synapse.md` and `roles/custom/matrix-synapse/defaults/main.yml`
already contain documentation about these variables and the default values we set.
* Improve "Tuning caches and cache autotuning" documentation for Synapse
* Announce larger Synapse caches and cache auto-tuning
---------
Co-authored-by: Slavi Pantaleev <slavi@devture.com>
`matrix_synapse_federation_port_enabled` is defined like this:
```
matrix_synapse_federation_port_enabled: "{{ matrix_synapse_federation_enabled or matrix_synapse_federation_port_openid_resource_required }}"
```
Previously, people that disabled federation, but needed the `openid`
listener were running without these federation-related labels.
In this patch, we're also dropping the `not matrix_synapse_workers_enabled` condition,
because.. none of the Matrix-related labels would be applied anyway when
workers are enabled, thanks to `matrix_synapse_container_labels_matrix_related_labels_enabled`.
Fixes https://github.com/spantaleev/matrix-docker-ansible-deploy/issues/3127
This is a break in backward-compatibility for people who disable
`index.html` creation via the playbook, but manage their static
website files in another way (AUX role, etc).
* Fix s3-storage migrate and shell: container needs attachment to postgres network also
* Connect s3-storage-provider migrate to multiple networks in multiple steps
Multiple `--network` calls lead to:
> docker: Error response from daemon: Container cannot be connected to network endpoints: NETWORK_1 NETWORK_2.
* Connect s3-storage-provider shell to multiple networks in multiple steps
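A rough sketch of the "multiple steps" approach: create the container on the first network, attach the remaining networks one by one, then start it. `docker_commands` and its arguments are hypothetical and only build the command lines; the actual tasks in the playbook may look different.

```python
def docker_commands(name, image, networks):
    """Build docker commands that attach a container to several networks in steps."""
    first, *rest = networks
    # Only one network is passed at creation time.
    commands = [["docker", "create", "--name", name, "--network", first, image]]
    # Each additional network is connected in its own step.
    for network in rest:
        commands.append(["docker", "network", "connect", network, name])
    # The container is started only once all networks are attached.
    commands.append(["docker", "start", "--attach", name])
    return commands
```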
---------
Co-authored-by: Slavi Pantaleev <slavi@devture.com>
Most addons live in the same network by default (matrix-addons) right now,
so this network would have usually been created by some other addon.
However, if this is the only addon someone uses, it may have remained
uncreated causing a problem.
I believe `specialized-workers` is a better name than `room-workers`,
because when enabled, 4 different types of specialized workers are
created:
- Room workers
- Sync workers
- Client readers
- Federation readers
Only one of these is called room-workers.
In the future, more specialized workers may be added, making the
`room-workers` preset name an even poorer choice.
Related to https://github.com/spantaleev/matrix-docker-ansible-deploy/pull/3100
Until now, the validation check would only get tripped up
if generic workers are used, combined with at least one of EACH
other type of specialized workers.
This means that someone doing this:
```
matrix_synapse_workers_preset: one-of-each
matrix_synapse_workers_client_reader_workers_count: 5
```
.. would not have triggered this safety check.
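The difference can be sketched as "all" versus "any". This is an illustration only: the function names and worker counts are hypothetical, and the new condition is inferred from the description above rather than copied from the playbook.

```python
def old_check_trips(generic_count, specialized_counts):
    """Old behavior: trip only if EVERY specialized worker type is in use."""
    return generic_count > 0 and all(c > 0 for c in specialized_counts.values())


def new_check_trips(generic_count, specialized_counts):
    """Assumed new behavior: trip if ANY specialized worker type is in use."""
    return generic_count > 0 and any(c > 0 for c in specialized_counts.values())


# Hypothetical counts for the example above: generic workers from the preset,
# plus client readers only.
example_counts = {
    "room_workers": 0,
    "sync_workers": 0,
    "client_reader": 5,
    "federation_reader": 0,
}
```

With these counts and one generic worker, `old_check_trips` stays quiet while `new_check_trips` fires.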
Related to https://github.com/spantaleev/matrix-docker-ansible-deploy/pull/3100
Hookshot wants a trailing slash for this route.
If we let Hookshot redirect, it goes to `/widgetapi/v1/static/`,
instead of `/hookshot/widgetapi/v1/static/`, so we take this matter into our
own hands.
Public URLs are like: `/hookshot/widgetapi/v1/static/`
.. which get translated to requests for: `/widgetapi/v1/static/`
Previously, we were stripping the whole `/hookshot/widgetapi` prefix,
which is wrong.
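The path translation can be illustrated with a tiny prefix-stripping helper. `strip_prefix` is a hypothetical function for this sketch; the real work is done by the reverse-proxy configuration, not by Python code.

```python
def strip_prefix(path, prefix):
    """Remove a leading prefix from a request path, if present."""
    if path.startswith(prefix):
        return path[len(prefix):]
    return path


PUBLIC_URL = "/hookshot/widgetapi/v1/static/"

# Stripping only `/hookshot` gives the path Hookshot expects.
correct = strip_prefix(PUBLIC_URL, "/hookshot")

# Stripping `/hookshot/widgetapi` (the previous behavior) drops too much.
wrong = strip_prefix(PUBLIC_URL, "/hookshot/widgetapi")
```

Here `correct` is `/widgetapi/v1/static/`, while `wrong` ends up as `/v1/static/`.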
Most of these files were defining a service, usually toward the end.
These lines have been moved upward.
Some components (mautrix-signal, mautrix-gmessages, etc.) were defining
a service conditionally (only if metrics are exposed, etc). This was
causing issues like these in the Traefik logs:
> level=error msg="service \"matrix-mautrix-twitter\" error: port is missing" providerName=docker container=matrix-mautrix-twitter-..
This changes the behavior of
`matrix_playbook_migration_matrix_nginx_proxy_uninstallation_enabled`
and is against what we initially described in the changelog entry,
but I've discovered some problems when the `matrix-nginx-proxy` service
and container remain running. They need to go.
After some checking, it seems like there's `/_synapse/client/oidc`,
but no such thing as `/_synapse/oidc`.
I'm not sure why we've been reverse-proxying these paths for so long
(even in as far back as the `matrix-nginx-proxy` days), but it's time we
put a stop to it.
The OIDC docs have been simplified. There's no need to ask people to
expose the useless `/_synapse/oidc` endpoint. OIDC requires
`/_synapse/client/oidc` and `/_synapse/client` is exposed by default
already.
Issues and Pull Requests were not migrated to the new
organization/repository, so `matrix-org/synapse/pull` and
`matrix-org/synapse/issues` references were kept as-is.
`matrix-org/synapse-s3-storage-provider` references were also kept,
as that module still continues living under the `matrix-org` organization.
This patch mainly aims to change documentation-related things, not actual
usage in full yet. To polish that, another, more comprehensive patch is coming later.
This moves the comments from being just in Jinja,
to actually ending up in the generated `labels` file,
which makes inspection of the final result easier.
Also, some new lines were added here and there to make labels
more legible.
The generated file may still include weird new-lines due to
various `if` statements yielding content or not, but that's not so ugly
anymore - now that we have proper start/end sections that are visible in
the final `labels` file.
The old variables still work. The global lets us avoid
auto-detection logic like we're currently doing for
`matrix_nginx_proxy_proxy_matrix_federation_api_enabled`.
In the future, we'd just be able to reference
`matrix_homeserver_federation_enabled` and know the up-to-date value
regardless of homeserver.
This was meant to serve as an intermediary for services needing to reach
the homeserver. It was used like that for a while in this
`bye-bye-nginx-proxy` branch, but was never actually public.
It has recently been superseded by homeserver-like services injecting
themselves into a new internal Traefik entrypoint
(see `matrix_playbook_internal_matrix_client_api_traefik_entrypoint_*`),
so `matrix-homeserver-proxy` is no longer necessary.
---
This is probably a good moment to share some benchmarks and reasons
for going with the internal Traefik entrypoint as opposed to this nginx
service.
1. (1400 rps) Directly to Synapse (`ab -n 1000 -c 100 http://matrix-synapse:8008/_matrix/client/versions`)
2. (~900 rps) Via `matrix-homeserver-proxy` (nginx) proxying to Synapse (`ab -n 1000 -c 100 http://matrix-homeserver-proxy:8008/_matrix/client/versions`)
3. (~1200 rps) Via the new internal entrypoint of Traefik (`matrix-internal-matrix-client-api`) proxying to Synapse (`ab -n 1000 -c 100 http://matrix-traefik:8008/_matrix/client/versions`)
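To put these numbers in relative terms, here is a small calculation of the throughput drop compared to hitting Synapse directly. The helper name is hypothetical and the inputs are the approximate figures above, so the percentages are rough as well.

```python
BASELINE_RPS = 1400  # directly to Synapse


def relative_drop(rps, baseline=BASELINE_RPS):
    """Return the throughput drop versus the baseline, in percent."""
    return round((1 - rps / baseline) * 100, 1)


nginx_drop = relative_drop(900)     # via matrix-homeserver-proxy (nginx)
traefik_drop = relative_drop(1200)  # via the internal Traefik entrypoint
```

This works out to roughly a 36% drop for the nginx path and roughly a 14% drop for the Traefik path.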
Besides Traefik being quicker for some reason, there are also other
benefits to not having this `matrix-homeserver-proxy` component:
- we can reuse what we have in terms of labels. Services can register a few extra labels on the new Traefik entrypoint
- we don't need services (like `matrix-media-repo`) to inject custom nginx configs into `matrix-homeserver-proxy`. They just need to register labels, like they do already.
- Traefik seems faster than nginx on this benchmark for some reason, which is a nice bonus
- no need to run one extra container (`matrix-homeserver-proxy`) and execute one extra Ansible role
- no need to maintain a setup where some people run the `matrix-homeserver-proxy` component (because they have route-stealing services like `matrix-media-repo` enabled) and others run an optimized setup without this component and everything needs to be rewired to talk to the homeserver directly. Now, everyone can go through Traefik and we can all run an identical setup
Downsides of the new Traefik entrypoint setup are that:
- all addon services that need to talk to the homeserver now depend on Traefik
- people running their own Traefik setup will be inconvenienced - they
need to manage one additional entrypoint
We'd be adding integration with an internal Traefik entrypoint
(`matrix_playbook_internal_matrix_client_api_traefik_entrypoint`),
so renaming helps disambiguate things.
There's no need for deprecation tasks, because the old names
have only been part of this `bye-bye-nginx-proxy` branch and not used by
anyone publicly.
This is a bit of a compatibility break.
The role was defaulting the Postgres password to `some-password` and we
auto-generate it now.
However, rebuilding both Postgres and this service should unify the
database credentials and the service configs to the new value.
This is an attempt at optimizing service startup.
The effect is most pronounced when many services are restarted one by one.
The systemd service manager role sometimes does this - for example when `just install-service synapse` runs.
In such cases, a 5-second delay for each Synapse worker service
(or other bridge/bot service that waits on the homeserver) quickly adds up to a lot.
When services are all stopped fully and then started, the effect is not so pronounced, because
`matrix-synapse.service` starts first and pulls all worker services (defined as `Wants=` for it).
Later on, when the systemd service manager role "starts" these worker services, they're started already.
Even if they had a 5-second wait each, it would have happened in parallel.
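As a back-of-the-envelope sketch of why the delay matters in one case and not the other (hypothetical helper names, and a hypothetical count of 12 services):

```python
def sequential_wait(service_count, delay_seconds=5):
    """Total wait when services are restarted one by one."""
    return service_count * delay_seconds


def parallel_wait(service_count, delay_seconds=5):
    """Total wait when all services start at once and wait concurrently."""
    return delay_seconds if service_count > 0 else 0


one_by_one = sequential_wait(12)  # 60 seconds of accumulated delay
all_at_once = parallel_wait(12)   # 5 seconds, since the waits overlap
```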