* ci: Cache builds by splitting into two jobs
For the cache to work properly, we need to derive a cache key from the build context (files that affect the Dockerfile build) instead of the cache key changing by commit SHA.
We also need to avoid a test suite failure from preventing the caching of a build, thus splitting into separate jobs.
This first attempt used `upload-artifact` and `download-artifact` to transfer the built image, but it has quite a bit of overhead and prevented multi-platform build (without complicating the workflow further).
* ci: Transfer to dependent job via cache only
While `download-artifact` + `docker load` is a little faster than rebuilding the image from cached layers, `upload-artifact` takes about 2 minutes to upload the AMD64 (330MB) tar image export (likely due to compression during upload?).
The `actions/cache` approach however does not incur that hit and is very quick (<10 secs) to complete it's post upload work. The dependent job still gets a cache-hit, and the build job is able to properly support multi-platform builds.
Added additional notes about timing and size of including ARM builds.
* ci: Move Dockerfile ARG to end of build
When the ARG changes due to commit SHA, it invalidates all cache due to the LABEL layers at the start. Then any RUN layers implicitly invalidate, even when the ARG is not used.
Introduced basic multi-stage build, and relocated the container config / metadata to the end of the build. This avoids invalidating expensive caching layers (size and build time) needlessly.
* tests: Ensure excessive FD limits are avoided
Processes that run as daemons (`postsrsd` and `fail2ban-server`) initialize by closing all FDs (File Descriptors).
This behaviour queries that maximum limit and iterates through the entire range even if only a few FDs are open. In some environments (Docker, limit configured by distro) this can be a range exceeding 1 billion (from kernel default of 1024 soft, 4096 hard), causing an 8 minute delay with heavy CPU activity.
`postsrsd` has since been updated to use `close_range()` syscall, and `fail2ban` will now iterate through `/proc/self/fd` (open FDs) which should resolve the performance hit. Until those updates reach our Docker image, we need to workaround it with `--ulimit` option.
NOTE: If `docker.service` on a distro sets `LimitNOFILE=` to approx 1 million or lower, it should not be an issue. On distros such as Fedora 36, it is `LimitNOFILE=infinity` (approx 1 billion) that causes excessive delays.
* chore: Use Docker host limits instead
Typically on modern distros with systemd, this should equate to 1024 (soft) and 512K (hard) limits. A distro may override the built-in global defaults systemd sets via setting `DefaultLimitNOFILE=` in `/etc/systemd/user.conf` and `/etc/systemd/system.conf`.
* tests(fix): Better prevent non-deterministic failures
- `no_containers.bats` tests the external script `setup.sh` (without `-c`). It's expected that no existing DMS container is running - otherwise it may attempt to use that container and fail. Detect this and fail early via `setup_file()` step.
- `mail_hostname.bats` had a odd timing failure with teardown due to the last tests bringing the containers down earlier (`docker stop` paired with the `docker run --rm`). Adding a moment of delay via `sleep` helps avoid that false positive scenario.
## Quick Summary
Resolves a `TODO` task with `addmailuser`.
## Overview
The main change is adding three new methods in `common.bash`, which replace the completion delay in `addmailuser` / `setup email add` command.
Other than that:
- I swapped `sh -c 'addmailuser ...'` to `setup email add ...`.
- Improved three tests in `setup-cli.bats` for `setup email add|update|del` (_logic remains effectively the same still_).
- Rewrote the `TODO` comment for `setup-cli.bats` test on `setup email del` to better clarify the concern, but the test itself was no longer affected due to changes prior to this PR, so I enabled the commented out assertion.
- Removed unnecessary waits. The two `skip` tests in `test/tests.bats` could be enabled again after this PR.
- Additional fixes to tests were made during the PR (see discussion comments for details), resolving race conditions.
Individual commit messages of the PR provide additional details if helpful.
---
## Relevant commit messages
* chore: Remove creation delay in `addmailuser`
This was apparently only for supporting tests that need to wait on account creation being ready to test against.
As per the removed inline docs, it should be fine to remove once tests are updated to work correctly without it.
* tests(feat): Add two new common helper methods
`wait_until_account_maildir_exists()` provides the same logic `addmailuser` command was carrying, to wait upon the account dir creation in `/var/mail`.
As this was specifically to support tests, it makes more sense as a test method.
`add_mail_account_then_wait_until_ready()` was added to handle the common pattern of creating account and waiting on it. An internal assert will ensure the account was successfully created first during the test before attempting to wait.
* tests(feat): Add common helper for waiting on change event to be processed
The current helper is more complicated for no real benefit, it only detects when a change is made that would trigger a change event in the `changedetector` service. Our usage of this in tests however is only interested in waiting out the completion of the change event.
Remove unnecessary change event waits. These waits should not be necessary if handled correctly.
* tests: `addmailuser` to `add_mail_account_then_wait_until_ready mail()`
This helper method is used where appropriate.
- A password is not relevant (optional).
- We need to wait on the creation on the account (Dovecot and `/var/mail` directory).
* tests: `setup-cli` revise `add`, `update`, `del` tests
The delete test was failing as the `/var/mail` directory did not yet exist.
There is now a proper delay imposed in the `add` test now shares the same account for both `update` and `del` tests resolving that failure.
Additionally tests use better asserts where appropriate and the wait + sleep logic in `add` has been improved (now takes 10 seconds to complete, approx half the time than before).
The `del` test TODO while not technically addressed is no longer relevant due to the tests being switched to `-c` option (there is a separate `no container` test file, but it doesn't provide a `del` test).
* tests(fix): Ensure Postfix is reachable after waiting on ClamAV
There is not much reason to check before waiting on ClamAV.
It is more helpful to debug failures from `nc` mail send commands if we know that nothing went wrong inbetween the ClamAV wait time.
Additionally added an assertion which should provide more information if this part of the test setup fails again.
* tests(fix): Move health check to the top
This test is a bit fragile. It relies on defaults for the healthcheck with intervals of 30 seconds.
If the check occurs while Postfix is down due a change event from earlier tests and the healthcheck kicks in at that point, then if there is not enough time to refresh the health status from `unhealthy`, the test will fail with a false-positive as Postfix is actually working and up again..
* tests(fix): Wait on directory to be removed
Workaround that tries not to introduce heavier delays by waiting on a full change event to complete in the previous `email update` if possible.
There is a chance that the account has the folder deleted, but restored from an active change event (for password update, then the account delete).
The new version uses our `log.sh` helper to simplify logging
significantly. Moreover, the script was adjusted to the current style
and the GitHub workflow was streamlined. The workflow is ot providing
the version anymore (which was useless anyway), and has been compacted.
* outsourcing env variable setup
This commit contains major parts of the work of refactoring the setup
and usage of environment variables. It outsources the setup into its own
script and provides dedicated functions to be executed at a later point in time.
A **new** env variable was added: `USER_PROVISIONG` which provides a
better way of defining which method / protocol to use when it comes to
setting up users. This way, the `ENABLE_LDAP` variable is deprecated,
but all of this is backwards compatible due to a "compatibility layer", a function provided by the new variables script.
This is not a breaking change. It mostly refators internal scripts. The
only change facing the user-side is the deprecation of `ENABLE_LDAP`. We
can prolong the period of deprecation for this variable as long as we
want, because the new function that ensures backwards compatibility
provides a clean interface for the future.
Co-authored-by: Brennan Kinney <5098581+polarathene@users.noreply.github.com>
Co-authored-by: Casper <casperklein@users.noreply.github.com>
When waiting on an account to be added to `postfix-accounts.cf`, Dovecot account creation during the startup process had already run.
Startup continued without properly creating the mail account for Dovecot. Methods like `setup email list` (with `ENABLE_QUOTAS=1`) would fail. `changedetector` service was required to be triggered to re-create Dovecot users.
- Wrapped the logic for wait + shutdown into a function call.
- Moved `_create_accounts()` to bottom of the setup function.
* tests: Update testing submodules (bats-assert, bats-support)
These two submodules were migrated to the `bats-core` organization, where they continued to receive updates.
* tests: Use tagged release of `bats-core/bats-support`
This is technically one commit backwards, but no relevant difference has been made since, other than moving the submodule to the `bats-core` organization.
* tests: Bump `bats-assert` to August 2022 (master)
No official release tag since Nov 2018, but a fair amount of changes since then.
* tests: Bump `bats-core` to `v1.7.0` release
* tests(fix): Correctly use assertions
Some tests were updated as the upgrade of bats submodules had `assert` methods raise awareness of incorrect usage.
This additionally revealed some existing tests that weren't meant to be using `run`, which swallowed failures from surfacing.
See the associated PR for more detailed commentary on specific changes.
### Commands refactored:
- User (**All:** add / list / update / del + _dovecot-master variants_)
- Quota (**All:** set / del)
- Virtual Alias (**All:** add / list /del)
- Relay (**All:** add-relayhost / add-sasl / exclude-domain)
### Overall changes involve:
- **Fairly common structure:**
- `_main` method at the top provides an overview of logical steps:
- After all methods are declared beneath it (_and imported from the new `helpers/database/db.sh`_), the `_main` is called at the bottom of the file.
- `delmailuser` additionally processes option support for `-y` prior to calling `_main`.
- `__usage` is now consistent with each of these commands, along with the `help` command.
- Most logic delegated to new helper scripts. Some duplicate content remains on the basis that it's low-risk to maintenance and avoids less hassle to jump between files to check a single line, usually this is arg validation.
- Error handling should be more consistent, along with var names (_no more `USER`/`EMAIL`/`FULL_EMAIL` to refer to the same expected value_).
- **Three new management scripts** (in `helpers/database/manage/`) using a common structure for managing changes to their respective "Database" config file.
- `postfix-accounts.sh` unified not only add and update commands, but also all the dovecot-master versions, a single password call for all 4 of them, with a 5th consumer of the password prompt from the relay command `addsaslpassword`.
- These scripts delegate actual writes to `helpers/database/db.sh` which provides a common API to support the changes made.
- This is more verbose/complex vs the current inline operations each command currently has, as it provides generic support instead of slightly different variations being maintained, along with handling some edge cases that existed and would lead to bugs (notably substring matches).
- Centralizing changes here seems wiser than scattered about. I've tried to make it easy to grok, hopefully it's not worse than the current situation.
- List operations were kept in their respective commands, `db.sh` is only really managing writes. I didn't see a nice way for removing the code duplication for list commands as the duplication was fairly minimal, especially for `listalias` and `listdovecotmasteruser` which were quite simple in their differences in the loop body.
- `listmailuser` and `delmailuser` also retain methods exclusive to respective commands, I wasn't sure if there was any benefit to move those, but they were refactored.
* fix: Reload `amavisd-new` when vhost config is updated
Amavis was not aware of new domains in `/etc/postfix/vhost` as it did not refresh it's sources upon change detection.
- Inline docs for `check-for-changes.sh` have been shuffled around and revised a bit.
- Change processing extracted from the main change detection loop method to their own methods:
- `_get_changed_files()` - Clarifies what is going on (and how) without having to look it up. To reduce noise in the main logic loop, extracted to a separate method.
- `_postfix_dovecot_changes()` - The bulk of change processing was moved to this method. I've added conditionals to only run relevant logic.
- `_ssl_changes()` - Just shifted, no logic changed. `REGEX_NEVER_MATCH` and `ACME_CERT_DIR` vars scope set to `local`.
* chore: Split vhost helper method and use filepath vars
- Helpers `accounts.sh` and `aliases.sh` can move their vhost code into this helper.
- They share duplicate code with `bin/open-dkim` which will also leverage this vhost helper going forward.
* chore: Sync vhost generation logic into helper
- Chunky commit, but mostly copy/paste of logic into a common method.
- `bin/open-dkim` additionally wrapped relevant logic in a function call and revised inline docs.
* chore: Include LDAP vhost support
- Revises notes for LDAP vhost support.
- This now ensures LDAP users get vhost rebuilt to match the startup script for when change detection support is enabled.
- `bin/open-dkim` will additionally be able to support the default `DOMAINNAME` var (set via `helpers/dns.sh`) for LDAP users instead of requiring them to provide one explicitly.
* chore(`bin/open-dkim`): Ensure `DOMAINNAME` is properly set
- This will ensure LDAP users insert the same `DOMAINNAME` value as used during container startup.
- The container itself should panic at startup (during `helpers/dns.sh`) if this isn't configured correctly already, thus it should not introduce any breaking change to users of this utility?
* chore: Set the 2nd value as blank `_`
Line is split by a delimiter such as white-space (or via IFS: `|`), the blank `_` var is to indicate we're not interested in that value, but still leverage how `read -r` works, instead of splitting the var ourselves first thing.
* chore: Remove shellcheck disable lines
No longer applicable with the switch to `_`
* chore: Remove requirement for `postfix-accounts.cf`
This is an old requirement from when the change detector service was first introduced. It's no longer relevant.
* chore: Do not needlessly create `postfix-aliases.cf`
The config was created regardless to workaround early change detection support. No longer necessary to require the file to exist.
* chore: Drop guards requiring `/tmp/docker-mailserver` to exist
Legacy guards when this was the only location change detection location supported.
There does not appear to be any need for changing into this directory at the start of `check-for-changes.sh` as we use absolute filepaths (originally monitored files were checked with relative paths to this config dir).
* chore: Revise inline docs
* chore: Add change detection monitoring for extra configs
These are also handled at run-time in the current change detection support, so it makes sense to allows these config updates to also trigger change events.
* chore: Create bare new test file `setup-cli.bats`
Bare minimum to setup a new test.
* chore: Transfer over relevant tests
* chore: `mail` container name to dynamic `${TEST_NAME}`
Only applied where it's relevant. Next commit will handle the config path correction.
* chore: Use `TEST_TMP_CONFIG` for referencing local config directory
Could technically use the existing function call. Some paths were using a hard-coded config location.
Both have been converted to `TEST_TMP_CONFIG` and related `grep` calls normalizing the quote mark usage, escaping doesn't seem necessary.
* tests(fix): Create container without providing extra args reference var
If a variable name (of an array) was not provided to reference, this would fail trying to reference `'`.
* chore: Remove `SYS_PTRACE` capability from docs and configs
* chore: Remove `SYS_PTRACE` capability from tests
Doesn't seem to be required. It was originally added when the original change detection feature PR apparently needed it to function.
* chore(`aliases.sh`): Filepath to local var `DATABASE_VIRTUAL`
* chore(`accounts.sh`): Filepath to local var `DATABASE_ACCOUNTS`
* chore(`accounts.sh`): Filepath to local var `DATABASE_VIRTUAL`
* chore(`accounts.sh`): Filepath to local var `DATABASE_DOVECOT_MASTERS`
* chore(`bin/open-dkim`): Filepaths to local vars (accounts,virtual,vhost)
* chore(`relay.sh`): Filepath to local var `DATABASE_SASL_PASSWD`
* chore: Rename method
Prior PR feedback suggested a better helper method name.
* chore: Normalize filtering config lines as input for iterating
* chore: Remove `_is_comment` helper method
No longer serving a purpose with more appropriate filter method for pre-processing the entire config file.
* fix(listmailuser): Don't parse comments
Avoids passing comments to `dovecot_quota_to_hr()` which fails to handle it and would throws errors.
* chore: Move config filter method to `helpers/utils.sh`
This helper was to support an earlier ENV for SASL auth support. When extracting logic into individual helpers, it was assumed this was separate from relay support, which it appears was not the case.
---
The `SASL_PASSWD` ENV is specified in tests but no longer used. There is no `external-domain.com` relay configured or tested against anywhere in the project.
The ENV was likely used in tests prior to improved relay support that allowed for adding more than a single set of relay credentials.
---
It likewise has no real relevance anywhere else outside of `relay.sh` as it's the only portion of code to operate with it.
It's only relevant for SASL auth as an SMTP client, not the SMTP server (`smtpd`) SASL support that is delegated to Dovecot. Functionality has been completely migrated into `relay.sh` as a result.
Documentation is poor for this ENV, it is unlikely in wide use? Should consider for removal.
---
The ENV has been dependent upon `RELAY_HOST` to actually enable postfix to use `/etc/postfix/sasl_passwd`, thus not likely relevant in existing setups?
---
Migrate `/etc/postfix/sasl_passwd` check from `tests.bats` as it belongs to relay tests.
* chore: Fix typo
* chore: Apply explicit chroot default for `sender-cleanup`
The implicit default is set to `y` as a compatibility fallback, but otherwise it is [advised to set to `n` going forward](http://www.postfix.org/COMPATIBILITY_README.html#chroot).
Test was changed to catch any backwards-compatibility logs, not just those for `chroot=y`. `using` added as a prefix to avoid catching log message whenever a setting is changed that the default compatibility level is active.
* chore: Set `compatibility_level` in `main.cf`
We retain the level`2` value previously set via scripts. This avoids log noise that isn't helpful.
Applied review feedback to give maintainers some context with this setting and why we have it presently set to `2`.
Presently relay-host support modifies `main.cf` settings directly. This adjusts the default transport (`smtp`) which other transports in `master.cf` inherit from.
When configuring for implicit TLS to a `relay-host` this would set `main.cf:smtp_tls_wrappermode = yes` and affect the transport `master.cf:smtp-amavis` which does not set an override like it does for `smtp_tls_security_level`. This causes Amavis to fail working which the default transport relies on due to `main.cf:content_filter`.
Easy fix, by explicitly adding the override `smtp_tls_wrappermode=no`.`no` is default in `main.cf` so inheriting this setting hasn't been a problem in the past for other relay-hosts using StartTLS.
* fix: Conditionally add service state
These services will no longer copy over state unless they are enabled.
The biggest offender here was ClamAV as it's database that is baked into the docker image is over 200MB and would copy over to every container instance with a volume mounted state directory.
* chore: Add Dovecot to conditional support
* chore: Extract change-detection method to it's own helper
This doesn't really belong in `helpers/ssl.sh`. Moving to it's own helper script.
* chore: Co-locate related change-detection method from container startup
It seems relevant to migrate the related support during startup for the change detection feature into this helper.
I opted to move the call from `start-mailserver.sh` into the `_setup` call at the end for a more explicit/visible location.
* chore: Move `CHKSUM_FILE` into `helpers/change-detection.sh`
It belongs there, not in `helpers/index.sh`.
* chore: Revise inline documentation
* tests(fix): Ensure correct functionality
Presently `test/test_helper.bats` is using it's own `CHKSUM_FILE` instead of sourcing the var for the filepath.
`test_helper/common.bash` was calling a method to check for changes, but this helper may not correctly detect letsencrypt related changes as these are not ENV rely on, but global vars handled by `helpers/dns.sh`, so that should be run first like it is for `check-for-changes.sh`.
* tests(chore): Use `CHKSUM_FILE` var from helper
* chore: `addmailuser` should use `CHKSUM_FILE` var
* chore: Update `check-for-changes.sh` log message with correct path