add dovecot-fts-xapian (#2064)

* add dovecot-fts-xapian

update Docker to build from debian bullseye slim, as it contains
packages for fts-xapian.

update Docker to install dovecot-fts-xapian.

update docs with instructions on how to enable fts-xapian or fts-solr
and what considerations to take into when deciding.

* address review feedback

* update backport method to previously proposed approach (which was lost in a forced push)
This commit is contained in:
eleith 2021-07-05 03:25:26 -07:00 committed by GitHub
parent 9f47d04dde
commit 4473b881cf
No known key found for this signature in database
GPG key ID: 4AEE18F83AFDEB23
2 changed files with 114 additions and 9 deletions

View file

@ -39,6 +39,8 @@ SHELL ["/bin/bash", "-o", "pipefail", "-c"]
# -----------------------------------------------
RUN \
# Backport repo for dovecot-fts-xapian package. This can be removed once Debian 11 is used as base image.
echo 'deb http://deb.debian.org/debian buster-backports main' > /etc/apt/sources.list.d/buster-backports.list && \
apt-get -qq update && \
apt-get -qq install apt-utils 2>/dev/null && \
apt-get -qq dist-upgrade && \
@ -47,9 +49,9 @@ RUN \
# A - D
altermime amavisd-new apt-transport-https arj binutils bzip2 bsd-mailx \
ca-certificates cabextract clamav clamav-daemon cpio curl \
dbconfig-no-thanks dovecot-core dovecot-imapd dovecot-ldap \
dovecot-lmtpd dovecot-managesieved dovecot-pop3d dovecot-sieve \
dovecot-solr dumb-init \
dbconfig-no-thanks dovecot-core dovecot-fts-xapian dovecot-imapd \
dovecot-ldap dovecot-lmtpd dovecot-managesieved dovecot-pop3d \
dovecot-sieve dovecot-solr dumb-init \
# E - O
ed fetchmail file gamin gnupg gzip iproute2 iptables \
locales logwatch lhasa libdate-manip-perl liblz4-tool \

View file

@ -4,11 +4,115 @@ title: 'Advanced | Full-Text Search'
## Overview
Full-text search allows all messages to be indexed, so that mail clients can quickly and efficiently search messages by their full text content.
Full-text search allows all messages to be indexed, so that mail clients can quickly and efficiently search messages by their full text content. Dovecot supports a variety of community supported [FTS indexing backends](https://doc.dovecot.org/configuration_manual/fts/).
Docker-mailserver comes pre-installed with two plugins that can be enabled with a dovecot config file.
Please be aware that indexing consumes memory and takes up additional disk space.
### Xapian
The [dovecot-fts-xapian](https://github.com/grosjo/fts-xapian) plugin makes use of [Xapian](https://xapian.org/). Xapian enables embedding an FTS engine without the need for additional backends.
The indexes will be stored as a subfolder named `xapian-indexes` inside your `mail` folder. With the default settings, 10GB of email data may generate around 4GB of indexed data.
While indexing is memory intensive, you can configure the plugin to limit the amount of memory consumed by the index workers. With Xapian being small and fast, this plugin is a good choice for low memory environments (2GB) as compared to Solr.
#### Setup
1. To configure fts-xapian as a dovecot plugin, create a `fts-xapian-plugin.conf` file and place the following in it:
```
mail_plugins = $mail_plugins fts fts_xapian
plugin {
fts = xapian
fts_xapian = partial=3 full=20 verbose=0
fts_autoindex = yes
fts_enforced = yes
# disable indexing of folders
# fts_autoindex_exclude = \Trash
# Index attachements
# fts_decoder = decode2text
}
service indexer-worker {
# limit size of indexer-worker RAM usage, ex: 512MB, 1GB, 2GB
vsz_limit = 1GB
}
# service decode2text {
# executable = script /usr/libexec/dovecot/decode2text.sh
# user = dovecot
# unix_listener decode2text {
# mode = 0666
# }
# }
```
adjust the settings to tune for your desired memory limits, exclude folders and enable searching text inside of attachments
2. Update `docker-compose.yml` to load the previously created dovecot plugin config file:
```yaml
version: '3.8'
services:
mailserver:
image: mailserver/docker-mailserver:latest
hostname: mail
domainname: example.com
container_name: mailserver
env_file: mailserver.env
ports:
- "25:25" # SMTP (explicit TLS => STARTTLS)
- "143:143" # IMAP4 (explicit TLS => STARTTLS)
- "465:465" # ESMTP (implicit TLS)
- "587:587" # ESMTP (explicit TLS => STARTTLS)
- "993:993" # IMAP4 (implicit TLS)
volumes:
- ./data/mail:/var/mail
- ./data/state:/var/mail-state
- ./data/logs:/var/log/mail
- /etc/localtime:/etc/localtime:ro
- ./config/:/tmp/docker-mailserver/
- ./fts-xapian-plugin.conf:/etc/dovecot/conf.d/10-plugin.conf:ro
restart: always
stop_grace_period: 1m
cap_add: [ "NET_ADMIN", "SYS_PTRACE" ]
```
3. Recreate containers:
```
docker-compose down
docker-compose up -d
```
4. Initialize indexing on all users for all mail:
```
docker-compose exec mailserver doveadm index -A -q \*
```
5. Run the following command in a daily cron job:
```
docker-compose exec mailserver doveadm fts optimize -A
```
### Solr
The [dovecot-solr Plugin](https://wiki2.dovecot.org/Plugins/FTS/Solr) is used in conjunction with [Apache Solr](https://lucene.apache.org/solr/) running in a separate container. This is quite straightforward to setup using the following instructions.
## Setup Steps
Solr is a mature and fast indexing backend that runs on the JVM. The indexes are relatively compact compared to the size of your total email.
However, Solr also requires a fair bit of RAM. While Solr is [highly tuneable](https://solr.apache.org/guide/7_0/query-settings-in-solrconfig.html), it may require a bit of testing to get it right.
#### Setup
1. `docker-compose.yml`:
@ -47,10 +151,9 @@ The [dovecot-solr Plugin](https://wiki2.dovecot.org/Plugins/FTS/Solr) is used in
```
3. Recreate containers: `docker-compose down ; docker-compose up -d`
4. Flag all user mailbox FTS indexes as invalid, so they are rescanned on demand when they are next searched: `docker-compose exec mailserver doveadm fts rescan -A`
## Further Discussion
#### Further Discussion
See [#905][github-issue-905]
[github-issue-905]: https://github.com/docker-mailserver/docker-mailserver/issues/905
See [#905](https://github.com/docker-mailserver/docker-mailserver/issues/905)