Plaintext authentication
The simplest authentication mechanism is PLAIN. The client simply sends the password
unencrypted to Dovecot. All clients support the PLAIN mechanism, but the obvious problem is
that anyone listening on the network can steal the password. For that reason (and some
others), other mechanisms were implemented.
Today, however, many people use SSL/TLS, and there's no problem with sending an unencrypted
password inside an SSL-secured connection. So if you're using SSL, you probably don't need to
worry about anything other than the PLAIN mechanism.
Another plaintext mechanism is LOGIN. It's typically used only by SMTP servers to let Outlook
clients perform SMTP authentication. Note that the LOGIN mechanism is not the same as IMAP's
LOGIN command. The LOGIN command is handled internally using the PLAIN mechanism.
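To make concrete what "sends the password unencrypted" means, here is a minimal Python sketch of the client side of the PLAIN mechanism as defined in RFC 4616: the credentials are only base64-encoded, not encrypted, so anyone who captures the exchange can decode them. The function names are hypothetical, and the LOGIN variant assumes the client answers the server's username and password prompts with one base64 string each.

```python
import base64


def plain_initial_response(username: str, password: str, authzid: str = "") -> str:
    """SASL PLAIN response: base64("authzid NUL username NUL password")."""
    raw = f"{authzid}\0{username}\0{password}".encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


def login_responses(username: str, password: str) -> tuple:
    """LOGIN mechanism: username and password are sent as separate base64 strings."""
    return (
        base64.b64encode(username.encode("utf-8")).decode("ascii"),
        base64.b64encode(password.encode("utf-8")).decode("ascii"),
    )


# Base64 is an encoding, not encryption: the password is trivially recovered.
resp = plain_initial_response("jane", "secret")
print(base64.b64decode(resp))  # b'\x00jane\x00secret'
```

Inside an SSL/TLS session the same bytes travel encrypted, which is why PLAIN is considered sufficient there.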
Non-plaintext authentication
Non-plaintext mechanisms have been designed to be safe to use even without SSL/TLS
encryption. Because of how they have been designed, they require access to the plaintext
password or their own special hashed version of it. This means that it's impossible to use
non-plaintext mechanisms with commonly used DES or MD5 password hashes.
If you want to use more than one non-plaintext mechanism, the passwords must be stored as
plaintext so that Dovecot is able to generate the required special hashes for all the different
mechanisms. If you want to use only one non-plaintext mechanism, you can store the passwords
using the mechanism's own password scheme.
With success/failure password databases (e.g. PAM) it's not possible to use non-plaintext
mechanisms at all, because they only support verifying a known plaintext password.
Dovecot supports the following non-plaintext mechanisms:
• CRAM-MD5: Protects the password in transit against eavesdroppers. Reasonably good
support in clients.
• DIGEST-MD5: Somewhat stronger cryptographically than CRAM-MD5, but clients
rarely support it.
• APOP: This is a POP3-specific authentication mechanism. Similar to CRAM-MD5, but requires
storing the password in plaintext.
• NTLM: Mechanism created by Microsoft and supported by their clients.
○ Optionally supported using Samba's winbind.
• GSS-SPNEGO: Similar to NTLM.
• GSSAPI: Kerberos v5 support.
• RPA: CompuServe RPA authentication mechanism. Similar to DIGEST-MD5, but client
support is rare.
• ANONYMOUS: Support for logging in anonymously. This may be useful if you intend
to provide a publicly accessible IMAP archive.
• OTP and SKEY: One time password mechanisms. Supported only by Dovecot v1.1 and
later.
• EXTERNAL: EXTERNAL SASL mechanism. Supported only by Dovecot v1.2 and
later.
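Why these mechanisms need the plaintext password (or a mechanism-specific hash) becomes clear from how the response is computed. A minimal Python sketch of the client side of CRAM-MD5 (RFC 2195) and APOP (RFC 1939): in both cases the server must recompute the same digest from its own copy of the secret, which a one-way DES or MD5 crypt hash cannot provide. The function names are illustrative, and the example values are the ones published in those RFCs.

```python
import base64
import hashlib
import hmac


def cram_md5_response(username: str, password: str, challenge: bytes) -> str:
    """Client reply to a CRAM-MD5 challenge (challenge already base64-decoded).

    Reply = base64("username " + hex(HMAC-MD5(key=password, msg=challenge))).
    """
    digest = hmac.new(password.encode("utf-8"), challenge, hashlib.md5).hexdigest()
    return base64.b64encode(f"{username} {digest}".encode("utf-8")).decode("ascii")


def apop_digest(timestamp: str, password: str) -> str:
    """APOP digest: hex(MD5(server timestamp + password)), sent as 'APOP user digest'."""
    return hashlib.md5((timestamp + password).encode("utf-8")).hexdigest()


reply = cram_md5_response("tim", "tanstaaftanstaaf",
                          b"<1896.697170952@postoffice.reston.mci.net>")
print(base64.b64decode(reply))  # b'tim b913a602c7eda7a495b4e6e7334d3890'
print(apop_digest("<1896.697170952@dbc.mtview.ca.us>", "tanstaaf"))  # c4c9334bac560ecc979e58001b3e22fb
```

The APOP digest mixes the password directly into a fresh MD5 input, so the server has nothing to precompute it from except the plaintext password.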
Configuration
By default only the PLAIN mechanism is enabled. You can change this by modifying the
mechanisms setting in dovecot.conf:
auth default {
mechanisms = plain login cram-md5
# ..
}
SSL
SSL and TLS terms are often used in confusing ways:
• SSL (Secure Sockets Layer) is the original protocol implementation. SSLv3 is still
allowed by Dovecot, but it's rarely used. Some clients use SSL to mean that they're going
to connect to the imaps (993), pop3s (995) or smtps (465) port, although they're still
going to use TLSv1 protocol.
• TLS (Transport Layer Security) replaced the SSL protocol. TLSv1 protocol is used
practically always nowadays. Some clients use TLS to mean that they're going to use
STARTTLS command after connecting to the standard imap (143), pop3 (110) or smtp
port (25/587). Nothing would prevent using SSLv3 protocol after STARTTLS command.
Using two separate ports for plaintext and SSL connections was thought to be wasteful, so
STARTTLS was intended to deprecate the SSL ports (imaps, pop3s, smtps, etc.). This never really
happened, probably for a few reasons:
• Some admins don't even know about STARTTLS.
• Some admins want to require SSL/TLS, but don't realize that this is also possible with
STARTTLS (Dovecot has disable_plaintext_auth=yes and ssl=required settings).
• Some admins understand everything, but still prefer to allow only SSL ports. This could
be because it makes it easier to ensure that no information is leaked, since the SSL/TLS
handshake happens immediately. Unfortunately, some clients try to do plaintext
authentication without STARTTLS, even when the IMAP server has told the client that it
won't work.
Unfortunately there doesn't seem to be any clear and simple way to refer to these different
meanings.
The term SSL is much more widely understood than TLS, so Dovecot's configuration and this
documentation talk only about SSL when in fact they mean both SSL and TLS.
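Python's standard imaplib shows the two client-side meanings side by side. A minimal sketch, assuming Dovecot listens on the standard ports: mode "ssl" connects to imaps (993) and starts the TLS handshake immediately, while mode "starttls" connects to imap (143) and upgrades with the STARTTLS command. The function name and mode strings are hypothetical.

```python
import imaplib
import ssl


def open_imap(host: str, mode: str = "starttls") -> imaplib.IMAP4:
    """Open an encrypted IMAP connection to host.

    mode="ssl":      what clients often call "SSL" (imaps port, immediate handshake).
    mode="starttls": what clients often call "TLS" (imap port, then STARTTLS).
    """
    if mode not in ("ssl", "starttls"):
        raise ValueError(f"unknown mode: {mode!r}")
    # Verifies the certificate chain against the system CA list and checks the host name.
    ctx = ssl.create_default_context()
    if mode == "ssl":
        return imaplib.IMAP4_SSL(host, 993, ssl_context=ctx)
    conn = imaplib.IMAP4(host, 143)
    # Upgrade before sending any credentials.
    conn.starttls(ssl_context=ctx)
    return conn
```

After either call, conn.login(user, password) sends IMAP's LOGIN command inside the encrypted channel. With a self-signed certificate the default context will refuse the connection unless the certificate has been made trusted, which is the subject of the next section.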
Self-signed SSL certificates
Self-signed SSL certificates are the easiest way to get your SSL server working. However unless
you take some action to prevent it, this is at the cost of security:
• The first time the client connects to the server, it sees the certificate and
asks the user whether to trust it. The user of course rarely bothers to verify
the certificate's fingerprint, so a man-in-the-middle attacker can easily
bypass all the SSL security, steal the user's password and so on.
• If the client was lucky enough not to get attacked the first time it connected,
the following connections will be secure as long as the client had
permanently saved the certificate. Some clients do this, while others have to
be manually configured to accept the certificate.
The only way to be fully secure is to import the SSL certificate into the client's (or operating
system's) list of trusted CA certificates before the first connection. See
SSL/CertificateClientImporting for how to do it with different clients.
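Verifying a fingerprint by hand amounts to hashing the certificate's DER encoding and comparing the result with the value recorded when the certificate was created (mkcert.sh prints a "SHA1 Fingerprint=" line, for example). A minimal Python sketch that produces the same colon-separated format; the function names are hypothetical, and SHA-1 is the default only to match that output.

```python
import hashlib
import ssl


def fingerprint_der(der: bytes, algorithm: str = "sha1") -> str:
    """Colon-separated upper-case hex digest of a DER-encoded certificate."""
    digest = hashlib.new(algorithm, der).hexdigest().upper()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


def fingerprint_pem(pem: str, algorithm: str = "sha1") -> str:
    """Same, for a single PEM certificate such as /etc/ssl/certs/dovecot.pem."""
    return fingerprint_der(ssl.PEM_cert_to_DER_cert(pem), algorithm)
```

Run against the server's certificate file, the result should equal what `openssl x509 -noout -fingerprint -sha1 -in /etc/ssl/certs/dovecot.pem` reports, and what a careful user would compare before accepting the certificate.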
Self-signed certificate creation
Dovecot includes a script to build self-signed SSL certificates using OpenSSL. In the source
distribution this exists in doc/mkcert.sh. Binary installations usually create the certificate
automatically when installing Dovecot and don't include the script.
The SSL certificate's configuration is taken from the doc/dovecot-openssl.cnf file. Modify the file
before running mkcert.sh. An especially important field is the CN (Common Name) field, which
should contain your server's host name. Clients will verify that the CN matches the host name
they connected to; otherwise they'll say the certificate is invalid. It's also possible to use
wildcards (e.g. *.domain.com) in the host name. They should work with most clients.
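A rough model of the check clients perform: the connected host name is compared with the CN, and a leading "*." wildcard matches exactly one left-most label. This is the common interpretation, not a guarantee for every client (individual clients differ, and many current clients check the subjectAltName extension rather than the CN). A minimal Python sketch; the function is hypothetical, ignores IDNA and IP addresses, and rejecting overly broad patterns like "*.com" is an assumption.

```python
def cn_matches(cn: str, hostname: str) -> bool:
    """True if hostname matches the certificate CN, allowing '*.' in the left-most label."""
    cn_labels = cn.lower().rstrip(".").split(".")
    host_labels = hostname.lower().rstrip(".").split(".")
    # A wildcard never spans more than one label, so the label counts must agree.
    if len(cn_labels) != len(host_labels) or not host_labels[0]:
        return False
    if cn_labels[1:] != host_labels[1:]:
        return False
    if cn_labels[0] == "*":
        # Refuse overly broad patterns such as "*" or "*.com".
        return len(cn_labels) >= 3
    return cn_labels[0] == host_labels[0]
```

So a certificate for *.domain.com covers mail.domain.com but neither domain.com itself nor a.b.domain.com.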
By default the certificate is created in /etc/ssl/certs/dovecot.pem and the private key file in
/etc/ssl/private/dovecot.pem. Also by default, the certificate expires in 365 days. If you wish
to change any of these, modify the mkcert.sh script.
Certificate Authorities
The correct way to use SSL is to have each SSL certificate signed by a Certificate Authority
(CA). The client has a list of trusted Certificate Authorities, so whenever it sees a new SSL
certificate signed by a trusted CA, it will automatically trust the new certificate without asking
the user any questions.
There are two ways to get a CA signed certificate: buy it, or create your own CA. The clients
have a built-in list of trusted CAs, so buying from one of those CAs will have the advantage of
the certificate working without any client configuration. If you create your own CA, you'll have
to install the CA certificate to all the clients (see SSL/CertificateClientImporting).
There are multiple tools for managing your own CA. The simplest way is to use a CA
management tool such as gnoMint or TinyCA. If you need to tailor the properties of the CA,
you can always use OpenSSL, which is highly customizable but somewhat cumbersome.
Dovecot is a high-performance, secure, and fully standards-compliant IMAP/POP3 server. It also
boasts a much simpler configuration setup than other IMAP servers and a broad variety of
authentication mechanisms. It also supports SSL and TLS encryption.
Many distributions are now available with Dovecot included; it may not be the default
IMAP/POP3 server, but it is usually a simple install command away.
Once you have installed Dovecot, the configuration file will most likely be /etc/dovecot.conf.
Many of the defaults are likely sufficient and will require few changes unless you need specific
locations for the mail spool, want to change the default authentication options, and so forth.
By default, Dovecot will only act as an IMAP server, but it can act as a POP3 server as well. To
do this, edit /etc/dovecot.conf and look for the protocols section:
protocols = pop3
This would tell Dovecot to act as a pure POP3 server. If you want to provide the full gamut of
POP3 and IMAP, with both the regular and SSL variants, use:
protocols = pop3 pop3s imap imaps
To use SSL, you will need to appropriately set the ssl_cert_file and ssl_key_file settings, and set
ssl_disable to no. The simplest way to get these certificates is to use the mkcert.sh script that
Dovecot comes with. On Mandriva Linux, this file is stored in /usr/share/doc/dovecot/. There is
also a dovecot-openssl.cnf file that you will want to edit to set the SSL certificate options.
Depending on where you wish to store the certificate and key file, you may want to edit
mkcert.sh as well, or change the SSLDIR variable to override the location:
# cd /usr/share/doc/dovecot
# vim dovecot-openssl.cnf
# mkdir -p /etc/ssl/dovecot/{certs,private}
# SSLDIR=/etc/ssl/dovecot sh mkcert.sh
Generating a 1024 bit RSA private key
..++++++
..................++++++
writing new private key to '/etc/ssl/dovecot/private/dovecot.pem'
-----
subject= /C=CA/ST=Alberta/L=Edmonton/O=Foo Company/OU=IMAP server/CN=example.com/emailAddress=admin@example.com
SHA1 Fingerprint=9A:23:B8:B4:0E:16:06:11:B2:FE:4E:49:C8:A8:C2:87:D8:79:1B:82
Next, edit /etc/dovecot.conf again and set the following:
ssl_disable = no
ssl_cert_file = /etc/ssl/dovecot/certs/dovecot.pem
ssl_key_file = /etc/ssl/dovecot/private/dovecot.pem
Now restart Dovecot, and it will authenticate users against the system using PAM. Dovecot
supports virtual users as well, which makes it quite versatile. More information on
configuring Dovecot and all the other features it provides is available on the Dovecot wiki.
Dovecot LDA
The Dovecot LDA, called deliver, is a local delivery agent which takes mail from an MTA and
delivers it to a user's mailbox, while keeping Dovecot index files up to date.
This page describes the common settings required to make deliver work. You should read it first,
and then the MTA specific pages:
• LDA/Postfix
• LDA/Exim
• LDA/Sendmail
• LDA/Qmail
• LDA/ZMailer
Main features of Dovecot LDA
• Mailbox indexing during mail delivery, providing faster mailbox access later
• Quota enforcing by a plugin
• Sieve language support by a plugin
○ Mail filtering
○ Mail forwarding
○ Vacation auto-reply
Common configuration
The configuration is done in the protocol lda section in dovecot.conf. The important settings
are:
• postmaster_address is used as the From: header address in bounce mails
• hostname is used in generated Message-IDs and in Reporting-UA: header in bounce
mails
• sendmail_path is used to send mails. Note that the default is /usr/lib/sendmail,
which doesn't necessarily work the same as /usr/sbin/sendmail.
• auth_socket_path specifies the UNIX socket to dovecot-auth where deliver can look up
userdb information when the -d parameter is used. See below for how to configure Dovecot to
create the socket.
Note that the dovecot.conf file must be world-readable so that the deliver process can read it
while running with user privileges.
Parameters
Parameters accepted by deliver:
• -d <username>: Destination username. If given, the user information is looked up from
dovecot-auth. Typically used with virtual users, but not necessarily with system users.
• -a <address>: Destination address (e.g. user+ext@domain). Default is the same as
username. (v1.1+ only)
• -f <address>: Envelope sender address.
• -c <path>: Alternative configuration file path.
• -m <mailbox>: Destination mailbox (default is INBOX). If the mailbox doesn't exist, it's
created (unless -n is used). If the message couldn't be saved to the mailbox for any reason, it's
delivered to INBOX instead.
○ If Sieve plugin is used, this mailbox is used as the "keep" action's
mailbox. It's also used if there is no Sieve script or if the script fails for
some reason.
○ v1.1: Deliveries to a namespace prefix result in saving the mail to INBOX
instead. For example, if you have a "Mail/" namespace, this allows you to call
deliver -n -m Mail/$mailbox, which stores mail in Mail/$mailbox, or in
INBOX if $mailbox is empty.
○ The mailbox name is specified the same way as it's visible in the IMAP client. For
example, if you have a Maildir with a .box.sub/ directory and your namespace
configuration is prefix=INBOX/, separator=/, the correct way to deliver mail
there is to use -m INBOX/box/sub
• -n: If the destination mailbox doesn't exist, don't create it. This affects both the -m parameter
and the fileinto action in Sieve scripts. The fallback is to deliver mail to INBOX.
• -s: Subscribe to mailboxes that are automatically created (via -m parameter or fileinto
Sieve action). (v1.1.3+)
• -e: If mail gets rejected, write the rejection reason to stderr and exit with EX_NOPERM.
The default is to send a rejection mail instead (v1.0.1+).
• -k: Don't clear all environment at startup (v1.1+).
• -p <path>: Path to the mail to be delivered instead of reading from stdin. If using
maildir the file is hard linked to the destination if possible. This allows a single mail to be
delivered to multiple users using hard links, but currently it also prevents deliver from
updating cache file so it shouldn't be used unless really necessary. (v1.1+)
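When deliver is called from a script or a custom MTA hook, the parameters above map directly onto an argument vector. A minimal Python sketch, assuming deliver is installed at /usr/local/libexec/dovecot/deliver (the path used in the .forward example below; packages often install it elsewhere) and that the message is passed on stdin. The helper names are hypothetical.

```python
import subprocess
from typing import Optional

DELIVER = "/usr/local/libexec/dovecot/deliver"  # assumption: adjust to your installation


def deliver_argv(user: Optional[str] = None, sender: Optional[str] = None,
                 mailbox: Optional[str] = None, no_create: bool = False,
                 config: Optional[str] = None) -> list:
    """Build a deliver command line from the documented parameters."""
    argv = [DELIVER]
    if config:
        argv += ["-c", config]   # alternative configuration file
    if sender:
        argv += ["-f", sender]   # envelope sender
    if user:
        argv += ["-d", user]     # look the user up from dovecot-auth
    if mailbox:
        argv += ["-m", mailbox]  # mailbox name as seen by IMAP clients
    if no_create:
        argv.append("-n")        # don't create a missing -m mailbox
    return argv


def run_deliver(message: bytes, **kwargs) -> int:
    """Pipe the message to deliver on stdin and return its exit status."""
    return subprocess.run(deliver_argv(**kwargs), input=message).returncode
```

For example, deliver_argv(user="jane", sender="bob@example.com", mailbox="INBOX/box/sub") yields the command for the namespace example above.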
Return values
deliver will exit with one of the following values:
• 0 (EX_OK): Delivery was successful
• 64 (EX_USAGE): Invalid parameter given.
• 67 (EX_NOUSER): The destination username was not found.
• 78 (EX_CONFIG): Failed to read the configuration file, a configuration setting is missing,
or the deliver binary is setuid-root and world-executable. (v1.2+ no longer
uses this.)
• 77 (EX_NOPERM): The -e parameter was used and mail was rejected. Typically this happens
when the user is over quota and quota_full_tempfail=no.
• 75 (EX_TEMPFAIL): A temporary failure. This is returned for almost all failures.
See the log file for details.
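A caller such as an MTA pipe transport or a wrapper script has to turn these statuses into "done", "retry later" or "bounce". A minimal Python sketch of that lookup; the table mirrors the list above (the values are the standard sysexits.h codes), the helper names are hypothetical, and treating only EX_TEMPFAIL as retryable is an assumption about the caller's policy rather than something deliver dictates.

```python
# Exit statuses documented for deliver.
DELIVER_EXIT = {
    0: ("EX_OK", "delivery was successful"),
    64: ("EX_USAGE", "invalid parameter given"),
    67: ("EX_NOUSER", "destination user not found"),
    75: ("EX_TEMPFAIL", "temporary failure; see the log file"),
    77: ("EX_NOPERM", "mail rejected (-e was used)"),
    78: ("EX_CONFIG", "configuration problem (not used by v1.2+)"),
}


def describe_exit(code: int) -> str:
    name, meaning = DELIVER_EXIT.get(code, ("UNKNOWN", "unexpected exit status"))
    return f"{code} {name}: {meaning}"


def should_retry(code: int) -> bool:
    """Only EX_TEMPFAIL asks the caller to queue the mail and try again later."""
    return code == 75
```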
System users
You can use deliver with a few selected system users (i.e. users found in /etc/passwd or
NSS) by calling deliver in the user's ~/.forward file:
| "/usr/local/libexec/dovecot/deliver"
This should work with any MTA which supports per-user .forward files. For qmail's per-user
setup, see LDA/Qmail.
This method doesn't require the authentication socket explained below since it's executed as the
user itself.
Virtual users
With a lookup
Give the destination username to deliver with the -d parameter, for example:
deliver -f $FROM_ENVELOPE -d $DEST_USERNAME
You'll need to set up a master authentication socket for deliver so it knows where to find
mailboxes for the users:
protocol lda {
  # ..
  # UNIX socket path to master authentication server to find users.
  #auth_socket_path = /var/run/dovecot/auth-master
}
auth default {
  # ..
  socket listen {
    # Note that we're setting a master socket. SMTP AUTH for Postfix and Exim uses client sockets.
    master {
      # Typically under base_dir/; if not, the directory must be created.
      path = /var/run/dovecot/auth-master
      # ..
    }
  }
}
Plugins
• Most of the Dovecot plugins work with deliver. http://wiki.dovecot.org/Plugins
• Virtual quota can be enforced using Quota plugin. http://wiki.dovecot.org/Quota
• Sieve language support can be added with Sieve plugin.
http://wiki.dovecot.org/LDA/Sieve
Maildir
This format debuted with the qmail server in the mid-1990s. Each mailbox folder is a directory
and each message a file. This improves efficiency because individual emails can be modified,
deleted and added without affecting the mailbox or other emails, and makes it safer to use on
networked file systems such as NFS.
Contents
1. Maildir
1.1. Dovecot extensions
1.1.1. IMAP UID mapping
1.1.2. IMAP keywords
1.1.3. Maildir filename extensions
1.2. Maildir and filesystems
1.2.1. General comparisons of Maildir on different filesystems
1.2.2. Linux ext2 / ext3
1.2.3. ReiserFS
1.2.4. XFS
1.2.4.1. Various tips
1.3. Directory Structure
1.4. Issues with the specification
1.4.1. Locking
1.4.2. Mail delivery
1.5. Procmail Problems
1.6. References
Dovecot extensions
Since the standard maildir specification doesn't provide everything needed to fully support the
IMAP protocol, Dovecot had to create some of its own non-standard extensions. The extensions
still keep the maildir standards compliant, so MUAs not supporting the extensions can still safely
use it as a normal maildir.
IMAP UID mapping
IMAP requires each message to have a permanent unique ID number. Dovecot uses the
dovecot-uidlist file to keep the UID <-> filename mapping. The file is basically in the same format as
Courier IMAP's courierimapuiddb file, except for one difference (see below).
The file begins with a header:
1 1173189136 20221
where 1 is the file format version number, 1173189136 is the IMAP UIDVALIDITY and
20221 is the UID that will be given to the next added message. The version number is currently
always 1. Dovecot also used version number 2 for a while, so if the number is ever
increased, it needs to become version 3.
After the header comes the list of UID <-> filename mappings:
123 1035478339.27041_118.foo.org
20220 1035478339.27041_118.foo.org:2,S
Because with maildir the filename changes every time the message's flags change, the filename
listed in the file doesn't necessarily exist. With Courier IMAP the filenames contained only the
maildir file's basename (ie. everything before ":2," string). Dovecot instead writes the file's last
known full filename. Usually this allows opening the file without reading the directory's contents
to find the file's current file name.
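A minimal Python sketch of reading a version 1 dovecot-uidlist as described above: the header line, then one "UID filename" pair per line, plus the Courier-style base name (everything before ":2,"), which is the part that stays stable while flags change. The names are hypothetical, and files with any version other than 1 are rejected rather than guessed at.

```python
from dataclasses import dataclass, field


@dataclass
class UidList:
    version: int
    uid_validity: int
    next_uid: int
    files: dict = field(default_factory=dict)  # UID -> last known full filename


def parse_uidlist(text: str) -> UidList:
    lines = text.splitlines()
    version, uid_validity, next_uid = (int(x) for x in lines[0].split())
    if version != 1:
        raise ValueError(f"unsupported dovecot-uidlist version {version}")
    result = UidList(version, uid_validity, next_uid)
    for line in lines[1:]:
        if not line.strip():
            continue
        uid, filename = line.split(" ", 1)
        result.files[int(uid)] = filename
    return result


def base_name(filename: str) -> str:
    """Courier-style base name: everything before the ':2,' info part."""
    return filename.split(":2,", 1)[0]


sample = """1 1173189136 20221
123 1035478339.27041_118.foo.org
20220 1035478339.27041_118.foo.org:2,S
"""
uids = parse_uidlist(sample)
print(uids.next_uid, uids.files[20220])  # 20221 1035478339.27041_118.foo.org:2,S
```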
The dovecot-uidlist file doesn't need to be locked for reading. When writing, the
dovecot-uidlist.lock file needs to be created. The dovecot-uidlist file must never be modified
directly; it can only be replaced with a rename() call.
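The write rule (take dovecot-uidlist.lock, never edit in place, replace with rename()) can be sketched in Python as below. This assumes the lock file itself doubles as the temporary file, so the final rename() both installs the new contents and releases the lock; it omits the stale-lock timeouts a real dotlock implementation needs, and the function name is hypothetical.

```python
import os


def replace_uidlist(maildir: str, content: str) -> None:
    path = os.path.join(maildir, "dovecot-uidlist")
    lock_path = path + ".lock"
    # O_EXCL makes creation atomic: FileExistsError means another writer holds the lock.
    fd = os.open(lock_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    try:
        with os.fdopen(fd, "w") as f:
            f.write(content)
            f.flush()
            os.fsync(f.fileno())
    except BaseException:
        os.unlink(lock_path)  # release the lock; the old file is untouched
        raise
    # Readers take no lock: they see either the old file or the new one, never a mix.
    os.rename(lock_path, path)
```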
dovecot-uidlist is updated lazily to optimize disk I/O. If a message is expunged, it may not be
removed from dovecot-uidlist until some time later. This means that if you create a new file
with the same file name as one that already exists in dovecot-uidlist, Dovecot thinks you
"unexpunged" a message by restoring it from a backup. This causes a warning to be logged
and the file to be renamed.
Note that messages must not be modified once they've been delivered. IMAP (and Dovecot)
requires that messages are immutable. If you wish to modify them in any way, create a new
message instead and expunge the old one.
IMAP keywords
All the non-standard message flags are called keywords in IMAP. Some clients use these
automatically for marking spam (eg. $Junk, $NonJunk, $Spam, $NonSpam keywords).
Thunderbird uses labels which map to keywords $Label1, $Label2, etc.
Dovecot stores keywords in the maildir filename's flags field using the letters a..z. This means that
only 26 keywords can be stored in the maildir. If more are used, they're still stored in
Dovecot's index files. The mapping from single letters to keyword names is stored in the
dovecot-keywords file. The file has the format:
0 $Junk
1 $NonJunk
0 means the letter 'a' in the maildir filename, 1 means 'b' and so on. The file doesn't need to be
locked for reading, but when writing it, the dovecot-uidlist file must be locked. The file must not be
modified directly; it can only be replaced with a rename() call.
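Decoding keywords from a filename combines the two pieces above: the letters in the flags field and the index-to-name table. A minimal Python sketch; the helper names are hypothetical, and it assumes the standard maildir flags are the upper-case letters while Dovecot's keywords are the lower-case ones.

```python
def parse_keywords(text: str) -> dict:
    """dovecot-keywords: '<index> <keyword>' per line; index 0 means letter 'a'."""
    mapping = {}
    for line in text.splitlines():
        if line.strip():
            idx, name = line.split(" ", 1)
            mapping[int(idx)] = name
    return mapping


def keywords_from_filename(filename: str, mapping: dict) -> list:
    """Return the keyword names encoded as lower-case letters in the maildir flags."""
    _, sep, info = filename.rpartition(":2,")
    if not sep:
        return []
    flags = info.split(",", 1)[0]  # ignore Dovecot's non-standard fields after a comma
    names = []
    for ch in flags:
        if "a" <= ch <= "z":
            idx = ord(ch) - ord("a")
            if idx in mapping:
                names.append(mapping[idx])
    return names
```

With the sample file above, a message named 1035478339.27041_118.foo.org:2,Sb is seen and carries the $NonJunk keyword.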
Maildir filename extensions
The standard filename definition is "<base filename>:2,<flags>". Dovecot has extended the
<flags> field to be "<flags>[,<non-standard fields>]". This means that if Dovecot sees a comma
in the <flags> field while updating flags in the filename, it doesn't touch anything after the
comma. However, other maildir MUAs may mess them up, so it's still not a good idea to rely on
that. The basic <flags> are described in the maildir specification. Dovecot currently doesn't use
the <non-standard fields> for anything.
Dovecot supports reading a few fields from the <base filename>:
• ,S=<size>: <size> contains the file size. Getting the size from the filename avoids doing
a stat(), which may improve the performance. This is especially useful with Maildir++
quota.
• ,W=<vsize>: <vsize> contains the file's RFC822.SIZE, ie. the file size with linefeeds
being CR+LF characters. If the message was stored with CR+LF linefeeds, <size> and
<vsize> are the same. Setting this may give a small speedup because now Dovecot
doesn't need to calculate the size itself.
A maildir filename with those fields would look something like:
1035478339.27041_118.foo.org,S=1000,W=1030:2,S
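A minimal Python sketch of reading those fields back, and of computing the W= value when writing one. The function names are hypothetical, and the size helper assumes that every LF not already preceded by CR counts as one extra byte once converted to CR+LF.

```python
def parse_maildir_filename(name: str):
    """Split '<base>:2,<flags>' and pull the optional ,S= and ,W= fields from <base>."""
    base, _, info = name.partition(":2,")
    sizes = {}
    for part in base.split(",")[1:]:
        key, eq, value = part.partition("=")
        if eq and key in ("S", "W") and value.isdigit():
            sizes[key] = int(value)
    flags = info.split(",", 1)[0]  # drop any non-standard fields after a comma
    return base, sizes, flags


def rfc822_size(message: bytes) -> int:
    """Size with every bare LF written as CR+LF, i.e. the value stored in W=."""
    bare_lf = message.count(b"\n") - message.count(b"\r\n")
    return len(message) + bare_lf
```

For the example filename above this yields sizes {"S": 1000, "W": 1030} and flags "S", so Dovecot can answer RFC822.SIZE without stat() or reading the file.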
ReiserFS
ReiserFS was built to be fast with lots of small files, so it works well with maildir.
XFS
XFS performance seems to depend on many factors, including the system and the file system
parameters.
• There are early reports on the dovecot mailing list which suggest that XFS is considerably
slower than ext3 or ReiserFS: http://www.dovecot.org/list/dovecot/2007-January/018994.html
• But then again, others recommend XFS for use with Maildir and Dovecot:
http://www.dovecot.org/list/dovecot/2006-May/013216.html
• This 2007 Linux.conf.au talk about "Choosing and Tuning Linux File Systems" (Slides as
PDF) also recommends XFS for Maildir (alternatively ext3 with small blocks and a high
inode-to-file ratio).
• Someone else wrote here in the wiki: XFS on TSL 3.0.5 works almost twice as
fast as our prior EXT3 installation of which is significant in size. ReiserFS is
also a good option.
• Comparisons which suggest XFS as being best choice:
○ http://www.thesmbexchange.com/eng/qmail_fs_benchmark.html
○ http://www.htiweb.inf.br/benchmark/fsbench.htm
Various tips
• Mounting XFS with logbufs=8 option might increase the speed.
• Create the XFS with options -b size=1024 -d su=16k,sw=3 -l logdev=<some_other_device>
(Source: http://www.thesmbexchange.com/eng/qmail_fs_benchmark.html)
• Use mkfs.xfs -f -l size=32768b,version=2 and mount.xfs -o noatime,logbufs=8,logbsize=131072
(Source: http://www.htiweb.inf.br/benchmark/fsbench.htm)
Directory Structure
Dovecot uses Maildir++ directory layout for organizing mailbox directories. This means that all
the folders are directly inside ~/Maildir directory:
• ~/Maildir/new, ~/Maildir/cur and ~/Maildir/tmp directories contain the messages
for INBOX. The tmp directory is used during delivery; new messages arrive in new, and
clients are expected to move read messages to cur.
• ~/Maildir/.folder/ is a mailbox folder
• ~/Maildir/.folder.subfolder/ is a subfolder of a folder (ie. "folder/subfolder")
Most importantly this means that if your maildir folders exist in eg. ~/Maildir/folder and
~/Maildir/folder/subfolder, Dovecot won't see them unless you rename them to Maildir++
layout. v1.1 supports them by adding :LAYOUT=fs to mail_location.
• subscriptions file contains IMAP's mailbox subscriptions. (Note difference with
Mbox.)
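The Maildir++ naming rule fits in a few lines. A minimal Python sketch, assuming "/" as the IMAP hierarchy separator (as in the "folder/subfolder" example) and rejecting mailbox names that contain "." instead of escaping them; the helper name is hypothetical.

```python
import os


def maildirpp_dir(root: str, mailbox: str, separator: str = "/") -> str:
    """Map an IMAP mailbox name to its Maildir++ directory under root (e.g. ~/Maildir)."""
    if mailbox.upper() == "INBOX":
        return root  # INBOX uses root/new, root/cur and root/tmp directly
    parts = mailbox.split(separator)
    if any("." in p for p in parts):
        raise ValueError("'.' is the Maildir++ hierarchy separator")
    return os.path.join(root, "." + ".".join(parts))
```

Every folder, however deeply nested, ends up as a direct child of the root, which is why ~/Maildir/folder/subfolder is not found without :LAYOUT=fs.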
Issues with the specification
Locking
Although maildir was designed to be lockless, Dovecot locks the maildir while modifying it or
while looking for new messages in it. This is required because otherwise Dovecot might
temporarily think that some mails had been deleted, which would cause trouble. Basically
the problem is that if one process modifies the maildir (e.g. a rename() to change a message's
flag), another process that is in the middle of listing the files at the same time could skip a file. The
skipping happens because the readdir() system call doesn't guarantee that all the files are returned if
the directory is modified between the calls to it. This problem exists with all the commonly used
filesystems.
Because Dovecot uses its own non-standard locking (the dovecot-uidlist.lock dotlock file),
other MUAs accessing the maildir don't support it. This means that if another MUA is updating
messages' flags or expunging messages, Dovecot might temporarily lose a message. After
the next sync, when it finds the message again, an error message may be written to the log and the
message will receive a new UID.
Delivering mails to new/ directory doesn't have any problems, so there's no need for LDAs to
support any type of locking.
Mail delivery
The qmail page "How a message is delivered" suggests delivering mail like this:
1. Create a unique filename (only "time.pid.host" there; the Maildir spec was later
updated to allow more uniqueness identifiers).
2. Do stat(tmp/<filename>). If the stat() found a file, wait 2 seconds and go back to
step 1.
3. Create and write the message to tmp/<filename>.
4. link() it into the new/ directory. Although not mentioned there, the link() could
again fail if the mail already existed in the new/ directory. In that case you should probably go
back to step 1.
All this trouble is rather pointless. Only the first step is what really guarantees that the mails
won't get overwritten, the rest just sounds nice. Even though they might catch a problem once in
a while, they give no guaranteed protection and will just as easily pass duplicate filenames
through and overwrite existing mails.
Step 2 is pointless because there's a race condition between steps 2 and 3. PID/host combination
by itself should already guarantee that it never finds such a file. If it does, something's broken
and the stat() check won't help since another process might be doing the same thing at the same
time, and you end up writing to the same file in tmp/, causing the mail to get corrupted.
In step 4 the link() would fail if an identical file already existed in the maildir, right? Wrong. The
file may already have been moved to cur/ directory, and since it may contain any number of flags
by then you can't check with a simple stat() anymore if it exists or not.
Step 2 was pointed out to be useful if clock had moved backwards. However again this doesn't
give any actual safety guarantees, because an identical base filename could already exist in cur/.
Besides if the system was just rebooted, the file in tmp/ could probably be even overwritten
safely (assuming it wasn't already link()ed to new/).
So really, all that matters for not getting mails overwritten in your maildir is step 1:
always create filenames that are guaranteed to be unique. Forget about the 2-second waits and
the similar steps that qmail's man page talks about.
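Following that advice, a delivery routine only has to get step 1 right and then write through tmp/. A minimal Python sketch, assuming a unique name of the form time.M<microseconds>P<pid>Q<counter>.host (the M, P and Q identifiers come from the updated maildir specification). O_EXCL and link() are used only so that a uniqueness failure raises an error instead of overwriting a mail, not as the safety mechanism. The function names are hypothetical.

```python
import itertools
import os
import socket
import time

_deliveries = itertools.count(1)  # per-process delivery counter (the Q field)


def unique_name() -> str:
    now = time.time()
    sec = int(now)
    usec = int((now - sec) * 1_000_000)
    # The maildir spec asks for '/' and ':' in the host name to be escaped.
    host = socket.gethostname().replace("/", "\\057").replace(":", "\\072")
    return f"{sec}.M{usec}P{os.getpid()}Q{next(_deliveries)}.{host}"


def deliver(maildir: str, message: bytes) -> str:
    """Write message to tmp/, fsync it, then hard-link it into new/."""
    name = unique_name()
    tmp_path = os.path.join(maildir, "tmp", name)
    new_path = os.path.join(maildir, "new", name)
    fd = os.open(tmp_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(message)
            f.flush()
            os.fsync(f.fileno())
        os.link(tmp_path, new_path)  # raises FileExistsError rather than overwriting
    finally:
        os.unlink(tmp_path)  # tmp/ copy is no longer needed either way
    return new_path
```

There is no stat() check and no 2-second sleep: the pid, microsecond and counter fields already make the name unique within a host.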
Procmail Problems
Maildir format is somewhat compatible with MH format. This is sometimes a problem when
people configure their procmail to deliver mails to Maildir/new. This makes procmail create the
messages in MH format, which basically means that the file is called msg.inode_number. While
this appears to work at first, after messages are expunged from the maildir their inodes are freed and
will be reused later. This means that another file with the same name may appear in the maildir,
which makes Dovecot think that an expunged file reappeared in the mailbox, and an error is
logged.
The proper way to configure procmail to deliver to a Maildir is to use Maildir/ as the
destination.
LDA Indexing
Dovecot v1.0's deliver updates the main index file while message is being saved. This is useful
with mbox format, especially if mbox_very_dirty_syncs=no. With Maildir the benefits of this
are pretty small.
Dovecot v1.1+ deliver also updates the cache file, which can be very useful with all mailbox
formats. It means that when an IMAP client wants to fetch the message's metadata (e.g. some
header fields), it's already found in the cache file and Dovecot doesn't have to open and
parse the message file. There are some tradeoffs though:
• LDA indexing wastes disk I/O because it has to open and update index files
• LDA indexing saves disk I/O because it already has the message body in memory, so it
doesn't need to read it from disk.
• IMAP indexing wastes disk I/O because it has to open and read message files
• IMAP indexing may save disk I/O because IMAP process always has index files opened,
and many IMAP clients are configured to download all new message bodies anyway, so
the second time message bodies are read they're already in memory
So whether LDA-time or IMAP-time indexing is faster depends on the IMAP client. In any case, the
user experience is typically faster with LDA indexing, because the message list metadata can be
returned faster when it's pre-indexed.
See IndexFiles for more information about what the index files contain.
Non-indexed mail delivery
Ignoring the benefits of cache file updates, the only thing left is the main index updates. As
mentioned above, with the Maildir format these benefits are very small. This also means that it's
perfectly fine to deliver mails with a non-Dovecot MDA that doesn't update the indexes. Dovecot
can efficiently see and index such new mails without doing anything expensive like "rebuilding
indexes".
Cache file
Cache file is used for storing immutable data. It supports several different kinds of fields:
MAIL_CACHE_FIELD_FIXED_SIZE
The field size doesn't need to be stored in the cache file. It's always the same.
MAIL_CACHE_FIELD_BITMASK
A fixed-size bitmask field. It's possible to add new bits by updating this field. All the
added values are ORed together.
MAIL_CACHE_FIELD_VARIABLE_SIZE
Variable sized binary data.
MAIL_CACHE_FIELD_STRING
Variable sized string.
MAIL_CACHE_FIELD_HEADER
Variable sized message header. The data begins with a 0-terminated
uint32_t line_numbers[]. A line number exists only for each header; continuation lines
in multiline headers don't get listed. After the line numbers comes the
list of headers, including the "header-name: " prefix for each line, LFs, and the TABs or
spaces for continued lines.
The last 3 variable sized field types are treated identically by the cache file code. They exist
mainly to make it easier for programs that dump the cache file's contents (src/util/idxview) to
do their job.
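The MAIL_CACHE_FIELD_HEADER layout can be walked roughly like this. This is a minimal sketch: the function name cache_header_parse() is hypothetical, and it assumes the line numbers are stored in host byte order.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: parse the raw data of a MAIL_CACHE_FIELD_HEADER
   field. Counts the line numbers before the terminating 0 and returns the
   header text that follows. Returns (size_t)-1 if there is no terminator. */
static size_t cache_header_parse(const unsigned char *data, size_t size,
                                 const char **text_r, size_t *text_size_r)
{
    size_t count = 0, pos = 0;
    uint32_t line;

    for (;;) {
        if (size - pos < sizeof(line))
            return (size_t)-1;                  /* truncated line_numbers[] */
        memcpy(&line, data + pos, sizeof(line)); /* data may be unaligned */
        pos += sizeof(line);
        if (line == 0)
            break;                              /* end of line_numbers[] */
        count++;
    }
    *text_r = (const char *)data + pos;         /* "Name: value\n" lines */
    *text_size_r = size - pos;
    return count;
}
```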
Locking
Because the cache file is typically used in potentially long-running operations, such as the IMAP
command FETCH 1:* (BODY.PEEK[] ENVELOPE BODYSTRUCTURE), it's important that updating
the cache file doesn't block any readers. Also, because the readers are often writers too
(if something isn't cached, it's added there), it's important that they don't block writers either.
Reading cache files requires no locking. Writing is done by first locking the file, reserving some
space to write to, and immediately after that unlocking the file. This way the transaction can keep
writing to the cache file as long as it wants to without blocking other writers. When the
transaction is committed, the updated cache offsets are written to the transaction log which
makes them visible to other processes.
This also means that it's possible for two processes to write the same cached fields twice to the
cache file. Because the data written to the cache file are really just cached data, the fields'
contents are identical. Having the data exist twice (or even more times) means wasting some disk
space, but otherwise it isn't a problem. The duplicates are dropped the next time the file is
compressed.
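The lock, reserve, unlock, write pattern might look roughly like the following sketch. It assumes lock_method=flock and uses hypothetical function names; writing the resulting offsets to the transaction log on commit isn't shown.

```c
#define _DEFAULT_SOURCE
#include <sys/file.h>
#include <sys/types.h>
#include <unistd.h>

/* Reserve size bytes at the end of the cache file. The lock is held only
   while the file is extended, so other writers are blocked very briefly.
   Returns the reserved offset, or -1 on error. */
static off_t cache_reserve(int fd, size_t size)
{
    off_t offset = -1;

    if (flock(fd, LOCK_EX) < 0)
        return -1;
    off_t end = lseek(fd, 0, SEEK_END);
    if (end >= 0 && ftruncate(fd, end + (off_t)size) == 0)
        offset = end;
    (void)flock(fd, LOCK_UN);       /* unlock before doing the real write */
    return offset;
}

/* Fill the reserved range without holding any lock: no other writer can
   be given the same range, so nobody else writes there. */
static int cache_write_reserved(int fd, off_t offset,
                                const void *data, size_t size)
{
    return pwrite(fd, data, size, offset) == (ssize_t)size ? 0 : -1;
}
```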
Cache decisions
Dovecot tries to be smart about what it keeps in the cache file. If the client never fetches the
cached data, it's just a waste of disk space and disk I/O.
The caching decisions are:
MAIL_CACHE_DECISION_NO
This field isn't cached currently.
MAIL_CACHE_DECISION_TEMP
This field is cached for new mails.
MAIL_CACHE_DECISION_YES
This field is cached for all mails.
Normally Dovecot changes the decisions based on which fields are fetched and for which
messages. A specific decision can be forced by ORing it with MAIL_CACHE_DECISION_FORCED.
The mail-cache-decisions.c file contains the rules for how Dovecot changes the decisions. The
following is copied from the file:
Users can be divided into three groups:
1. Most users will use only a single IMAP client which caches everything locally. For these
users it's quite pointless to do any kind of caching as it only wastes disk space. That
might also mean more disk I/O.
2. Some users use multiple IMAP clients which cache everything locally. These could
benefit from caching until all clients have fetched the data. After that it's useless.
3. Some clients don't do permanent local caching at all. For example Pine and webmails.
These clients would benefit from caching everything. Some locally caching clients might
also access some data from server again, such as when searching messages. They could
benefit from caching only these fields.
After thinking about these a while, I figured out that people who care about performance most
will be using Dovecot optimized LDA anyway which updates the indexes/cache immediately. In
that case even the first user group would benefit from caching the same way as second group.
LDA reads the mail anyway, so it might as well extract some information about it and store them
into cache.
So, group 1. and 2. could be optimally implemented by keeping things cached only for a while. I
thought a week would be good. When cache file is compressed, everything older than week will
be dropped.
But how to figure out if user is in group 3? One quite easy rule would be to see if client is
accessing messages older than a week. But with only that rule we might have already dropped
useful cached data. It's not very nice if we have to read and cache it twice.
Most locally caching clients always fetch new messages (all but body) when they see them. They
fetch them in ascending order. Noncaching clients might fetch messages in pretty much any
order, as they usually don't fetch everything they can, only what's visible on screen. Some will
use server side sorting/threading, which also causes messages to be fetched in random order.
Second rule would then be that if a session doesn't fetch messages in ascending order, the fetched
field type will be permanently cached.
So, we have three caching decisions:
1. Don't cache: Clients have never wanted the field
2. Cache temporarily: Clients want this only once
3. Cache permanently: Clients want this more than once
Different mailboxes have different decisions. Different fields have different decisions.
There are some problems, such as if a client accesses a message older than a week, we can't know
if the user just started using a new client which is filling its local cache for the first time. Or it
might be a client the user just hasn't used for over a week. In these cases we shouldn't have marked
the field to be permanently cached. The user might also switch clients from non-caching to caching.
So we should re-evaluate our caching decisions from time to time. This is done by checking the
above rules constantly and recording when the decision was last right. If the decision
hasn't matched for two months, it's changed. I picked two months because people go on vacations of at least
one month where they might still be reading mails, but with different clients.
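The rules quoted above could be condensed into something like the following simplified sketch. The struct, function names and enum values are assumptions for illustration, not the code in mail-cache-decisions.c. week_uid stands for the first UID appended within the last week, and highest_uid for the highest UID the session has fetched so far.

```c
#include <stdint.h>
#include <time.h>

/* Decision values; the numeric values are assumptions for this sketch. */
enum cache_decision {
    DECISION_NO = 0x00,     /* MAIL_CACHE_DECISION_NO */
    DECISION_TEMP = 0x01,   /* MAIL_CACHE_DECISION_TEMP */
    DECISION_YES = 0x02,    /* MAIL_CACHE_DECISION_YES */
    DECISION_FORCED = 0x80  /* ORed in to pin a decision */
};

#define TWO_MONTHS (60L * 24 * 60 * 60)

struct field_decision {
    unsigned int decision;  /* enum cache_decision, possibly | FORCED */
    time_t last_used;       /* last time the decision was confirmed */
};

/* Called when a session fetches this field for message uid. */
static void decision_on_fetch(struct field_decision *f, uint32_t uid,
                              uint32_t highest_uid, uint32_t week_uid,
                              time_t now)
{
    /* old message, or not fetching in ascending order */
    int old_or_random = uid < week_uid || uid < highest_uid;

    if ((f->decision & DECISION_FORCED) != 0)
        return;                           /* forced decisions never change */
    switch (f->decision) {
    case DECISION_NO:
        f->decision = DECISION_TEMP;      /* wanted once: cache temporarily */
        f->last_used = now;
        break;
    case DECISION_TEMP:
        f->last_used = now;
        if (old_or_random)
            f->decision = DECISION_YES;   /* looks like a non-caching client */
        break;
    case DECISION_YES:
        if (old_or_random)
            f->last_used = now;           /* permanent decision was right */
        break;
    default:
        break;
    }
}

/* Periodic re-evaluation: a permanent decision that hasn't been confirmed
   for two months falls back to temporary caching. */
static void decision_reevaluate(struct field_decision *f, time_t now)
{
    if (f->decision == DECISION_YES && now - f->last_used > TWO_MONTHS)
        f->decision = DECISION_TEMP;
}
```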
Locking
The index files are designed so that readers cannot block a writer, and write locks are always
short enough not to cause other processes to wait too long. Dovecot v0.99's index files didn't do
this, and it was common to get lock timeouts when using multiple connections to the same large
mailbox.
The main index file is the only file which has read locks. They can however block the writer only
for two seconds (and even this could be changed to not block at all). The writes are locked only
for the duration of the mailbox synchronization.
Transaction logs don't require read locks. The writing is locked for the duration of the mailbox
synchronization, and also for single transaction appends.
Cache files don't require read locks. They're locked for writing only for the duration of
allocating space inside the file. The actual writing inside the allocated space is done without any
locks being held.
In the future these could be improved even further. For example there's no need to keep any index
files locked while synchronizing, as long as the mailbox backend takes care of the locking issues.
Also, writing to the transaction log could work in a similar way to cache files: lock, allocate space,
unlock, write.
Lockless integers
Dovecot uses several different techniques to allow reading files without locking them. One of
them uses fields in a "lockless integer" format. Initially these fields have an "unset" value. They can
be set once to a wanted value in range 0..2^28-1 (with 32bit fields), but after that they cannot be changed. It
would be possible to set them back to "unset", but setting them a second time isn't safe
anymore, so Dovecot never does this.
The lockless integers work by reserving one bit from each byte of the value as a "this value is set"
flag. The reader then verifies that the flag is set in all of the value's bytes. If any of them isn't set,
the value is still "unset". Dovecot uses the highest bit for this flag. So for example:
• 0x00000000: The value is unset
• 0xFFFF7FFF: The value is unset, because one of the bytes didn't have the highest bit set
• 0xFFFFFFFF: The value is 2^28-1
• 0x80808080: The value is 0
• 0x80808180: The value is 0x80
Dovecot contains mail_index_uint32_to_offset() and mail_index_offset_to_uint32()
functions to translate values between integers and lockless integers. The "unset" value is returned
as 0, so it's not possible to differentiate between "unset" and "set" 0 values.
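The encoding can be sketched as follows. The helper names are hypothetical and this is not a copy of mail_index_uint32_to_offset() or mail_index_offset_to_uint32(), which may differ in details; it only follows the bit layout described above, where byte n (counting from the least significant byte) holds value bits 7n..7n+6 plus the "set" flag.

```c
#include <assert.h>
#include <stdint.h>

/* Encode value (which must be < 2^28) with the "set" flag in every byte. */
static uint32_t lockless_encode(uint32_t value)
{
    uint32_t out = 0;

    assert(value < (UINT32_C(1) << 28));
    for (int i = 0; i < 4; i++) {
        uint32_t seven = (value >> (7 * i)) & 0x7f;
        out |= (0x80 | seven) << (8 * i);
    }
    return out;
}

/* Decode a lockless integer. Like the real functions, "unset" is returned
   as 0, so it can't be told apart from a value that was set to 0. */
static uint32_t lockless_decode(uint32_t raw)
{
    uint32_t value = 0;

    if ((raw & 0x80808080) != 0x80808080)
        return 0;                   /* some byte lacks the "set" flag */
    for (int i = 0; i < 4; i++)
        value |= ((raw >> (8 * i)) & 0x7f) << (7 * i);
    return value;
}
```

Because a partially written value always has at least one byte without the flag, a reader racing with the writer sees either "unset" or the complete value.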
Main index
The main index can be used to quickly look up messages' UIDs, flags, keywords and extension-
specific data, such as cache file or mbox file offsets.
Reading, writing and locking
Reading dovecot.index file requires locking, unfortunately. Shared read locking is done using
the standard index locking method specified in lock_method setting (lock_method parameter
for mail_index_open()).
Writing to index files requires transaction log to be exclusively locked first. This way the index
locking only has to worry about existing read locks. The locking works by first trying to lock the
index with the standard locking method, but if it can't acquire the lock within two seconds, it'll
fall back to copying the index file to a temporary file, and when unlocking, it'll rename() the
temporary file over the dovecot.index file. Note that this is safe only because of the exclusive
transaction log lock. This way the writers are never blocked by readers who are allowed to keep
the shared lock as long as they want.
Copy-locking is always used when doing anything that could corrupt the index file if Dovecot
crashed in the middle of the operation, for example if the header or record size changes, or if
messages are expunged. New messages can be appended without it, however, because the message
count in the header is updated last. Expunging the last messages would probably be safe also
(because only the header needs updating), but it's not done currently.
The index file should never be directly modified. Everything should go through the transaction
log, and the only time the index needs to be write-locked is when transactions are written to it.
Currently the index file is updated whenever the backend mailbox is synchronized. This isn't
necessary, because an old index file can be updated using the transaction log. In the future there
could be some smarter decisions about when writing to the index isn't worth the extra disk
writes.
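The copy-locking fallback could be sketched like this, assuming lock_method=flock. The function names are hypothetical, and the caller is assumed to already hold the exclusive transaction log lock, which is what makes the final rename() safe.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/file.h>
#include <time.h>
#include <unistd.h>

/* Try to get an exclusive flock() within timeout_ms milliseconds.
   Returns false if readers kept their shared locks for too long. */
static bool index_try_lock(int fd, int timeout_ms)
{
    const struct timespec pause = { 0, 10 * 1000 * 1000 }; /* 10 ms */

    for (int waited = 0;; waited += 10) {
        if (flock(fd, LOCK_EX | LOCK_NB) == 0)
            return true;
        if (errno != EWOULDBLOCK || waited >= timeout_ms)
            return false;
        nanosleep(&pause, NULL);
    }
}

/* Fallback: copy the index to tmp_path. The copy is modified instead of
   the original, so readers holding shared locks aren't disturbed. */
static int index_copy_to_temp(const char *path, const char *tmp_path)
{
    char buf[4096];
    ssize_t n;
    int ret = 0;
    int in = open(path, O_RDONLY);

    if (in < 0)
        return -1;
    int out = open(tmp_path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (out < 0) {
        close(in);
        return -1;
    }
    while ((n = read(in, buf, sizeof(buf))) > 0) {
        if (write(out, buf, (size_t)n) != n) {
            ret = -1;
            break;
        }
    }
    if (n < 0)
        ret = -1;
    close(in);
    if (close(out) < 0)
        ret = -1;
    return ret;
}

/* "Unlock" a copy-locked index by renaming the copy over the original.
   Readers that still have the old file open keep seeing the old data. */
static int index_copy_unlock(const char *tmp_path, const char *path)
{
    return rename(tmp_path, path);
}
```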
Header
Fields that won't change without recreating the index:
major_version
If this doesn't match MAIL_INDEX_MAJOR_VERSION, don't try to read the index. Dovecot
recreates the index file then.
minor_version
If this doesn't match MAIL_INDEX_MINOR_VERSION there are some backwards compatible
changes in the index file (typically header fields). Try to preserve the headers and the
minor version when updating the index file.
base_header_size
Extension headers begin after the base headers. This is normally the same as
sizeof(struct mail_index_header).
header_size
Records begin after base and extension headers.
record_size
Size of each record and its extensions. Initially the same as
sizeof(struct mail_index_record).
compat_flags
Currently there is just one compatibility flag: MAIL_INDEX_COMPAT_LITTLE_ENDIAN.
Dovecot doesn't bother reading files with a different endianness; they're simply recreated.
indexid
Unique index file ID. This is used to make sure that the main index, transaction log and
cache file are all part of the same index.
Header flags:
MAIL_INDEX_HDR_FLAG_CORRUPTED
Set whenever the index file is found to be corrupted. If the reader notices this flag, it
shouldn't try to continue using the index.
MAIL_INDEX_HDR_FLAG_HAVE_DIRTY
This index has records with MAIL_INDEX_MAIL_FLAG_DIRTY flag set.
MAIL_INDEX_HDR_FLAG_FSCK
Call mail_index_fsck() as soon as possible. This flag isn't actually set anywhere
currently.
Message UIDs and counters:
uid_validity
IMAP UIDVALIDITY field. It can initially be 0, but once it's set, Dovecot currently doesn't even
try to handle UIDVALIDITY changing. Instead the index file is marked corrupted and recreated.
That's a bit ugly, but typically the UIDVALIDITY never
changes.
next_uid
UID given to the next appended message. Only increases.
messages_count
Number of records in the index file.
recent_messages_count
Number of records with MAIL_RECENT flag set.
seen_messages_count
Number of records with MAIL_SEEN flag set.
deleted_messages_count
Number of records with MAIL_DELETED flag set.
first_recent_uid_lowwater
There are no UIDs lower than this with MAIL_RECENT flag set.
first_unseen_uid_lowwater
There are no UIDs lower than this without MAIL_SEEN flag set.
first_deleted_uid_lowwater
There are no UIDs lower than this with MAIL_DELETED flag set.
The lowwater fields are used to optimize searching messages with/without a specific flag.
Fields related to syncing:
log_file_seq
Sequence number of the log file that the log_*_offset fields point to.
log_file_int_offset, log_file_ext_offset
All the internal/external transactions before this offset in the log file are synced to the
index. External transactions are synced more often than internal, so
log_file_int_offset <= log_file_ext_offset.
sync_size, sync_stamp
Used by the mailbox backends to store their synchronization information. Some day these
should be removed and replaced with extension headers.
Then there are day fields:
day_stamp
UNIX timestamp of the beginning of the day when new records were last added to the
index file.
day_first_uid[8]
These fields are updated when day_stamp < today. First [0..6] are moved to [1..7],
then [0] is set to the first appended UID. So they contain the first UID of the day for the last
8 days when messages were appended.
The day_first_uid[] fields are used by cache file compression to decide when to drop
MAIL_CACHE_DECISION_TEMP data.
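Updating the day fields on append could look roughly like this. The struct is a hypothetical subset of the index header containing only these fields, and the function name is made up for the sketch.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical subset of the index header: only the day fields. */
struct day_fields {
    uint32_t day_stamp;        /* start of the day of the last append */
    uint32_t day_first_uid[8]; /* [0] = newest day, [7] = oldest day */
};

/* Called when appending message uid; today is the UNIX timestamp of the
   beginning of the current day. */
static void day_fields_update(struct day_fields *d, uint32_t today,
                              uint32_t uid)
{
    if (d->day_stamp >= today)
        return;                /* already appended today: nothing to do */
    /* move [0..6] to [1..7]; memmove() handles the overlapping ranges */
    memmove(&d->day_first_uid[1], &d->day_first_uid[0],
            7 * sizeof(d->day_first_uid[0]));
    d->day_first_uid[0] = uid;
    d->day_stamp = today;
}
```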
Extension headers
After the base header comes a list of extensions and their headers. The first extension begins
at the mail_index_header.base_header_size offset. The second begins after the first one's
data[], and so on. However, each extension always begins 64bit aligned, so you may need to skip
a few padding bytes before it. Keep reading extensions as long as the offset is smaller than
mail_index_header.header_size.
struct mail_index_ext_header {
uint32_t hdr_size; /* size of data[] */
uint32_t reset_id;
uint16_t record_offset;
uint16_t record_size;
uint16_t record_align;
uint16_t name_size;
/* unsigned char name[name_size] */
/* unsigned char data[hdr_size] (starting 64bit aligned) */
};
reset_id, record offset, size and alignment are explained in Design/Indexes/TransactionLog's
struct mail_transaction_ext_intro.
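Walking the extension list could look roughly like the following sketch. It assumes the struct is stored in host byte order without extra padding and that the 64bit alignment is relative to the beginning of the header; struct ext_info and walk_ext_headers() are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct mail_index_ext_header {
    uint32_t hdr_size; /* size of data[] */
    uint32_t reset_id;
    uint16_t record_offset;
    uint16_t record_size;
    uint16_t record_align;
    uint16_t name_size;
    /* unsigned char name[name_size] */
    /* unsigned char data[hdr_size] (starting 64bit aligned) */
};

struct ext_info {
    struct mail_index_ext_header hdr;
    size_t name_offset;   /* offset of name[] from the header start */
    size_t data_offset;   /* offset of data[] (64bit aligned) */
};

#define ALIGN64(off) (((off) + 7) & ~(size_t)7)

/* Store up to max extensions into out[]. Returns the total number of
   extensions, or -1 if one of them runs past header_size. */
static int walk_ext_headers(const unsigned char *buf, size_t base_header_size,
                            size_t header_size, struct ext_info *out, int max)
{
    size_t offset = base_header_size;
    int count = 0;

    while (offset < header_size) {
        struct ext_info info;

        if (header_size - offset < sizeof(info.hdr))
            return -1;
        memcpy(&info.hdr, buf + offset, sizeof(info.hdr));
        info.name_offset = offset + sizeof(info.hdr);
        info.data_offset = ALIGN64(info.name_offset + info.hdr.name_size);
        if (info.data_offset + info.hdr.hdr_size > header_size)
            return -1;
        if (count < max)
            out[count] = info;
        count++;
        /* the next extension begins 64bit aligned after this one's data */
        offset = ALIGN64(info.data_offset + info.hdr.hdr_size);
    }
    return count;
}
```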
Records
There are hdr.messages_count records in the file. Each record contains at least two fields:
the record's UID and flags. UIDs always increase from one record to the next, so it's possible to
find a record by its UID with binary search. The record size is specified by
mail_index_header.record_size.
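A UID lookup by binary search might look like this. The sketch assumes each record starts with its uint32_t UID; find_record_by_uid() is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Find the record with the given UID. records points to the first record,
   each record_size bytes long. Returns the 0-based sequence or -1. */
static long find_record_by_uid(const unsigned char *records,
                               uint32_t messages_count, size_t record_size,
                               uint32_t uid)
{
    size_t left = 0, right = messages_count;

    while (left < right) {
        size_t mid = left + (right - left) / 2;
        uint32_t mid_uid;

        /* memcpy() avoids assuming the mapping is suitably aligned */
        memcpy(&mid_uid, records + mid * record_size, sizeof(mid_uid));
        if (mid_uid == uid)
            return (long)mid;
        if (mid_uid < uid)
            left = mid + 1;
        else
            right = mid;
    }
    return -1;
}
```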
The flags are a combination of enum mail_flags and enum mail_index_mail_flags. There
exists only one index flag currently: MAIL_INDEX_MAIL_FLAG_DIRTY. If a record has this flag
set, it means that the mailbox syncing code should ignore the flag in the mailbox and use the flag
in the index file instead. This is used for example with mbox and mbox_lazy_writes=yes. It
also allows having modifiable flags for read-only mailboxes.
The rest of the data is stored in record extensions.
Keywords
The keywords are stored in record extensions, but for better performance and lower disk space
usage in transaction logs, they are quite tightly integrated into the index file code.
The list of keywords is stored in "keywords" extension header:
struct mail_index_keyword_header {
uint32_t keywords_count;
/* struct mail_index_keyword_header_rec[] */
/* char name[][] */
};
struct mail_index_keyword_header_rec {
uint32_t unused; /* for backwards compatibility */
uint32_t name_offset; /* relative to beginning of name[] */
};
The unused field originally contained count field, but while writing this documentation I noticed
it's not actually used anywhere. Apparently it was added there accidentally. It'll be removed in
later versions.
So there are keywords_count keywords, each stored as a NUL-terminated string beginning
at name_offset.
Since crashing in the middle of updating the keywords list pretty much breaks the keywords,
adding new keywords causes the index file to be always copied to a temporary file and be
replaced.
The keywords in the records are stored in a "keywords" extension bitfield. So the nth bit in the
bitfield points to the nth keyword listed in the header.
It's not currently possible to safely remove existing keywords.
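Looking up keyword names and testing a record's keyword bits could be sketched like this. The helper names are hypothetical, and the bit order inside each byte (least significant bit first) is an assumption.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Return keyword n's name from the "keywords" extension header, or NULL
   if n is out of range. Layout: keywords_count, then keywords_count
   { unused, name_offset } records, then the NUL-terminated names. */
static const char *keyword_name(const unsigned char *ext_hdr, unsigned int n)
{
    uint32_t count, name_offset;

    memcpy(&count, ext_hdr, sizeof(count));
    if (n >= count)
        return NULL;
    const unsigned char *rec = ext_hdr + 4 + (size_t)n * 8;
    memcpy(&name_offset, rec + 4, sizeof(name_offset));
    /* name[] starts right after the record array */
    return (const char *)ext_hdr + 4 + (size_t)count * 8 + name_offset;
}

/* Is keyword n set in a record's "keywords" extension bitfield? */
static int record_has_keyword(const unsigned char *bitfield, unsigned int n)
{
    return (bitfield[n / 8] >> (n % 8)) & 1;
}
```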
Extensions
The extensions only specify their wanted size and alignment; the index file syncing code is
free to assign any offset inside the record to them. The extensions may be reordered at any time.
Dovecot's current extension ordering code works pretty well, but it's not perfect. If the extension
size isn't the same as its alignment, it may create larger records than necessary. This will be
fixed later.
The record size is always divisible by the maximum alignment requirement. This isn't
strictly necessary either, so it could be fixed later as well.
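A naive version of the offset assignment is sketched below. It places extensions in the given order instead of reordering them like Dovecot does, and it assumes the base record needs 4-byte alignment for its UID; the struct and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

struct ext_layout {
    uint16_t size;    /* record_size wanted by the extension */
    uint16_t align;   /* record_align wanted by the extension (>= 1) */
    uint16_t offset;  /* assigned offset inside the record (output) */
};

static size_t align_up(size_t n, size_t align)
{
    return (n + align - 1) / align * align;
}

/* Place extensions in order after the base record fields, padding each to
   its alignment, then round the record size up to the largest alignment.
   Returns the resulting record size. */
static size_t assign_ext_offsets(struct ext_layout *exts, size_t count,
                                 size_t base_record_size)
{
    size_t pos = base_record_size;
    size_t max_align = sizeof(uint32_t);  /* assumed: the base record's UID */

    for (size_t i = 0; i < count; i++) {
        pos = align_up(pos, exts[i].align);
        exts[i].offset = (uint16_t)pos;
        pos += exts[i].size;
        if (exts[i].align > max_align)
            max_align = exts[i].align;
    }
    return align_up(pos, max_align);
}
```

With an 8-byte base record and extensions of (size, align) (1, 1), (8, 8), (4, 4) in that order, the record becomes 32 bytes; placing the 8-byte extension first would fit the same data in 24 bytes, which is the kind of waste the ordering code tries to avoid.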
Transaction log
The transaction log is somewhat similar to transaction logs in databases. All the updates to the main
index files are first written to the transaction log, and only after that is the main index file
updated. There are several advantages to this:
• It provides atomic transactions: The transaction either succeeds, or it doesn't. For
example if a transaction sets a flag on one message and removes it from another, it's
guaranteed that both changes happen.
○ When updating the changes to the main index file, the last thing that's done is to
update the "transaction log position" in the header. So if Dovecot crashes after
having updated only the first flag, the next time the mailbox is opened both of the
changes are done all over again.
• It allows another process to quickly see what changes have been made. For example
IMAP needs to get a list of external changes after each command.
○ This is also important when storing the index files in NFS or in a clustered
filesystem. Instead of re-reading the whole index file after each external change,
Dovecot can simply read the new changes from the transaction log and apply
them to the in-memory copy of the main index. In-memory caching of
dovecot.index.cache file also relies on the transaction log telling what parts of
the file have changed.
• In the future the transaction logs could fairly easily be used to implement replication.
Internal vs. external
Transactions are either internal or external. The difference is that external transactions describe
changes that were already made to the mailbox, while internal transactions are commands to do
something to the mailbox. When beginning to synchronize a mailbox with index files, the index
file is first updated with all the external changes, and the uncommitted internal transactions are
applied on top of them.
When synchronizing the mailbox, the synchronization transaction writes only external
transactions. Also, if the index file is updated when saving new mails to the mailbox, the append
transactions must be external. This is because the changes are already in the mailbox at the time
the transaction is read.
Reading and writing
Reading transaction logs doesn't require any locking at all. Writing is exclusively locked using
the index files' default lock method (as specified by the lock_method setting).
A new log is created by first creating a dovecot.index.log.newlock dotlock file. Once you
have the dotlock, check again that the dovecot.index.log wasn't created (or recreated) by
another process. If not, go ahead and write the log header to the dotlock file and finally
rename() it to dovecot.index.log.
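The dotlock-based creation could look roughly like this. It's a simplified sketch with a hypothetical function: old_st describes the log file we decided to replace (NULL if there was none), and real dotlock handling such as stale lock detection and waiting for another process's lock is left out.

```c
#define _DEFAULT_SOURCE
#include <fcntl.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Returns 0 if we created the log, 1 if another process already
   (re)created it, -1 on error (including an existing dotlock). */
static int create_log(const char *log_path, const char *newlock_path,
                      const struct stat *old_st,
                      const void *hdr, size_t hdr_size)
{
    struct stat st;

    /* O_EXCL makes creating the dotlock atomic: only one process wins */
    int fd = open(newlock_path, O_WRONLY | O_CREAT | O_EXCL, 0600);
    if (fd < 0)
        return -1;

    /* check again: was the log (re)created while we got the dotlock? */
    if (stat(log_path, &st) == 0 &&
        (old_st == NULL || st.st_ino != old_st->st_ino ||
         st.st_dev != old_st->st_dev)) {
        close(fd);
        unlink(newlock_path);
        return 1;
    }

    /* write the header into the dotlock file, then rename it into place */
    int ok = write(fd, hdr, hdr_size) == (ssize_t)hdr_size;
    if (close(fd) < 0)
        ok = 0;
    if (!ok || rename(newlock_path, log_path) < 0) {
        unlink(newlock_path);
        return -1;
    }
    return 0;
}
```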
Currently there are no actual transaction boundaries in the log file. All the changes in a
transaction are simply written as separate records to the file. Each record begins with a
struct mail_transaction_header, which contains the record's size and type. The size is in
lockless integer format.
The first transaction record is written with the size field being 0. Once the whole transaction has
been written, the 0 is updated with the actual size. This way the transaction log readers won't see
partial transactions because they stop at the size=0 if the transaction isn't fully written yet.
Note that because there are no transaction boundaries, there's a small race condition here with
mmap()ed log files:
1. Process A: write() half of the transaction
2. Process B: mmap() the file.
3. Process A: write() the rest of the transaction, updating the size=0 also
4. Process B: parse the log file. It'll go past the original size=0 because the size had changed
in the mmap, but it stops in the middle of the transaction because the mmap()ed area doesn't
contain the whole transaction
This probably isn't a big problem, because I've never seen this happen even with stress tests.
Should be fixed at some point anyway.
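A reader that stops at the first unfinished record might look like this. It's a sketch under assumptions: the header field order (size, then type) and that size counts only the data after the 8-byte header, as in the MAIL_TRANSACTION_APPEND example in the Record header section. It has the same limitation as the race described above, since it can't tell where a transaction ends.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed on-disk layout of a record header for this sketch. */
struct mail_transaction_header {
    uint32_t size;  /* lockless integer; 0 while the transaction is unfinished */
    uint32_t type;
};

static uint32_t lockless_decode(uint32_t raw)
{
    uint32_t value = 0;

    if ((raw & 0x80808080) != 0x80808080)
        return 0;                           /* unset */
    for (int i = 0; i < 4; i++)
        value |= ((raw >> (8 * i)) & 0x7f) << (7 * i);
    return value;
}

/* Return how many bytes of buf hold complete records. Parsing stops at the
   first record whose size is still 0 or whose data doesn't fit in what
   has been read (or mmap()ed) so far. */
static size_t log_complete_bytes(const unsigned char *buf, size_t buf_size)
{
    size_t pos = 0;

    while (buf_size - pos >= sizeof(struct mail_transaction_header)) {
        struct mail_transaction_header hdr;

        memcpy(&hdr, buf + pos, sizeof(hdr));
        uint32_t size = lockless_decode(hdr.size);
        if (size == 0)
            break;                          /* transaction being written */
        if (buf_size - pos - sizeof(hdr) < size)
            break;                          /* record only partly visible */
        pos += sizeof(hdr) + size;
    }
    return pos;
}
```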
Header
The transaction log's header never changes, except the indexid field may be overwritten with 0 if
the log is found to be corrupted. The fields are:
major_version
If this doesn't match MAIL_TRANSACTION_LOG_MAJOR_VERSION, don't try to parse it. If
Dovecot sees this, it'll recreate the log file.
minor_version
If this doesn't match MAIL_TRANSACTION_LOG_MINOR_VERSION, the log file contains
some backwards compatible changes. Currently you can just ignore this field.
hdr_size
Size of the log file's header. Use this instead of
sizeof(struct mail_transaction_log_header), so that it's possible to add new
fields and still be backwards compatible.
indexid
This field must match the main index file's indexid field.
file_seq
The file's creation sequence. Must be increasing.
prev_file_seq, prev_file_offset
Contains the sequence and offset of where the last transaction log ended. When
transaction log is rotated and the reader's "sync position" still points to the previous log
file, these fields allow it to easily check if there had been any more changes in the
previous file.
create_stamp
UNIX timestamp when the file was created. Used in determining when to rotate the log
file.
Record header
The transaction record header (struct mail_transaction_header) contains size and type
fields. The size field is in lockless integer format. A single transaction record may contain
multiple changes of the same type, although some types don't allow this. Because the size of the
transaction record for each type is known (or can be determined from the type-specific record
contents), the size field can be used to figure out how many changes need to be done. So for
example a record can contain:
• struct mail_transaction_header { type = MAIL_TRANSACTION_APPEND, size =
sizeof(struct mail_index_record) * 2 }
• struct mail_index_record { uid = 1, flags = 0 }
• struct mail_index_record { uid = 2, flags = 0 }
UIDs
Many record types contain uint32_t uid1, uid2 fields. This means that the changes apply to
all the messages in uid1..uid2 range. The messages don't really have to exist in the range, so for
example if the first messages in the mailbox had UIDs 1, 100 and 1000, it would be possible to
use uid1=1, uid2=1000 to describe changes made to these 3 messages. This also means that it's
safe to write transactions describing changes to messages that were just expunged by another
process (and already written to the log file before our changes).
Appends
As described above, the appends must be in external transactions. The append transaction's
contents are simply the struct mail_index_record, so it contains only the message's UID and
flags. The message contents aren't written to transaction log. Also if the message had any
keywords when it was appended, they're in a separate transaction record.
Expunges
Because expunges actually destroy messages, they deserve some extra protection to make it less
likely to accidentally expunge wrong messages in case of for example file corruption. The
expunge transactions must have MAIL_TRANSACTION_EXPUNGE_PROT ORed to the transaction
type field. If an expunge type is found without it, assume a corrupted transaction log.
Flag changes
The flag changes are described in:
struct mail_transaction_flag_update {
uint32_t uid1, uid2;
uint8_t add_flags;
uint8_t remove_flags;
uint16_t padding;
};
The padding is ignored completely. A single flag update structure can both add new flags and
remove existing flags. Replacing all the flags works by setting remove_flags = 0xFF and
add_flags to the new flags.
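Applying a flag update over a UID range could be sketched like this. The struct record type is a hypothetical in-memory stand-in for index records. Removing before adding is what makes the remove_flags = 0xFF replacement work, and UIDs in the range that don't exist are simply skipped.

```c
#include <stddef.h>
#include <stdint.h>

struct mail_transaction_flag_update {
    uint32_t uid1, uid2;
    uint8_t add_flags;
    uint8_t remove_flags;
    uint16_t padding;
};

/* Hypothetical in-memory record for this sketch. */
struct record {
    uint32_t uid;
    uint8_t flags;
};

/* Apply one flag update to every existing record in uid1..uid2. */
static void apply_flag_update(struct record *recs, size_t count,
                              const struct mail_transaction_flag_update *u)
{
    for (size_t i = 0; i < count; i++) {
        if (recs[i].uid < u->uid1 || recs[i].uid > u->uid2)
            continue;
        recs[i].flags &= (uint8_t)~u->remove_flags;  /* remove first */
        recs[i].flags |= u->add_flags;               /* then add */
    }
}
```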
Keyword changes
Specific keywords can be added or removed one keyword at a time:
struct mail_transaction_keyword_update {
uint8_t modify_type; /* enum modify_type: MODIFY_ADD / MODIFY_REMOVE */
uint8_t padding;
uint16_t name_size;
/* unsigned char name[];
array of { uint32_t uid1, uid2; }
*/
};
There is padding after name[] so that uid1 begins from a 32bit aligned offset.
If you want to replace all the keywords (e.g. IMAP's STORE 1:* FLAGS (keyword) command),
you'll first have to remove all of them with MAIL_TRANSACTION_KEYWORD_RESET and then add
the new keywords.
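Locating the UID ranges after the padded name could look like this. The helper names are hypothetical, the alignment is assumed to be relative to the start of the record, and rec_size would come from the transaction record header.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fixed part of a keyword update record, as in the struct above. */
struct mail_transaction_keyword_update {
    uint8_t modify_type;   /* MODIFY_ADD / MODIFY_REMOVE */
    uint8_t padding;
    uint16_t name_size;
    /* name[name_size], padding, then { uint32_t uid1, uid2; } pairs */
};

/* Offset of the first uid1: the name follows the 4-byte fixed part and
   is padded so that the UID array begins 32bit aligned. */
static size_t keyword_update_uids_offset(uint16_t name_size)
{
    size_t end = sizeof(struct mail_transaction_keyword_update) + name_size;
    return (end + 3) & ~(size_t)3;
}

/* Number of uid1..uid2 pairs in a record of rec_size bytes. */
static size_t keyword_update_range_count(const unsigned char *rec,
                                         size_t rec_size)
{
    struct mail_transaction_keyword_update u;

    memcpy(&u, rec, sizeof(u));
    size_t off = keyword_update_uids_offset(u.name_size);
    return off > rec_size ? 0 : (rec_size - off) / (2 * sizeof(uint32_t));
}
```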
Extensions
Extension records allow creating and updating extension-specific header and message record
data. For example messages' offsets to cache file or mbox file are stored in extensions.
Whenever using an extension, you'll need to first write MAIL_TRANSACTION_EXT_INTRO record.
This is a bit kludgy and hopefully will be replaced by something better in future. The intro
contains:
struct mail_transaction_ext_intro {
/* old extension: set ext_id. don't set name.
new extension: ext_id = (uint32_t)-1. give name. */
uint32_t ext_id;
uint32_t reset_id;
uint32_t hdr_size;
uint16_t record_size;
uint16_t record_align;
uint16_t unused_padding;
uint16_t name_size;
/* unsigned char name[]; */
};
If the extension already exists in the index file (it can't be removed), you can use the ext_id
field directly. Otherwise you'll need to give a name to the extension. It's always possible to just
give the name if you don't know the existing extension ID, but this uses more space of course.
reset_id contains kind of a "transaction validity" field. It's updated with
MAIL_TRANSACTION_EXT_RESET record, which also causes the extension records' contents to be
zeroed. If an introduction's reset_id doesn't match the last EXT_RESET, it means that the
extension changes are stale and they must be ignored. For example:
• dovecot.index.cache file's file_seq header is used as a reset_id. Initially it's 1.
• Process A: Begins a cache transaction, updating some fields in it.
• Process B: Decides to compress the cache file, and issues a reset_id = 2 change.
• Process A: Commits the transaction with reset_id = 1, but the cache file offsets point
to the old file, so the changes must be ignored.
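The reset_id check in the example above boils down to a comparison. This sketch uses a hypothetical struct ext_state and function names; zeroing the extension's record data on reset isn't shown.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-extension state tracked while reading the log. */
struct ext_state {
    uint32_t reset_id;  /* set by the last MAIL_TRANSACTION_EXT_RESET */
};

/* MAIL_TRANSACTION_EXT_RESET: remember the new reset_id (the record data
   would also be zeroed here). */
static void ext_reset(struct ext_state *ext, uint32_t new_reset_id)
{
    ext->reset_id = new_reset_id;
}

/* Updates following an intro are applied only if the intro's reset_id
   matches the latest reset; otherwise they're stale and ignored. */
static bool ext_intro_is_current(const struct ext_state *ext,
                                 uint32_t intro_reset_id)
{
    return intro_reset_id == ext->reset_id;
}
```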
hdr_size specifies the number of bytes the extension wants to have in the index file's header.
record_size specifies the number of bytes it wants to use for each record. The sizes may grow
or shrink at any time. record_align contains the required alignment for the field. For
example if the extension contains a 32bit integer, you want it to be 32bit aligned so that the
process won't crash on CPUs which require proper alignment. Then again, if you access the
field as 4 separate bytes, the alignment can be 1.
Extension record updates typically are message-specific, so the changes must be done for each
message separately:
struct mail_transaction_ext_rec_update {
uint32_t uid;
/* unsigned char data[]; */
};
Cache file
Cache file is used for storing immutable data. It supports several different kinds of fields:
MAIL_CACHE_FIELD_FIXED_SIZE
The field size doesn't need to be stored in the cache file. It's always the same.
MAIL_CACHE_FIELD_BITMASK
A fixed size bitmask field. It's possible to add new bits by updating this field. All the
added fields are ORed together.
MAIL_CACHE_FIELD_VARIABLE_SIZE
Variable sized binary data.
MAIL_CACHE_FIELD_STRING
Variable sized string.
MAIL_CACHE_FIELD_HEADER
Variable sized message header. The data begins with a 0-terminated
uint32_t line_numbers[]. The line number exists only for each header, header
continuation lines in multiline headers don't get listed. After the line numbers comes the
list of headers, including the "header-name: " prefix for each line, LFs and the TABs or
spaces for continued lines.
The last 3 variable sized fields are treated identically by the cache file code. Their main purpose
is to make it easier for "dump cache file's contents" programs (src/util/idxview) to do their
job.
Locking
Because cache file is typically used in potentially long-running operations, such as with IMAP
command FETCH 1:* (BODY.PEEK[] ENVELOPE BODYSTRUCTURE) it's important that updating
the cache file doesn't block out any other readers. Also because the readers are often also writers
(if something isn't cached, it's added there), it's important that they don't block writers either.
Reading cache files requires no locking. Writing is done by first locking the file, reserving some
space to write to, and immediately after that unlocking the file. This way the transaction can keep
writing to the cache file as long as it wants to without blocking other writers. When the
transaction is committed, the updated cache offsets are written to the transaction log which
makes them visible to other processes.
This also means that it's possible for two processes to write the same cached fields twice to the
cache file. Because the data written to the cache file are really just cached data, the fields'
contents are identical. Having the data exist twice (or even more times) means wasting some disk
space, but otherwise it isn't a problem. The duplicates are dropped the next time the file is
compressed.
Cache decisions
Dovecot tries to be smart about what it keeps in the cache file. If the client never fetches the
cached data, it's just waste of disk space and disk I/O.
The caching decisions are:
MAIL_CACHE_DECISION_NO
This field isn't cached currently.
MAIL_CACHE_DECISION_TEMP
This field is cached for new mails.
MAIL_CACHE_DECISION_YES
This field is cached for all mails.
Normally Dovecot changes the decisions based on what fields are fetched and for what
messages. A specific decision can be forced by ORing it with MAIL_CACHE_DECISION_FORCED.
mail-cache-decisions.c file contains the rules how Dovecot changes the decisions. The
following is copied from the file:
Users can be divided to three groups:
1. Most users will use only a single IMAP client which caches everything locally. For these
users it's quite pointless to do any kind of caching as it only wastes disk space. That
might also mean more disk I/O.
2. Some users use multiple IMAP clients which cache everything locally. These could
benefit from caching until all clients have fetched the data. After that it's useless.
3. Some clients don't do permanent local caching at all. For example Pine and webmails.
These clients would benefit from caching everything. Some locally caching clients might
also access some data from server again, such as when searching messages. They could
benefit from caching only these fields.
After thinking about these a while, I figured out that people who care about performance most
will be using Dovecot optimized LDA anyway which updates the indexes/cache immediately. In
that case even the first user group would benefit from caching the same way as second group.
LDA reads the mail anyway, so it might as well extract some information about it and store them
into cache.
So, group 1. and 2. could be optimally implemented by keeping things cached only for a while. I
thought a week would be good. When cache file is compressed, everything older than week will
be dropped.
But how to figure out if user is in group 3? One quite easy rule would be to see if client is
accessing messages older than a week. But with only that rule we might have already dropped
useful cached data. It's not very nice if we have to read and cache it twice.
Most locally caching clients always fetch new messages (everything but the body) as soon as they
see them, and they fetch them in ascending order. Noncaching clients might fetch messages in pretty
much any order, as they usually don't fetch everything they can, only what's visible on the screen.
Some use server-side sorting/threading, which also causes messages to be fetched in random order.
The second rule, then, is that if a session doesn't fetch messages in ascending order, the
fetched field type will be cached permanently.
So, we have three caching decisions:
1. Don't cache: Clients have never wanted the field
2. Cache temporarily: Clients want this only once
3. Cache permanently: Clients want this more than once
Different mailboxes have different decisions. Different fields have different decisions.
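The two rules and three decisions can be sketched as a small per-mailbox state machine. This is an illustrative model under stated assumptions: the class and method names are hypothetical, fetch order is tracked per field by UID, and the "message older than a week" test uses a hypothetical `saved_at` timestamp. Downgrading a permanent decision is not handled here; it is the job of the re-evaluation described next.

```python
from enum import Enum

WEEK_SECS = 7 * 24 * 60 * 60


class Decision(Enum):
    NO = "don't cache"
    TEMP = "cache temporarily"
    YES = "cache permanently"


class MailboxCacheDecisions:
    """Hypothetical per-mailbox state: one caching decision per field."""

    def __init__(self) -> None:
        self.decisions: dict[str, Decision] = {}
        self._last_uid: dict[str, int] = {}  # last UID fetched per field this session

    def new_session(self) -> None:
        """Forget fetch ordering; the ascending-order rule is per session."""
        self._last_uid.clear()

    def on_fetch(self, field: str, uid: int, saved_at: int, now: int) -> Decision:
        """Update and return the decision for `field` after a client fetched it."""
        prev_uid = self._last_uid.get(field)
        self._last_uid[field] = uid
        # Rule 1: the client reads a message older than a week.
        old_message = now - saved_at > WEEK_SECS
        # Rule 2: this session isn't fetching the field in ascending UID order.
        out_of_order = prev_uid is not None and uid < prev_uid
        current = self.decisions.get(field, Decision.NO)
        if old_message or out_of_order:
            new = Decision.YES
        elif current is Decision.NO:
            new = Decision.TEMP  # first time anyone wanted this field
        else:
            new = current  # TEMP stays TEMP, YES stays YES
        self.decisions[field] = new
        return new
```

A caching client that walks new messages 1, 2, 3 keeps the field at "cache temporarily". A webmail client that jumps back to UID 1 after UID 2 triggers the second rule and promotes it to "cache permanently".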
There are some problems. For example, if a client accesses a message older than a week, we can't
know whether the user just started using a new client that is filling its local cache for the first
time, or is using a client they simply haven't used for over a week. In these cases we shouldn't have
marked the field to be permanently cached. The user might also switch from a non-caching client to a caching one.
So we should re-evaluate our caching decisions from time to time. This is done by checking the
above rules constantly and recording the last time each decision was right. If a decision
hasn't matched for two months, it's changed. I picked two months because people take vacations
of a month or more, during which they might still read mail, but with different clients.
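The re-evaluation can be sketched as a "last confirmed" timestamp per decision. This is a hypothetical sketch: the `FieldDecision` type is invented for illustration, and "two months" is approximated as 60 days, which is an assumption.

```python
from dataclasses import dataclass

DAY_SECS = 24 * 60 * 60
# "Two months" approximated as 60 days; the exact length is an assumption.
REEVALUATE_AFTER = 60 * DAY_SECS


@dataclass
class FieldDecision:
    decision: str        # "no", "temp" or "yes"
    last_confirmed: int  # Unix time the rules last agreed with `decision`


def reevaluate(state: FieldDecision, rules_say: str, now: int) -> str:
    """Apply the caching rules' current verdict to a stored decision."""
    if rules_say == state.decision:
        state.last_confirmed = now  # decision was right; remember when
    elif now - state.last_confirmed > REEVALUATE_AFTER:
        # Wrong for longer than the grace period: switch decisions.
        state.decision = rules_say
        state.last_confirmed = now
    # Otherwise keep the old decision; the mismatch may be a vacation client.
    return state.decision
```

A short mismatch, such as a month away using a different client, leaves the decision alone. Only a mismatch lasting longer than the grace period flips it.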
Quota
A quota backend specifies how Dovecot keeps track of current quota usage. Backends don't
(usually) specify users' quota limits; that's done by returning extra fields from the userdb.
Dovecot can use the following quota backends:
• fs: Filesystem quota.
• dirsize: The simplest and slowest quota backend, but it works quite well with mboxes.
• dict: Store quota usage in a dictionary (e.g. SQL).
• maildir: Store quota usage in Maildir++ maildirsize files. This is the most commonly
used quota backend for virtual users.
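To show why the dirsize and maildir backends differ so much in cost, here is a rough sketch of both. It assumes the Maildir++ maildirsize layout as commonly described: the first line holds the quota definition (e.g. `1000000S,1000C`), and each following line holds a byte delta and a message-count delta that are summed to get current usage. The function names are hypothetical, and Dovecot's own implementation may differ.

```python
import os


def parse_maildirsize(text: str) -> tuple[str, int, int]:
    """Return (quota definition, used bytes, used messages) from a maildirsize file."""
    lines = text.splitlines()
    if not lines:
        raise ValueError("empty maildirsize file")
    quota_def = lines[0].strip()
    total_bytes = total_msgs = 0
    for line in lines[1:]:
        parts = line.split()
        if not parts:
            continue  # tolerate blank lines
        total_bytes += int(parts[0])  # deltas may be negative after expunges
        if len(parts) > 1:
            total_msgs += int(parts[1])
    return quota_def, total_bytes, total_msgs


def dirsize_usage(root: str) -> int:
    """What a dirsize-style backend must do: stat every file, every time."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            total += os.path.getsize(os.path.join(dirpath, name))
    return total
```

Reading a maildirsize file costs one small read no matter how big the mailbox is. The directory walk grows with the number of files, which is why dirsize is slow for maildirs but acceptable for mboxes, where a mailbox is a single file.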
Enabling quota plugins
There are currently two quota-related plugins:
• quota: Implements the actual quota handling and also includes all the quota backends.
• imap_quota: Reports quota information via IMAP.
Usually you'd enable these by adding them to the mail_plugins setting in the config file:
protocol imap {
mail_plugins = quota imap_quota
}
protocol pop3 {
mail_plugins = quota
}
# In case you're using deliver:
protocol lda {
mail_plugins = quota
}
Configuration
The configuration is done differently for v1.0 and v1.1:
• v1.0 quota configuration
• v1.1 quota configuration
Quota and Trash mailbox
The standard way to expunge messages with IMAP is:
1. Mark the message with the \Deleted flag.
2. Actually expunge the message using the EXPUNGE command.
Both of these commands work even while the user's quota is full. However, many
clients use a "move-to-Trash" feature, which works by:
1. COPY the message to the Trash mailbox.
2. Mark the message with \Deleted.
3. Expunge the message from the original mailbox.
4. (Maybe later, expunge the message from Trash when a "clean trash" feature is used.)
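The move-to-Trash sequence can be sketched from the client side with Python's `imaplib`, which makes it clear why deletion fails under quota. Everything hinges on the COPY succeeding. The function name is hypothetical, and `conn` is assumed to be a logged-in `imaplib.IMAP4` object with the source mailbox already selected.

```python
from typing import Any


def move_to_trash(conn: Any, message_set: str, trash: str = "Trash") -> bool:
    """Client-side move-to-Trash: COPY, then flag \\Deleted, then EXPUNGE.

    Returns False if the server refuses the COPY (for example because the
    user is over quota); the original message is then left untouched.
    """
    typ, _ = conn.copy(message_set, trash)
    if typ != "OK":
        return False  # step 1 failed, so the message is never deleted
    conn.store(message_set, "+FLAGS", "\\Deleted")
    # EXPUNGE removes every \Deleted message in the selected mailbox,
    # not only the ones flagged above.
    conn.expunge()
    return True
```

A client built this way can only report "couldn't delete". The real cause is that the copy into Trash pushed the user over quota.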
If the user is over quota (or just under it), the COPY command in step 1 will fail, and the user
may get an unintuitive error saying that messages can't be deleted because the user is over quota.
The possible solutions for this are:
• Disable the move-to-trash feature in the client.
• Dovecot v1.0 + Maildir++ quota: You can completely exclude the Trash mailbox from quota
calculation by appending :ignore=Trash to the quota line. Note that this allows
users to store an unlimited amount of mail in that mailbox.
• Dovecot v1.1 or the v1.0 quota rewrite: You can ignore Trash as with v1.0, but you can
also add a separate quota rule that gives the Trash mailbox somewhat more quota (but not
unlimited).
To make sure users don't start keeping messages permanently in Trash, you can use a nightly
cron job or the expire plugin (v1.1) to expunge old messages from the Trash mailbox.
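As a sketch of the nightly cron job option, a script could log in over IMAP and expunge old Trash messages with `imaplib`. It assumes `conn` is a logged-in `imaplib.IMAP4` object, and the 30-day default is arbitrary. Note one limitation of this approach: IMAP `SEARCH BEFORE` compares the message's internal date, which COPY preserves. Age is therefore measured from original delivery, not from when the message was moved to Trash, so this is only an approximation of what the expire plugin tracks.

```python
import datetime
from typing import Any, Optional

# RFC 3501 dates use English month abbreviations, so avoid locale-dependent strftime("%b").
_MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
           "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def imap_date(d: datetime.date) -> str:
    """Format a date for IMAP SEARCH, e.g. date(2024, 3, 5) -> '5-Mar-2024'."""
    return f"{d.day}-{_MONTHS[d.month - 1]}-{d.year}"


def expunge_old_trash(conn: Any, days: int = 30,
                      today: Optional[datetime.date] = None,
                      trash: str = "Trash") -> int:
    """Flag and expunge messages in `trash` with an internal date older than `days`.

    `conn` is assumed to be a logged-in imaplib.IMAP4 (or IMAP4_SSL) object.
    Returns the number of messages flagged for deletion.
    """
    today = today or datetime.date.today()
    cutoff = today - datetime.timedelta(days=days)
    typ, _ = conn.select(trash)
    if typ != "OK":
        return 0  # Trash doesn't exist or can't be opened
    typ, data = conn.search(None, "BEFORE", imap_date(cutoff))
    if typ != "OK" or not data or not data[0]:
        return 0  # nothing old enough (or the search failed)
    seqs = data[0].split()  # e.g. b"1 2 3" -> [b"1", b"2", b"3"]
    conn.store(b",".join(seqs).decode(), "+FLAGS", "\\Deleted")
    conn.expunge()
    return len(seqs)
```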