Posfix – Making sense of delays in mail

The maillog

The maillog is easy enough to follow, but when you understand what all the delay and delays numbers mean then this may help really understand what is going on!
A standard email entry in postfix looks like:

Jan 10 10:00:00 testmtr postfix/smtp[20123]: 34A1B160852B: to=, relay=mx1.example.lan[1.2.3.4]:25, delay=0.49, delays=0.2/0/0.04/0.25, dsn=2.0.0, status=sent

Pretty straight forward: date, email identifier in the mailq (34A1B160852B), recipient, which server the email is being sent to (relay). It is the delay and delays I’d like to talk about.

Delay and Delays
If we take a look at the example email from above:

Jan 10 10:00:00 testmtr postfix/smtp[20123]: 34A1B160852B: to=, relay=mx1.example.lan[1.2.3.4]:25, delay=0.49, delays=0.2/0/0.04/0.25, dsn=2.0.0, status=sent

The delay parameter (delay=0.49) is fairly self explanatory, it is the total amount of time this email (34A1B160852B) has been on this server. But what is the delays parameter all about?

delays=0.2/0/0.04/0.25

NOTE: Numbers smaller than 0.01 seconds are truncated to 0, to reduce the noise level in the logfile.

You might have guessed it is a break down of the total delay, but what do each number represent?

Well from the release notes we get:

delays=a/b/c/d:
a=time before queue manager, including message transmission;
b=time in queue manager;
c=connection setup time including DNS, HELO and TLS;
d=message transmission time.

There for looking at our example:

  • a (0.2): The time before getting to the queue manager, so the time it took to be transmitted onto the mail server and into postfix.
  • b (0): The time in queue manager, so this email didn’t hit the queues, so it was emailed straight away.
  • c (0.04): The time it took to set up a connection with the destination mail relay.
  • d (0.25): The time it took to transmit the email to the destination mail relay.

However if the email is deferred, then when the email is attempted to be sent again:

Jan 10 10:00:00 testmtr postfix/smtp[20123]: 34A1B160852B: to=, relay=mx1.example.lan[1.2.3.4]:25, delay=82, delays=0.25/0/0.5/81, dsn=4.4.2, status=deferred (lost connection with mx1.example.lan[1.2.3.4] while sending end of data -- message may be sent more than once)

Jan 10 testmtr postfix/smtp[20123]: 34A1B160852B: to=, relay=mx1.example.lan[1.2.3.4]:25, delay=1092, delays=1091/0.2/0.8/0.25, dsn=2.0.0, status=sent

This time the first entry shows how long it took before the destination mail relay took to time out and close the connection:

delays=0.25/0/0.5/81
Therefore: 81 seconds.

The email was deferred then about 15 minutes later (1009 seconds [delays – <total delay from last attempt> ]) another attempt is made.
This time the delay is a lot larger, as the total time this email has spent on the server is a lot longer.

delay=1092, delays=1091/0.2/0.8/0.25

What is interesting though is the value of ‘a’ is now 1091, which means when an email is resent the ‘a’ value in the breakdown also includes the amount of time this email has currently spend on the system (before this attempt).

So there you go, those delays values are rather interesting and can really help solve where bottlenecks lie on your system. In the above case we obviously had some problem communicating to the destination mail relay, but worked the second time, so isn’t a problem with our system… or so I’d like to think.