Postfix 3.2.0 - Sending to all MX records

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
19 messages Options
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Postfix 3.2.0 - Sending to all MX records

Tom Hudson
Firstly, apologies if I haven't included all of the relevant information in this initial post. Please let me know if I have missed anything.

I am currently running Postfix 3.2.0 and have a problem relating to MX records and defered messages. What I have identified is, if a domain our server is trying to send to has an MX record which returns no response, the message is defered. Every time postfix attempts to redeliver this message, it uses the same lowest priority MX record.

I have found examples in our mail queue which are deferred with the reason "unknown mail transport error". When I attempt to telnet to the MX records for their domain, their lowest value MX is not contactable but the others are.

We can see no traffic attempting to go from postfix to the other MX records, only the lowest value every time.

I understand that it should be standard practice for Postfix to first attempt the lowest value and then attempt to use all other MX records before deferring the message.

Can anyone advise why this isn't working for me? I'll include any postconf settings which I think are relevant below and please tell me if you need any more information;

ignore_mx_lookup_error = no
smtp_defer_if_no_mx_address_found = no
smtp_mx_address_limit = 20
smtp_mx_session_limit = 5
smtp_skip_5xx_greeting = yes
smtp_skip_quit_response = yes
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Postfix 3.2.0 - Sending to all MX records

Viktor Dukhovni

> On Jul 12, 2017, at 10:08 AM, Tom Hudson <[hidden email]> wrote:
>
> Firstly, apologies if I haven't included all of the relevant information in
> this initial post. Please let me know if I have missed anything.
>
> I am currently running Postfix 3.2.0 and have a problem relating to MX
> records and defered messages. What I have identified is, if a domain our
> server is trying to send to has an MX record which returns no response, the
> message is defered. Every time postfix attempts to redeliver this message,
> it uses the same lowest priority MX record.

That is not how Postfix generally works.  No further information is possible
without the relevant logs, and any relevant transport table and master.cf
settings.

> I have found examples in our mail queue which are deferred with the reason
> "unknown mail transport error". When I attempt to telnet to the MX records
> for their domain, their lowest value MX is not contactable but the others
> are.

"unknown mail transport error" is a serious problem, which indicates a
delivery agent crash or similar failure to report delivery status to the
queue manager.  You will find the cause in your logs:

http://www.postfix.org/DEBUG_README.html#logging

--
        Viktor.

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Postfix 3.2.0 - Sending to all MX records

/dev/rob0
In reply to this post by Tom Hudson
On Wed, Jul 12, 2017 at 07:08:34AM -0700, Tom Hudson wrote:
> Firstly, apologies if I haven't included all of the relevant
> information in this initial post. Please let me know if I have
> missed anything.

Full "postconf -nf ; postconf -Mf" and complete non-verbose logging
of a single email which demonstrates your issue.

> I am currently running Postfix 3.2.0 and have a problem relating to
> MX records and defered messages. What I have identified is, if a
> domain our server is trying to send to has an MX record which
> returns no response, the message is defered. Every time postfix
> attempts to redeliver this message, it uses the same lowest
> priority MX record.
>
> I have found examples in our mail queue which are deferred with the
> reason "unknown mail transport error". When I attempt to telnet to

You have changed a transport(5)-related setting to something which
isn't a valid transport in your master.cf.

> the MX records for their domain, their lowest value MX is not
> contactable but the others are.
>
> We can see no traffic attempting to go from postfix to the other MX
> records, only the lowest value every time.
>
> I understand that it should be standard practice for Postfix to
> first attempt the lowest value and then attempt to use all other MX
> records before deferring the message.
>
> Can anyone advise why this isn't working for me? I'll include any
> postconf settings which I think are relevant below and please tell
> me if you need any more information;
>
> ignore_mx_lookup_error = no
> smtp_defer_if_no_mx_address_found = no
> smtp_mx_address_limit = 20
> smtp_mx_session_limit = 5
> smtp_skip_5xx_greeting = yes
> smtp_skip_quit_response = yes

See above.  This is not useful.
--
  http://rob0.nodns4.us/
  Offlist GMX mail is seen only if "/dev/rob0" is in the Subject:
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Postfix 3.2.0 - Sending to all MX records

Tom Hudson
This post was updated on .
In reply to this post by Viktor Dukhovni
Hi Viktor,

Thanks for getting back to me.

I've attached our master.cf file to this message. The only edits I have made is to replace references to my company's IP addresses and domain names.

tomhudson_masterconfig.txt

Below is an example error message I am seeing in our postfix mail logs. Before this message was sent, we enabled TCP dumps for the domain's two MX records which I will also attach. As you should be able to see, one file has data in (this is the lowest value MX record) and one file is empty (the 2nd MX record which is never attempted.

dump.pcap
dump2.pcap


Jul 12 17:18:21 mes-l2-rly-01 postfix/error[21315]: 2412B80170667: to=<Caitlin.Campbell@REDACTED.org.uk>, relay=none, delay=92009, delays=92009/0/0/0, dsn=4.3.0, status=deferred (unknown mail transport error)
 
Let me know if you need anything else.
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Postfix 3.2.0 - Sending to all MX records

Viktor Dukhovni
On Wed, Jul 12, 2017 at 10:55:29AM -0400, Viktor Dukhovni wrote:

> That is not how Postfix generally works.  No further information is possible
> without the relevant logs, and any relevant transport table and master.cf
> settings.
>
> [...]
>
> "unknown mail transport error" is a serious problem, which indicates a
> delivery agent crash or similar failure to report delivery status to the
> queue manager.  You will find the cause in your logs:
>
> http://www.postfix.org/DEBUG_README.html#logging

On Thu, Jul 13, 2017 at 03:01:43AM -0700, Tom Hudson wrote:

> I've attached our master.cf file to this message.
>
> [...]
>
> Below is an example error message I am seeing in our postfix mail logs.

No, that's not the logging of the actual problem, that is logging
from the "postfix/smtp" (not "postfix/smtpd") process that crashed
while attempting delivery, and "postfix/master" reporting an abnormal
exit from an "smtp" process.

Without the relevant logs, no further help is possible.

> Jul 12 17:18:21 mes-l2-rly-01 postfix/error[21315]: 2412B80170667:
> to=<REDACTED>, relay=none, delay=92009, delays=92009/0/0/0, dsn=4.3.0,
> status=deferred (unknown mail transport error)

This is just a symptom of the fact that mail to the destination is
subsequently deferred.

> Let me know if you need anything else.

#         ###     ####   #####
#        #   #   #    # #     #
#       #     # #       #
#       #     # #       #
#       #     # #        #####
#       #     # #  ####       #
#       #     # #     #       #
#        #   #   #    # #     #
#######   ###     ####   #####

--
        Viktor.
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Postfix 3.2.0 - Sending to all MX records

Daniel Caulfield
Hi Viktor,

I am working on this issue with Tom, apologies if we seem to be struggling with something simple here but the only log file we have is in /var/log/maillog. Even if we change the log settings to more verbose we don't seem to generate any other logs.

How would we go about getting you a copy of the 'postfix/smtpd' logs? where would they be or how do we switch that on?

This system was originally setup by a colleague of ours who is no longer working with us, our knowledge is a little more basic than his was.

Thanks
Danny
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Postfix 3.2.0 - Sending to all MX records

Dominic Raferd
On 13 July 2017 at 13:56, Daniel Caulfield <[hidden email]> wrote:
Hi Viktor,

I am working on this issue with Tom, apologies if we seem to be struggling
with something simple here but the only log file we have is in
/var/log/maillog. Even if we change the log settings to more verbose we
don't seem to generate any other logs.

How would we go about getting you a copy of the 'postfix/smtpd' logs? where
would they be or how do we switch that on?

This system was originally setup by a colleague of ours who is no longer
working with us, our knowledge is a little more basic than his was.

They will be in the same place, you can extract them thus:

grep "/smtpd" /var/log/maillog​
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Postfix 3.2.0 - Sending to all MX records

Daniel Caulfield
Hi,

Please find the log attached, is this what you are requesting? (we have had to split to multiple files but there are in numarical order)
smtpd1.logsmtpd2.logsmtpd3.logsmtpd4.log

Thanks
Danny
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Postfix 3.2.0 - Sending to all MX records

Dominic Raferd


On 13 July 2017 at 14:22, Daniel Caulfield <[hidden email]> wrote:
Hi,

Please find the log attached, is this what you are requesting? (we have had
to split to multiple files but there are in numarical order)
smtpd1.log <http://postfix.1071664.n5.nabble.com/file/n91271/smtpd1.log>
smtpd2.log <http://postfix.1071664.n5.nabble.com/file/n91271/smtpd2.log>
smtpd3.log <http://postfix.1071664.n5.nabble.com/file/n91271/smtpd3.log>
smtpd4.log <http://postfix.1071664.n5.nabble.com/file/n91271/smtpd4.log>

​I can't see anything useful here, only that a lot of mails are being sent to this server but only from 2 local machines. Maybe Viktor can glean something more from it.
It might be more informative if you extract log entries (from any component) which reference the queue id for one of these mails - for instance 581DA80219A19 (the last one) thus:

grep 581DA80219A19 /var/log/maillog
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Postfix 3.2.0 - Sending to all MX records

Viktor Dukhovni
In reply to this post by Daniel Caulfield
On Thu, Jul 13, 2017 at 12:38:51PM +0000, Viktor Dukhovni wrote:

> No, that's not the logging of the actual problem, that is logging
> from the "postfix/smtp" (not "postfix/smtpd") process that crashed
> while attempting delivery, and "postfix/master" reporting an abnormal
> exit from an "smtp" process.

Perhaps the above was not clear.  Post, in the body of the email
message (don't upload attachments) relevant NON-VERBOSE logs showing
errors from:

    postix/smtp (not postfix/smtpd)
    postfix/master

Outbound mail is sent by postfix/smtp, your smtp(8) delivery
agent is terminating abnormally.  This is logged, post *those*
logs.

--
        Viktor.
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Postfix 3.2.0 - Sending to all MX records

Daniel Caulfield
Hi Viktor,

Apologies but I tried to post the log in the body but the forum is responding [Spam Detector] Message contains common spam words. and won't let me post it so the contents are in teh attached txt file.

Thanks
DannyMailLog.txt
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Postfix 3.2.0 - Sending to all MX records

/dev/rob0
In reply to this post by Dominic Raferd
On Thu, Jul 13, 2017 at 02:02:09PM +0100, Dominic Raferd wrote:
> On 13 July 2017 at 13:56, Daniel Caulfield
> <[hidden email]> wrote:
>
> > How would we go about getting you a copy of the 'postfix/smtpd'
> > logs? where would they be or how do we switch that on?
>
> They will be in the same place, you can extract them thus:
>
> grep "/smtpd" /var/log/maillog​

This was not good advice, because there are likely relevant bits
being logged by other Postfix daemon processes.  Also, it likely
results in a large amount of irrelevant results.

I wrote a quick little document about how to condense relevant logs
from the sometimes overwhelming amount of information in Postfix
logs.  It's posted here, and comments and suggestions are solicited
and appreciated:

http://rob0.nodns4.us/postfix-logging

(Just a plain-text file for now, no HTML yet.)

I think Noel said he was going to start on something like this, and
perhaps he has but I missed it.  Noel, if this is useful to you in
that effort please feel free to adapt and/or incorporate it.
--
  http://rob0.nodns4.us/
  Offlist GMX mail is seen only if "/dev/rob0" is in the Subject:
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Postfix 3.2.0 - Sending to all MX records

Viktor Dukhovni
In reply to this post by Daniel Caulfield
On Thu, Jul 13, 2017 at 08:23:26AM -0700, Daniel Caulfield wrote:

> Apologies but I tried to post the log in the body but the forum is
> responding [Spam Detector] Message contains common spam words. and won't let
> me post it so the contents are in teh attached txt file.
>
> Thanks
> Danny MailLog.txt
> <http://postfix.1071664.n5.nabble.com/file/n91274/MailLog.txt>  

Frequent restarts of the queue manager are counterproductive.
Avoid running automated or frequent "postfix reload" commands:

Jul 13 00:15:28 mes-l2-rly-01 postfix/qmgr[29046]: 2412B80170667: skipped, still being delivered
Jul 13 00:25:13 mes-l2-rly-01 postfix/qmgr[1137]: 2412B80170667: skipped, still being delivered
Jul 13 00:29:41 mes-l2-rly-01 postfix/qmgr[3082]: 2412B80170667: skipped, still being delivered
Jul 13 00:49:26 mes-l2-rly-01 postfix/qmgr[12235]: 2412B80170667: skipped, still being delivered
Jul 13 01:09:02 mes-l2-rly-01 postfix/qmgr[21501]: 2412B80170667: skipped, still being delivered
Jul 13 01:15:36 mes-l2-rly-01 postfix/qmgr[24294]: 2412B80170667: skipped, still being delivered
Jul 13 01:16:37 mes-l2-rly-01 postfix/qmgr[24763]: 2412B80170667: skipped, still being delivered
Jul 13 01:17:36 mes-l2-rly-01 postfix/qmgr[24763]: 2412B80170667: skipped, still being delivered
Jul 13 01:18:46 mes-l2-rly-01 postfix/qmgr[25652]: 2412B80170667: skipped, still being delivered
Jul 13 01:20:55 mes-l2-rly-01 postfix/qmgr[26549]: 2412B80170667: skipped, still being delivered
Jul 13 01:23:01 mes-l2-rly-01 postfix/qmgr[27670]: 2412B80170667: skipped, still being delivered
Jul 13 01:25:16 mes-l2-rly-01 postfix/qmgr[28600]: 2412B80170667: skipped, still being delivered
Jul 13 01:27:18 mes-l2-rly-01 postfix/qmgr[29484]: 2412B80170667: skipped, still being delivered
Jul 13 01:29:32 mes-l2-rly-01 postfix/qmgr[30504]: 2412B80170667: skipped, still being delivered
Jul 13 01:32:05 mes-l2-rly-01 postfix/qmgr[31652]: 2412B80170667: skipped, still being delivered
Jul 13 02:07:16 mes-l2-rly-01 postfix/qmgr[14734]: 2412B80170667: skipped, still being delivered
Jul 13 02:09:27 mes-l2-rly-01 postfix/qmgr[15637]: 2412B80170667: skipped, still being delivered
Jul 13 02:11:55 mes-l2-rly-01 postfix/qmgr[16595]: 2412B80170667: skipped, still being delivered
Jul 13 02:39:49 mes-l2-rly-01 postfix/qmgr[28892]: 2412B80170667: skipped, still being delivered
Jul 13 02:43:41 mes-l2-rly-01 postfix/qmgr[30604]: 2412B80170667: skipped, still being delivered
Jul 13 02:45:47 mes-l2-rly-01 postfix/qmgr[31551]: 2412B80170667: skipped, still being delivered
Jul 13 02:47:51 mes-l2-rly-01 postfix/qmgr[32452]: 2412B80170667: skipped, still being delivered
Jul 13 02:53:48 mes-l2-rly-01 postfix/qmgr[2553]: 2412B80170667: skipped, still being delivered
Jul 13 02:55:56 mes-l2-rly-01 postfix/qmgr[3616]: 2412B80170667: skipped, still being delivered
Jul 13 02:58:28 mes-l2-rly-01 postfix/qmgr[4844]: 2412B80170667: skipped, still being delivered
Jul 13 03:01:57 mes-l2-rly-01 postfix/qmgr[6209]: 2412B80170667: skipped, still being delivered
Jul 13 03:04:08 mes-l2-rly-01 postfix/qmgr[7359]: 2412B80170667: skipped, still being delivered
Jul 13 03:14:02 mes-l2-rly-01 postfix/qmgr[11562]: 2412B80170667: skipped, still being delivered
Jul 13 03:16:08 mes-l2-rly-01 postfix/qmgr[12020]: 2412B80170667: from=<[hidden email]>, status=expired, returned to sender
Jul 13 03:16:08 mes-l2-rly-01 postfix/qmgr[12020]: 2412B80170667: removed

There's your problem.  Your SMTP delivery agent is blocked waiting
for something that never happens, until an interrupt after a 5 hour
watchdog timer.  This suggests a buggy operating system, perhaps
a virtual machine that loses timer events, or some other O/S bug.

Jul 13 03:15:02 l2-rly-01 relay085/smtp[11563]: fatal: watchdog timeout
Jul 13 03:15:02 l2-rly-01 relay059/smtp[11564]: fatal: watchdog timeout
Jul 13 03:15:02 l2-rly-01 relay017/smtp[11567]: fatal: watchdog timeout
...
Jul 13 03:15:02 l2-rly-01 relay095/smtp[11647]: fatal: watchdog timeout
Jul 13 03:15:02 l2-rly-01 relay099/smtp[11655]: fatal: watchdog timeout
Jul 13 03:15:02 l2-rly-01 relay008/smtp[11654]: fatal: watchdog timeout
Jul 13 03:15:02 l2-rly-01 postfix/qmgr[11562]: fatal: 202298015A3D0: timeout receiving delivery status from transport: relay085
Jul 13 03:15:03 l2-rly-01 postfix/master[9824]: warning: process /usr/libexec/postfix/smtp pid 11563 exit status 1
Jul 13 03:15:03 l2-rly-01 postfix/master[9824]: warning: process /usr/libexec/postfix/smtp pid 11564 exit status 1
Jul 13 03:15:03 l2-rly-01 postfix/master[9824]: warning: /usr/libexec/postfix/smtp: bad command startup -- throttling
...
Jul 13 03:16:14 l2-rly-01 postfix/master[9824]: warning: process /usr/libexec/postfix/smtp pid 12201 exit status 1
Jul 13 03:16:14 l2-rly-01 postfix/master[9824]: warning: process /usr/libexec/postfix/smtp pid 12202 exit status 1
Jul 13 03:16:14 l2-rly-01 postfix/master[9824]: warning: process /usr/libexec/postfix/smtp pid 12204 exit status 1

That's a rather large set of smtp(8) transports.  Perhaps a snowshoe
bulk email setup.  Good luck.

--
        Viktor.
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Postfix 3.2.0 - Sending to all MX records

Dominic Raferd
In reply to this post by /dev/rob0


On 13 July 2017 at 19:14, /dev/rob0 <[hidden email]> wrote:
On Thu, Jul 13, 2017 at 02:02:09PM +0100, Dominic Raferd wrote:
> On 13 July 2017 at 13:56, Daniel Caulfield
> <[hidden email]> wrote:
>
> > How would we go about getting you a copy of the 'postfix/smtpd'
> > logs? where would they be or how do we switch that on?
>
> They will be in the same place, you can extract them thus:
>
> grep "/smtpd" /var/log/maillog​

This was not good advice, because there are likely relevant bits
being logged by other Postfix daemon processes.  Also, it likely
results in a large amount of irrelevant results.

​It was not advice, it was a direct answer to a simple question.​
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Postfix 3.2.0 - Sending to all MX records

Wietse Venema
In reply to this post by Viktor Dukhovni
Viktor Dukhovni:

> There's your problem.  Your SMTP delivery agent is blocked waiting
> for something that never happens, until an interrupt after a 5 hour
> watchdog timer.  This suggests a buggy operating system, perhaps
> a virtual machine that loses timer events, or some other O/S bug.
>
> Jul 13 03:15:02 l2-rly-01 relay085/smtp[11563]: fatal: watchdog timeout
> Jul 13 03:15:02 l2-rly-01 relay059/smtp[11564]: fatal: watchdog timeout
> Jul 13 03:15:02 l2-rly-01 relay017/smtp[11567]: fatal: watchdog timeout
> ...
> Jul 13 03:15:02 l2-rly-01 relay095/smtp[11647]: fatal: watchdog timeout
> Jul 13 03:15:02 l2-rly-01 relay099/smtp[11655]: fatal: watchdog timeout
> Jul 13 03:15:02 l2-rly-01 relay008/smtp[11654]: fatal: watchdog timeout
> Jul 13 03:15:02 l2-rly-01 postfix/qmgr[11562]: fatal: 202298015A3D0: timeout receiving delivery status from transport: relay085

I agree, your OS or VM (or both) is borked. The only Postfix change
that might avoid this is to drastically reduce process limits.

        Wietse
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Postfix 3.2.0 - Sending to all MX records

Tom Hudson
Hi,

Thank you for all of your help so far with this issue.

Just to let you know, the problem appears to have been caused by the "daemon_timeout" being set far too low during our initial configuration. I know the default in postfix is 18000s but ours had been set to 60s! Increasing this to 600s appears to have caused the problem to go away.

Again, thank you all for your help.

Cheers,
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Postfix 3.2.0 - Sending to all MX records

Viktor Dukhovni

> On Jul 14, 2017, at 8:40 AM, Tom Hudson <[hidden email]> wrote:
>
> Thank you for all of your help so far with this issue.
>
> Just to let you know, the problem appears to have been caused by the
> "daemon_timeout" being set far too low during our initial configuration. I
> know the default in postfix is 18000s but ours had been set to 60s!
> Increasing this to 600s appears to have caused the problem to go away.
>
> Again, thank you all for your help.

Even 600s is far too short.  This is a last resort safety mechanism that
should never happen, not an routine operational timeout.  Set it to at
least 1 hour, or better yet, just leave at the default 5 hours.

--
        Viktor.

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Postfix 3.2.0 - Sending to all MX records

@lbutlr
In reply to this post by Tom Hudson
On 14 Jul 2017, at 06:40, Tom Hudson <[hidden email]> wrote:
> Just to let you know, the problem appears to have been caused by the
> "daemon_timeout" being set far too low during our initial configuration. I
> know the default in postfix is 18000s but ours had been set to 60s!
> Increasing this to 600s appears to have caused the problem to go away.

Does it occur to you that there might possibly be a reason the default is 18000 and that reducing it by a couple of orders of magnitude is not something you should just do without knowing the ramifications?

Do you even know what the setting does?

--
Apple broke AppleScripting signatures in Mail.app, so no random signatures.

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Postfix 3.2.0 - Sending to all MX records

Viktor Dukhovni

> On Jul 14, 2017, at 11:29 AM, @lbutlr <[hidden email]> wrote:
>
> Does it occur to you that there might possibly be a reason the default is 18000 and that reducing it by a couple of orders of magnitude is not something you should just do without knowing the ramifications?
>
> Do you even know what the setting does?

Let's keep the tone more neutral and stick to the facts, this verges on ad hominem.
If I've set a poor example, I apologize and will try to do better.  Thanks.

--
        Viktor.

Loading...