Re: What is postfix telling me to do?

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
9 messages Options
Reply | Threaded
Open this post in threaded view
|

Re: What is postfix telling me to do?

Bastian Blank-3
On Tue, Jun 26, 2018 at 04:47:57PM -0400, Wietse Venema wrote:
> >  Out: 451 4.3.0 Error: queue file write error
> You have "notify_classes = ... bounce ..." somewhere, otherwise
> you would not receive the above SMTP session recording.
>
> Try: postconf -P | grep notify_classes

Are you sure this does not fall into the resource class?

| static int data_cmd(SMTPD_STATE *state, int argc, SMTPD_TOKEN *unused_argv)
| {
[…]
|     } else if ((state->err & CLEANUP_STAT_WRITE) != 0) {
|         state->error_mask |= MAIL_ERROR_RESOURCE;

Only the log will show the real reason.

Bastian

--
Ahead warp factor one, Mr. Sulu.
Reply | Threaded
Open this post in threaded view
|

Re: What is postfix telling me to do?

Bastian Blank-3
On Tue, Jun 26, 2018 at 02:34:19PM -0400, James B. Byrne wrote:
>                                          When we do we frequently
> (always?) get messages like this in the mail queue:

Such important mails must not lay around in the queue, but needs to be
delivered.

>  In:  RCPT TO:<[hidden email]>
>  Out: 250 2.1.5 Ok
>
> Note that 'In:  RCPT TO:<[hidden email]>' refers to a
> non-existent mailbox address.

Then why do you accept it in the first place?

>  In:  DATA
>  Out: 354 End data with <CR><LF>.<CR><LF>
>  Out: 451 4.3.0 Error: queue file write error
> For other details, see the local mail logfile

This is the real problem and you will find it in the (non-verbose!) log.

> Mailq displays this:
>
> 002F8E7E3      1063 Mon Jun 25 22:13:17  [hidden email]    
>         (delivery temporarily suspended: Server certificate not
> verified)
>                                  [hidden email]

Okay, so your complete setup is broken.  You need to set "myorigin" to a
correct value.

> Since I do not set notify_classes in main.cf these are the defaults.
> The address for [hidden email] is aliased to root and root is
> aliased to [hidden email].  All traffic on mx32 destined to
> hll.ca. is routed via the transport map to:
>
> harte-lyne.ca        relay:[mx07.hamilton.hll.ca]
> .harte-lyne.ca       relay:[mx07.hamilton.hll.ca]

Obviously it is not.

Please follow the steps lined out in
http://www.postfix.org/DEBUG_README.html#mail, don't forget one.

Bastian

--
A Vulcan can no sooner be disloyal than he can exist without breathing.
                -- Kirk, "The Menagerie", stardate 3012.4
Reply | Threaded
Open this post in threaded view
|

Re: What is postfix telling me to do?

James B. Byrne

On Tue, Jun 26, 2018 at 02:34:19PM -0400, James B. Byrne wrote:
> When we do we frequently (always?) get messages like this in
> the mail queue:

Well, I may not have solved the problem, but I have identified what it
is and provided a work-around for now.

The issue is this:

smtp_tls_security_level=dane

Last week Viktor Dukhovni reported to me that our domain had a problem
with our DNSSEC.  I investigated that and discovered on our master DNS
host the named daemon was reporting errors similar to the following.

Jun  9 00:37:40 dns03 named[3729]: malformed transaction: . . .

These errors did not have any obvious cause and inquires on the OS
users list did not elicit any diagnosis.  So for want of any clear
remediation I simply restarted the named service and the problem with
DNSSEC went away.

However, the problem returned. And, in combination with the dane
security level setting in Postfix, apparently caused Postfix to
report:

(delivery temporarily suspended: Server certificate not verified)

Switching smtp_tls_security_level from 'dane' to 'may' allowed the
mail to be delivered without further problem.

[root@mx32 ~]# mailq
Mail queue is empty

This still does not clarify for me why the double-bounce address was
being reported given that the postconf reported values for notify did
not include bounces.

Now, the problem with DANE and our DNS master service turns out to be
rather odd.  Somehow we ended up with two named processes running
simultaneously on that host. How one can have two processes bind to
the same port escapes me but evidently it happened.  Or it did not
happen and the June 8 process somehow failed to terminate in
consequence.

[root@dns03 ~ (master #)]# service named restart
Stopping named:                                            [  OK  ]
Starting named:                                            [  OK  ]

[root@dns03 ~ (master #)]# ps -ef | grep named
named     3729     1  0 Jun08 ?        00:00:59 /usr/sbin/named -u named
named    24749     1  0 10:04 ?        00:00:00 /usr/sbin/named -u named
root     24859 22025  0 10:06 pts/1    00:00:00 grep named

June 8 corresponds to when the dynamic update errors commenced:

/var/log/messages-20180610:Jun  8 13:37:38 dns03 named[3729]:
malformed transaction:  . . .

How this problem impacted DNSSEC I do not understand but the evidence
is that it did.  The solution to the twinned named service problem was
to first kill the named process from June 8 and then restart the
remaining named service.  We will see if the DNS problem reoccurs.

But, the only change required to get the mail delivered was switching
the security level from 'dane' to 'may'. Which will have to go back to
dane once I am convinced that our underlying problems with DNSSEC has
been resolved.

--
***          e-Mail is NOT a SECURE channel          ***
        Do NOT transmit sensitive data via e-Mail
 Do NOT open attachments nor follow links sent by e-Mail

James B. Byrne                mailto:[hidden email]
Harte & Lyne Limited          http://www.harte-lyne.ca
9 Brockley Drive              vox: +1 905 561 1241
Hamilton, Ontario             fax: +1 905 561 0757
Canada  L8E 3C3

Reply | Threaded
Open this post in threaded view
|

Re: What is postfix telling me to do?

Wietse Venema
James B. Byrne:
> This still does not clarify for me why the double-bounce address was
> being reported given that the postconf reported values for notify did
> not include bounces.

Asl someone else on the list noted, this was triggered by the
'queue file write error' condition. Postfix is quote verbose
about such errors, so I am surrised that you came up
empty-handed when examining the logs.

        Wietse
Reply | Threaded
Open this post in threaded view
|

Re: What is postfix telling me to do?

James B. Byrne

On Wed, June 27, 2018 13:49, Wietse Venema wrote:

> James B. Byrne:
>> This still does not clarify for me why the double-bounce address was
>> being reported given that the postconf reported values for notify
>> did
>> not include bounces.
>
> Asl someone else on the list noted, this was triggered by the
> 'queue file write error' condition. Postfix is quote verbose
> about such errors, so I am surrised that you came up
> empty-handed when examining the logs.
>
> Wietse
>

That was not Postfix's fault.  I am transitioning between two server
OS.  MX32 runs FreeBSD and on that OS maillog files are bz compressed
and rolled over every night.  I was oblivious to that when I searched
and so obviously I found nothing outside of the current maillog file.
The log entries were there but, by the time I found them, I had moved
past the need to see them.

--
***          e-Mail is NOT a SECURE channel          ***
        Do NOT transmit sensitive data via e-Mail
 Do NOT open attachments nor follow links sent by e-Mail

James B. Byrne                mailto:[hidden email]
Harte & Lyne Limited          http://www.harte-lyne.ca
9 Brockley Drive              vox: +1 905 561 1241
Hamilton, Ontario             fax: +1 905 561 0757
Canada  L8E 3C3

Reply | Threaded
Open this post in threaded view
|

Re: What is postfix telling me to do?

Viktor Dukhovni
In reply to this post by James B. Byrne


> On Jun 27, 2018, at 11:47 AM, James B. Byrne <[hidden email]> wrote:
>
> Last week Viktor Dukhovni reported to me that our domain had a problem
> with our DNSSEC.

For what it is worth, your DNS looks 100% clean now, the previous
"denial of existence" issues are gone:

  http://dnsviz.net/d/_25._tcp.hllmx150.harte-lyne.ca/dnssec/

And your DANE TLSA records are fine for the MX hosts that are
reachable from my server:

  ; MX lookup DNSSEC signed and validates:
  ;
  harte-lyne.ca. IN MX 30 inet08.hamilton.harte-lyne.ca. ; NoError AD=1
  harte-lyne.ca. IN MX 80 mx32.harte-lyne.ca. ; NoError AD=1
  harte-lyne.ca. IN MX 100 mx132.harte-lyne.ca. ; NoError AD=1
  harte-lyne.ca. IN MX 110 mx70.harte-lyne.ca. ; NoError AD=1
  harte-lyne.ca. IN MX 120 mx90.harte-lyne.ca. ; NoError AD=1

  ; All A/AAAA records signed and validate, including denial of
  ; existence for AAAA:
  ;
  inet08.hamilton.harte-lyne.ca. IN A 216.185.71.28 ; NoError AD=1
  inet08.hamilton.harte-lyne.ca. IN AAAA ? ; NODATA AD=1
  mx32.harte-lyne.ca. IN A 216.185.71.32 ; NoError AD=1
  mx32.harte-lyne.ca. IN AAAA ? ; NODATA AD=1
  mx132.harte-lyne.ca. IN A 216.185.71.132 ; NoError AD=1
  mx132.harte-lyne.ca. IN AAAA ? ; NODATA AD=1
  mx70.harte-lyne.ca. IN A 216.185.71.198 ; NoError AD=1
  mx70.harte-lyne.ca. IN AAAA ? ; NODATA AD=1
  mx90.harte-lyne.ca. IN A 216.185.71.199 ; NoError AD=1
  mx90.harte-lyne.ca. IN AAAA ? ; NODATA AD=1
 
  ; All MX hosts share the same TLSA RRset via CNAMEs:
  ;
  _25._tcp.inet08.hamilton.harte-lyne.ca. IN CNAME _tlsa._dane.trust.harte-lyne.ca. ; NoError AD=1
  _25._tcp.mx32.harte-lyne.ca. IN CNAME _tlsa._dane.trust.harte-lyne.ca. ; NoError AD=1
  _25._tcp.mx132.harte-lyne.ca. IN CNAME _tlsa._dane.trust.harte-lyne.ca. ; NoError AD=1
  _25._tcp.mx70.harte-lyne.ca. IN CNAME _tlsa._dane.trust.harte-lyne.ca. ; NoError AD=1
  _25._tcp.mx90.harte-lyne.ca. IN CNAME _tlsa._dane.trust.harte-lyne.ca. ; NoError AD=1

  ; The TLSA records are DANE-TA(2):
  ;
  _tlsa._dane.trust.harte-lyne.ca. IN TLSA 2 0 2 67274b355428905895c6b581950e0ed4f7d043f31f7e7020b716b7faa06776b6aadd33e127624b6e8c75c520a01d9cad3bd29f18fa7dcb3d5fd3917510e6722a ; NoError AD=1
  _tlsa._dane.trust.harte-lyne.ca. IN TLSA 2 1 2 380259229e21a1946b38cfc594cbc993b61bc93762b7b6c6637b3eef9c5a2bb70c589b91beb73bd1304eac11b3917e33819e2b47d25d4966435a2a3e83c1f80f ; NoError AD=1
  _tlsa._dane.trust.harte-lyne.ca. IN TLSA 2 1 2 c26e0ec16a46a97386e8f31f8ecc971f2d73136aa377dfdaac2b2b00f7cab4bb29b17d913c82093b41fd0d9e40b66a68361c126f1f4017f9ce60eabc5adba90e ; NoError AD=1

    ; The depth 1 and 2 CAs of the certificate chains of the first two MX hosts
    ; match the "2 1 2" TLSA records.  The "2 0 2" seems redundant...
    ;
    inet08.hamilton.harte-lyne.ca[216.185.71.28]: pass: TLSA match: depth = 1, name = inet08.hamilton.harte-lyne.ca
    depth = 1
      pkey sha512 [matched] <- 2 1 2 380259229e21a1946b38cfc594cbc993b61bc93762b7b6c6637b3eef9c5a2bb70c589b91beb73bd1304eac11b3917e33819e2b47d25d4966435a2a3e83c1f80f
    depth = 2
      pkey sha512 [matched] <- 2 1 2 c26e0ec16a46a97386e8f31f8ecc971f2d73136aa377dfdaac2b2b00f7cab4bb29b17d913c82093b41fd0d9e40b66a68361c126f1f4017f9ce60eabc5adba90e

    mx32.harte-lyne.ca[216.185.71.32]: pass: TLSA match: depth = 1, name = mx32.harte-lyne.ca
    depth = 1
      pkey sha512 [matched] <- 2 1 2 380259229e21a1946b38cfc594cbc993b61bc93762b7b6c6637b3eef9c5a2bb70c589b91beb73bd1304eac11b3917e33819e2b47d25d4966435a2a3e83c1f80f
    depth = 2
      pkey sha512 [matched] <- 2 1 2 c26e0ec16a46a97386e8f31f8ecc971f2d73136aa377dfdaac2b2b00f7cab4bb29b17d913c82093b41fd0d9e40b66a68361c126f1f4017f9ce60eabc5adba90e

    ; The last 3 MX hosts appear unreachable from the public Internet.
    mx132.harte-lyne.ca[216.185.71.132]: connection timeout
    mx70.harte-lyne.ca[216.185.71.198]: connection timeout
    mx90.harte-lyne.ca[216.185.71.199]: connection timeout


> I investigated that and discovered on our master DNS
> host the named daemon was reporting errors similar to the following.
>
> Jun  9 00:37:40 dns03 named[3729]: malformed transaction: . . .
>
> However, the problem returned. And, in combination with the dane
> security level setting in Postfix, apparently caused Postfix to
> report:
>
> (delivery temporarily suspended: Server certificate not verified)

Since the problem was with your authoritative DNS, presumably TLSA
lookups of your own domain were failing, which would affect delivery
of bounces or other notices.  As noted by others, you might want to
find a root cause for some of these notices.

> Switching smtp_tls_security_level from 'dane' to 'may' allowed the
> mail to be delivered without further problem.

With your own DNS misbehaving, you may indeed need to enable work-arounds.

> How one can have two processes bind to
> the same port escapes me but evidently it happened.

They did not both bind to the same port.  Rather one of them
was running with the port not bound, retrying the bind
periodically.

> Or it did not
> happen and the June 8 process somehow failed to terminate in
> consequence.

IIRC BIND attempts to keep running, and waits for the
address and/or port to become available.

> [root@dns03 ~ (master #)]# ps -ef | grep named
> named     3729     1  0 Jun08 ?        00:00:59 /usr/sbin/named -u named
> named    24749     1  0 10:04 ?        00:00:00 /usr/sbin/named -u named
> root     24859 22025  0 10:06 pts/1    00:00:00 grep named
>
> June 8 corresponds to when the dynamic update errors commenced:
>
> /var/log/messages-20180610:Jun  8 13:37:38 dns03 named[3729]:
> malformed transaction:  . . .

With two named processes running, they might both have been trying
to maintain the zone signatures, and perhaps stepped on each other,
breaking the data records needed for incremental signing to work
correctly.

> The solution to the twinned named service problem was
> to first kill the named process from June 8 and then restart the
> remaining named service.  We will see if the DNS problem reoccurs.

Good luck.  You might want to monitor that *exactly* one named is
running at a time (automatic restarts may be risky, they may even
cause the problem, because a manual restart that leaves BIND down
briefly might cause the restart logic to do a parallel restart,
but if you're running a service manager, e.g. systemd, then it
may be able to keep just one process running, if you don't bypass
its interfaces.).

And you should still try to find the real cause of the
"queue file write error" issue.  The DANE bounce backlog
was a related symptom not a cause.

--
        Viktor.

Reply | Threaded
Open this post in threaded view
|

Re: What is postfix telling me to do?

James B. Byrne

On Wed, June 27, 2018 15:55, Viktor Dukhovni wrote:

Thank you for your assistance.

>
> And you should still try to find the real cause of the
> "queue file write error" issue.  The DANE bounce backlog
> was a related symptom not a cause.
>

I believe that I have uncovered that as well.

[root@mx32 ~]# tail /var/log/maillog
. . .
Jun 27 15:52:03 mx32 postfix-p25/smtpd[50050]: connect from
mx32.hll.ca[A.B.C.32]
Jun 27 15:52:03 mx32 postfix-p25/smtpd[50050]: NOQUEUE:
client=mx32.hll.ca[A.B.C.32]
Jun 27 15:52:03 mx32 amavis[6850]: (!)DENIED ACCESS from IP
127.0.32.1, policy bank ''
Jun 27 15:52:03 mx32 postfix-p25/smtpd[50050]: warning: lost
connection with proxy 127.0.32.1:10024
Jun 27 15:52:03 mx32 postfix-p25/smtpd[50050]: proxy-reject:
END-OF-MESSAGE: 451 4.3.0 Error: queue file write error;
from=<[hidden email]> to=<[hidden email]> proto=ESMTP
helo=<mx32.hll.ca>
Jun 27 15:52:03 mx32 postfix/cleanup[40298]: 9E70612C76:
message-id=<[hidden email]>
. . .

I infer from this that I have some sort of misconfiguration in
amavisd.conf.  I am not sure what but it may be an uninitialised
variable used to name one of the policy arrays.  I will be looking
into this tomorrow.


--
***          e-Mail is NOT a SECURE channel          ***
        Do NOT transmit sensitive data via e-Mail
 Do NOT open attachments nor follow links sent by e-Mail

James B. Byrne                mailto:[hidden email]
Harte & Lyne Limited          http://www.harte-lyne.ca
9 Brockley Drive              vox: +1 905 561 1241
Hamilton, Ontario             fax: +1 905 561 0757
Canada  L8E 3C3

Reply | Threaded
Open this post in threaded view
|

Re: What is postfix telling me to do?

Ralf Hildebrandt-2
In reply to this post by Bastian Blank-3
* James B. Byrne <[hidden email]>:

> I am configuring a new Postfix-3.3.0 service to act as one of our
> public MX providers.

>  Out: 250 2.1.0 Ok
>  In:  RCPT TO:<[hidden email]>
>  Out: 250 2.1.5 Ok
>  In:  DATA
>  Out: 354 End data with <CR><LF>.<CR><LF>
>  Out: 451 4.3.0 Error: queue file write error
>  In:  QUIT
>  Out: 221 2.0.0 Bye
>
>
> Note that 'In:  RCPT TO:<[hidden email]>' refers to a
> non-existent mailbox address.

You MX must reject mails for non-existing recipients. Meaning, you
need a list of valid recipients or perform LDAP queries or
call-forwards.

> Mailq displays this:
>
> 002F8E7E3      1063 Mon Jun 25 22:13:17  [hidden email]    
>         (delivery temporarily suspended: Server certificate not verified)
>                                  [hidden email]

Check your logs. Probably some TLS/SSL issue (nexthop must use TLS,
but can't)



> The double-bounce address is that used by Postfix for sender
> verification probes.  Why are they still in the queue?  Are they not
> supposed to be discarded?

Yes, but only after they failed PERMANENTLY. "delivery temporarily suspended: Server certificate not verified"
indicates a temporary failure.

> To which server does the message '(delivery temporarily suspended:
> Server certificate not verified)' apply?

check 002F8E7E3 in your log. Probably the machine being MX for
mx32.hll.ca unless you're using transport maps

Reply | Threaded
Open this post in threaded view
|

Re: What is postfix telling me to do?

James B. Byrne
In reply to this post by James B. Byrne

On Wed, June 27, 2018 17:02, James B. Byrne wrote:
>
> On Wed, June 27, 2018 15:55, Viktor Dukhovni wrote:
. . .
>> And you should still try to find the real cause of the
>> "queue file write error" issue.  The DANE bounce backlog
>> was a related symptom not a cause.
>>
>
> I believe that I have uncovered that as well.
>
. . .
> I infer from this that I have some sort of misconfiguration in
> amavisd.conf. . .

The issue was that amavisd's default inet_acl values only allow
connections from 127.0.0.1 and [::1].  Adding an explicit setting in
amavisd.conf of:

@inet_acl = qw( 127.0.0.1 [::1] 127.0.32.1 [::32]);

resolved this issue.  It all works within a FreeBSD jail as it should,
at the moment.

Thanks for the list's advice and help. It is always greatly appreciated.

--
***          e-Mail is NOT a SECURE channel          ***
        Do NOT transmit sensitive data via e-Mail
 Do NOT open attachments nor follow links sent by e-Mail

James B. Byrne                mailto:[hidden email]
Harte & Lyne Limited          http://www.harte-lyne.ca
9 Brockley Drive              vox: +1 905 561 1241
Hamilton, Ontario             fax: +1 905 561 0757
Canada  L8E 3C3