reject_unknown_client_hostname rejecting on SERVFAIL

Bernhard Schmidt
Hi,

we're running Postfix 2.5.1 with this fairly strict setting:

smtpd_client_restrictions =
        check_client_access cidr:biiiiig_whitelist,
        reject_unknown_client_hostname

unknown_client_reject_code = ${stress?421}${stress:550}

According to the documentation (and my expectation), the reply code
should be 450 if either the address->name or the name->address lookup
fails temporarily (e.g. a timeout or SERVFAIL).

According to my logs, this holds when the name->address lookup fails,
but when the address->name lookup fails with SERVFAIL the mail is still
rejected with 550.
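To make the expected behaviour concrete, here is a minimal Python sketch of how the reply code should be chosen, assuming a simplified model in which the combined hostname check ends in one of three outcomes. The names Lookup, expand_reject_code and unknown_client_reply are hypothetical and this is not Postfix's actual code; it only encodes the documented rule that a temporary lookup failure gets 450, while other failures get unknown_client_reject_code, here ${stress?421}${stress:550}.

```python
from enum import Enum
from typing import Optional


class Lookup(Enum):
    """Hypothetical outcome of the address->name plus name->address check."""
    OK = "ok"              # both lookups succeeded and the addresses match
    TEMPFAIL = "tempfail"  # a lookup failed temporarily (timeout, SERVFAIL)
    PERMFAIL = "permfail"  # a lookup failed permanently or did not match


def expand_reject_code(stress: bool) -> int:
    # Models unknown_client_reject_code = ${stress?421}${stress:550}:
    # ${stress?X} yields X when $stress is non-empty, ${stress:Y} yields Y
    # when it is empty, so exactly one of the two contributes.
    return 421 if stress else 550


def unknown_client_reply(outcome: Lookup, stress: bool = False) -> Optional[int]:
    """Return the SMTP reply code, or None if the client passes."""
    if outcome is Lookup.OK:
        return None
    if outcome is Lookup.TEMPFAIL:
        # Temporary failures always get 450, whatever the reject code says.
        return 450
    return expand_reject_code(stress)


# The rejection in the log below corresponds to PERMFAIL, not TEMPFAIL.
print(unknown_client_reply(Lookup.PERMFAIL))  # 550
```

In other words, a 550 can only appear if the library reported the SERVFAIL as something other than a temporary failure.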

lxmhs17:~ # host 89.139.242.190
;; connection timed out; no servers could be reached
lxmhs17:~ # host 89.139.242.190
Host 190.242.139.89.in-addr.arpa not found: 2(SERVFAIL)

but

Jun  2 14:25:54 lxmhs17 postfix/smtpd[11967]: NOQUEUE: reject: RCPT from
unknown[89.139.242.190]: 550 5.7.1 Client host rejected: cannot find
your hostname, [89.139.242.190]; from=<[hidden email]>
to=<[hidden email]> proto=ESMTP helo=<harari>

Is this a bug or am I doing something seriously wrong?

Bernhard

Re: reject_unknown_client_hostname rejecting on SERVFAIL

Wietse Venema
Bernhard Schmidt:

> Hi,
>
> we're running 2.5.1 with the pretty hard setting
>
> smtpd_client_restrictions =
> check_client_access cidr:biiiiig_whitelist,
> reject_unknown_client_hostname
>
> unknown_client_reject_code = ${stress?421}${stress:550}
>
> According to the documentation (and my expectation) the reject code
> should be 450 if either address->name or name->address fails (e.g.
> timeout or SERVFAIL)
>
> According to my logs this is true if the name->address mapping fails,
> but if the address->name mapping fails with SERVFAIL the mail is still
> rejected with 550.
>
> lxmhs17:~ # host 89.139.242.190
> ;; connection timed out; no servers could be reached
> lxmhs17:~ # host 89.139.242.190
> Host 190.242.139.89.in-addr.arpa not found: 2(SERVFAIL)
>
> but
>
> Jun  2 14:25:54 lxmhs17 postfix/smtpd[11967]: NOQUEUE: reject: RCPT from
> unknown[89.139.242.190]: 550 5.7.1 Client host rejected: cannot find
> your hostname, [89.139.242.190]; from=<[hidden email]>
> to=<[hidden email]> proto=ESMTP helo=<harari>
>
> Is this a bug or am I doing something seriously wrong?

Postfix uses the getnameinfo() SYSTEM LIBRARY routine.

Apparently, your system's version reports DNS error code SERVFAIL
as an unrecoverable error condition.

Postfix considers the following getnameinfo() results recoverable:
EAI_AGAIN, EAI_MEMORY, or EAI_SYSTEM. See src/smtpd/smtpd_peer.c.

So this would be a bug in your system library.
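This rule can be expressed as a small classification over the error codes that Python's socket module exposes from the same C library. A sketch, assuming the platform defines these EAI_* constants (EAI_SYSTEM in particular is not available everywhere, hence the getattr() fallback); is_recoverable is a hypothetical name, not a Postfix function.

```python
import socket

# getnameinfo() errors that, per the description of smtpd_peer.c above,
# Postfix treats as recoverable. getattr() skips constants that a
# platform's socket module does not define (EAI_SYSTEM, for example).
RECOVERABLE_EAI = frozenset(
    code
    for code in (
        getattr(socket, "EAI_AGAIN", None),
        getattr(socket, "EAI_MEMORY", None),
        getattr(socket, "EAI_SYSTEM", None),
    )
    if code is not None
)


def is_recoverable(eai_code: int) -> bool:
    """True if this getnameinfo() error should lead to a 4xx (try again) reply."""
    return eai_code in RECOVERABLE_EAI


for name in ("EAI_AGAIN", "EAI_NONAME", "EAI_FAIL"):
    code = getattr(socket, name)
    print(f"{name:10} ({code}): {'recoverable' if is_recoverable(code) else 'permanent'}")
```

With a library that maps SERVFAIL to EAI_NONAME, is_recoverable(socket.EAI_NONAME) is False, which is how the 550 in the report comes about.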

        Wietse

Re: reject_unknown_client_hostname rejecting on SERVFAIL

Victor Duchovni
In reply to this post by Bernhard Schmidt
On Mon, Jun 02, 2008 at 02:28:15PM +0200, Bernhard Schmidt wrote:

> Hi,
>
> we're running 2.5.1 with the pretty hard setting
>
> smtpd_client_restrictions =
> check_client_access cidr:biiiiig_whitelist,
> reject_unknown_client_hostname
>
> unknown_client_reject_code = ${stress?421}${stress:550}
>
> According to the documentation (and my expectation) the reject code
> should be 450 if either address->name or name->address fails (e.g.
> timeout or SERVFAIL)
>
> According to my logs this is true if the name->address mapping fails,
> but if the address->name mapping fails with SERVFAIL the mail is still
> rejected with 550.
>
> lxmhs17:~ # host 89.139.242.190
> ;; connection timed out; no servers could be reached
> lxmhs17:~ # host 89.139.242.190
> Host 190.242.139.89.in-addr.arpa not found: 2(SERVFAIL)
>
> but
>
> Jun  2 14:25:54 lxmhs17 postfix/smtpd[11967]: NOQUEUE: reject: RCPT from
> unknown[89.139.242.190]: 550 5.7.1 Client host rejected: cannot find
> your hostname, [89.139.242.190]; from=<[hidden email]>
> to=<[hidden email]> proto=ESMTP helo=<harari>

This does not prove the point. The nameservice for this IP address is
only intermittently available: sometimes it works, other times it fails.

When it works, I get:

    $ ./getnameinfo 89.139.242.190
    Hostname:       89-139-242-190.bb.netvision.net.il
    Address:        89.139.242.190

    $ ./getaddrinfo 89-139-242-190.bb.netvision.net.il
    Hostname:       89-139-242-190.bb.netvision.net.il
    Addresses:      89.139.242.190
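These two commands correspond to the two halves of the reject_unknown_client_hostname check: an address->name lookup, then a name->address lookup whose results must include the client address. The ./getnameinfo and ./getaddrinfo programs are not shown in the thread, so the following is only a Python approximation built on socket.getnameinfo() and socket.getaddrinfo(), which wrap the same system library routines. The function names are hypothetical, and the resolvers are parameters so the logic can be exercised without a network.

```python
import socket
from typing import Callable, List, Optional


def reverse_lookup(addr: str) -> str:
    # NI_NAMEREQD: raise socket.gaierror instead of returning the numeric address.
    host, _port = socket.getnameinfo((addr, 0), socket.NI_NAMEREQD)
    return host


def forward_lookup(name: str) -> List[str]:
    # Collect the distinct addresses the name resolves to (any family).
    return sorted({info[4][0] for info in socket.getaddrinfo(name, None)})


def forward_confirmed(
    addr: str,
    reverse: Callable[[str], str] = reverse_lookup,
    forward: Callable[[str], List[str]] = forward_lookup,
) -> Optional[str]:
    """Return the client hostname if address->name->address round-trips, else None.

    Lookup failures propagate as socket.gaierror; classifying them as
    temporary or permanent is left to the caller.
    """
    name = reverse(addr)
    return name if addr in forward(name) else None
```

Called as forward_confirmed("89.139.242.190"), it returns the hostname when DNS cooperates and raises socket.gaierror when it does not; whether that error is EAI_AGAIN or EAI_NONAME depends on the system library, which is the question in this thread.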

It seems to be working right now, but we don't know what the story
was when Postfix reported this error. Either at some point NXDOMAIN
was actually returned or, as Wietse says, the system library returns
unexpected error codes for temporary lookup failures.

--
        Viktor.

Disclaimer: off-list followups get on-list replies or get ignored.
Please do not ignore the "Reply-To" header.

If my response solves your problem, the best way to thank me is to not
send an "it worked, thanks" follow-up. If you must respond, please put
"It worked, thanks" in the "Subject" so I can delete these quickly.

Re: reject_unknown_client_hostname rejecting on SERVFAIL

Bernhard Schmidt
In reply to this post by Wietse Venema
Hello,

>> Is this a bug or am I doing something seriously wrong?
>
> Postfix uses the getnameinfo() SYSTEM LIBRARY routine.
>
> Apparently, your system's version reports DNS error code SERVFAIL
> as an unrecoverable error condition.
>
> Postfix considers the following getnameinfo() results recoverable:
> EAI_AGAIN, EAI_MEMORY, or EAI_SYSTEM. See src/smtpd/smtpd_peer.c.
>
> So this would be a bug in your system library.

This appears to be the case. I can't really code C, but my tests in
Python (which passes gaierror codes through faithfully) show rc=-2
('Name or service not known') after quite a long delay. I verified with
tcpdump that I did in fact get SERVFAIL responses back from both
resolvers.

Does anyone else see this? I see it on SLES 10.1 (glibc-2.4-31.43.6)
and on Ubuntu Hardy (libc6_2.7-10ubuntu3). I guess that is about the
most common platform around.

Bernhard

Re: reject_unknown_client_hostname rejecting on SERVFAIL

Bernhard Schmidt
On Mon, Jun 02, 2008 at 03:53:25PM +0200, Bernhard Schmidt wrote:

> >> Is this a bug or am I doing something seriously wrong?
> >
> > Postfix uses the getnameinfo() SYSTEM LIBRARY routine.
> >
> > Apparently, your system's version reports DNS error code SERVFAIL
> > as an unrecoverable error condition.
> >
> > Postfix considers the following getnameinfo() results recoverable:
> > EAI_AGAIN, EAI_MEMORY, or EAI_SYSTEM. See src/smtpd/smtpd_peer.c.
> >
> > So this would be a bug in your system library.
>
> This appears to be the case, I can't really code C but my tests in
> python (which forward gaierrors pretty well) show rc=-2 ('Name or
> service not known') after quite some delay. I verified with tcpdump that
> I did in fact get back SERVFAIL responses from both resolvers.
>
> Does anyone else see that? I see that on SLES10.1 (glibc-2.4-31.43.6)
> and on Ubuntu Hardy (libc6_2.7-10ubuntu3). I guess that should be the
> most common platform around.

Okay, I did some more research. Again, disclaimer, I'm nowhere near
understanding more than the basics of C, so I might be missing something
really big here.

All line numbers refer to the vanilla Postfix 2.5.2 code.

In src/smtpd/smtpd_peer.c:308, sockaddr_to_hostname() is called. This
function is defined in src/util/myaddrinfo.c:610 and basically calls
getnameinfo(...., NI_NAMEREQD) at line 676. This is the function that
would need to return EAI_AGAIN, EAI_MEMORY or EAI_SYSTEM to produce a
tempfail and thus a 450 reply code, right?

RFC3493 Section 6.2 allows the following Error Return Values for
getnameinfo():

|   Error Return Values:
|
|   The getnameinfo() function shall fail and return the corresponding
|   value if:
|
|   [EAI_AGAIN]    The name could not be resolved at this time.
|                  Future attempts may succeed.
|
|   [EAI_BADFLAGS] The flags had an invalid value.
|
|   [EAI_FAIL]     A non-recoverable error occurred.
|
|   [EAI_FAMILY]   The address family was not recognized or the address
|                  length was invalid for the specified family.
|
|   [EAI_MEMORY]   There was a memory allocation failure.
|
|   [EAI_NONAME]   The name does not resolve for the supplied parameters.
|                  NI_NAMEREQD is set and the host's name cannot be
|                  located, or both nodename and servname were null.
|
|   [EAI_OVERFLOW] An argument buffer overflowed.
|
|   [EAI_SYSTEM]   A system error occurred.  The error code can be found
|                  in errno.

From my point of view, in case of a timeout/SERVFAIL both EAI_AGAIN
(because it is a temporary error) and EAI_NONAME (because NI_NAMEREQD is
set) are allowed. Again, the usual disclaimer about my understanding of
C code applies, but it certainly looks like both glibc trunk and FreeBSD
chose to return EAI_NONAME in this situation:

http://www.freebsd.org/cgi/cvsweb.cgi/src/lib/libc/net/getnameinfo.c?annotate=1.20
lines 273/274 and
http://sources.redhat.com/cgi-bin/cvsweb.cgi/libc/inet/getnameinfo.c?annotate=1.36&cvsroot=glibc
lines 294-300

This matches the behaviour of my Python test (not C, but the error codes
match on both platforms, which is a very strong hint from my point of view):

On Linux (glibc 2.7):
>>> import socket
>>> socket.getnameinfo( ('89.139.242.190', 0), socket.NI_NAMEREQD)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
socket.gaierror: (-2, 'Name or service not known')
# define EAI_NONAME       -2    /* NAME or SERVICE is unknown.  */

On FreeBSD 7.0-RELEASE:
>>> import socket
>>> socket.getnameinfo( ('89.139.242.190', 0), socket.NI_NAMEREQD)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
socket.gaierror: (8, 'hostname nor servname provided, or not known')
#define EAI_NONAME 8 /* nodename nor servname ... */
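Because the numeric EAI_* values differ between platforms (-2 on glibc and 8 on FreeBSD for EAI_NONAME, per the sessions above), a probe that prints the symbolic name is easier to compare across systems. A minimal sketch; eai_name and probe_reverse are hypothetical helpers, and on platforms where two constants share a value, both names are reported.

```python
import socket
import sys


def eai_name(code: int) -> str:
    """Map a numeric getaddrinfo()/getnameinfo() error to its EAI_* name(s)."""
    names = sorted(
        n for n in dir(socket)
        if n.startswith("EAI_") and getattr(socket, n) == code
    )
    return "/".join(names) if names else f"unknown({code})"


def probe_reverse(addr: str) -> str:
    """Reverse-resolve addr with NI_NAMEREQD and describe the outcome."""
    try:
        host, _port = socket.getnameinfo((addr, 0), socket.NI_NAMEREQD)
    except socket.gaierror as err:
        return f"{eai_name(err.errno)} ({err.errno}): {err.strerror}"
    return f"ok: {host}"


if __name__ == "__main__":
    for address in sys.argv[1:]:
        print(address, "->", probe_reverse(address))
```

Run it as `python3 probe.py 89.139.242.190` (the script name is arbitrary).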

Am I missing anything?

Bernhard

Re: reject_unknown_client_hostname rejecting on SERVFAIL

Wietse Venema
Bernhard Schmidt:

> Okay, I did some more research. Again, disclaimer, I'm nowhere near
> understanding more than the basics of C, so I might be missing something
> really big here.
>
> All line numbers refer to the vanilla Postfix 2.5.2 code.
>
> In src/smtpd/smtpd_peer.c:308 sockaddr_to_hostname() is called. This
> function is defined in src/util/myaddrinfo.c:610 which basically calls
> getnameinfo(...., NI_NAMEREQD) in line 676. This would be the function
> that would need to return EAI_AGAIN, EAI_MEMORY or EAI_SYSTEM to have a
> tempfail and thus a 450 reject code, right?

Yes, unless your Postfix was compiled with EMULATE_IPV4_ADDRINFO,
which is only supported on older systems that have no IPv6 support,
and that have no getnameinfo() etc. routines.

If your system library reports SERVFAIL errors as EAI_NONAME, then
there is no way to report this as a recoverable error.

        Wietse

Re: reject_unknown_client_hostname rejecting on SERVFAIL

Bernhard Schmidt
Hello Wietse,

>> In src/smtpd/smtpd_peer.c:308 sockaddr_to_hostname() is called. This
>> function is defined in src/util/myaddrinfo.c:610 which basically calls
>> getnameinfo(...., NI_NAMEREQD) in line 676. This would be the function
>> that would need to return EAI_AGAIN, EAI_MEMORY or EAI_SYSTEM to have a
>> tempfail and thus a 450 reject code, right?
> If your system library reports SERVFAIL errors as EAI_NONAME, then
> there is no way to report this as a recoverable error.

Have you encountered any stack that behaves correctly here? I'm trying
to take this up with the glibc developers, having a working example
(preferably open source) would be very helpful.

Bernhard

Re: reject_unknown_client_hostname rejecting on SERVFAIL

Bernhard Schmidt
In reply to this post by Wietse Venema
On Mon, Jun 02, 2008 at 05:25:32PM -0400, Wietse Venema wrote:

> If your system library reports SERVFAIL errors as EAI_NONAME, then
> there is no way to report this as a recoverable error.

For the record, after spending hours of barking up wrong trees (or at
least the wrong branches of the correct tree) this problem has finally
been resolved. Executive summary: this is/was indeed a bug in the system
library.

We originally observed this problem on SLES 10.1, which includes glibc
2.4. After you pointed towards an erroneous return value of
getnameinfo(), I did some tests on my workstation (Ubuntu Hardy, glibc
2.7) and found it to be affected as well. Since that code had not
changed in glibc CVS since that version, I concluded that the bug was
still present in current glibc.

This assumption was wrong. The bug was fixed in glibc 2.5 with the
following commit:

http://sourceware.org/cgi-bin/cvsweb.cgi/libc/inet/getnameinfo.c.diff?r1=1.34&r2=1.35&cvsroot=glibc&f=h

That my Hardy workstation (glibc 2.7) still looked broken was caused by
an unrelated problem with the mDNS/avahi module that Ubuntu installs by
default:

bschmidt@lxbsc01:~$ grep ^hosts: /etc/nsswitch.conf
hosts:          files mdns4_minimal [NOTFOUND=return] dns mdns4
bschmidt@lxbsc01:~$ ./getnameinfo 62.85.116.236
rv:Name or service not known(-2)
bschmidt@lxbsc01:~$ sudo vim /etc/nsswitch.conf
bschmidt@lxbsc01:~$ grep ^hosts: /etc/nsswitch.conf
hosts:          files dns
bschmidt@lxbsc01:~$ ./getnameinfo 62.85.116.236
rv:Temporary failure in name resolution(-3)
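A quick way to spot the configuration difference shown above is to list the hosts: sources in /etc/nsswitch.conf other than files and dns. This is a heuristic sketch that simply mirrors the change made here; it does not model how glibc combines the status of each source, and the function names are hypothetical.

```python
import re
from typing import List


def hosts_sources(nsswitch_text: str) -> List[str]:
    """Return the source names on the hosts: line, without [STATUS=action] items."""
    for raw in nsswitch_text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if line.startswith("hosts:"):
            rest = re.sub(r"\[[^\]]*\]", " ", line[len("hosts:"):])
            return rest.split()
    return []


def non_dns_sources(nsswitch_text: str) -> List[str]:
    """Sources other than files and dns that may also answer host lookups."""
    return [s for s in hosts_sources(nsswitch_text) if s not in ("files", "dns")]


BEFORE = "hosts:          files mdns4_minimal [NOTFOUND=return] dns mdns4\n"
AFTER = "hosts:          files dns\n"
print(non_dns_sources(BEFORE))  # ['mdns4_minimal', 'mdns4']
print(non_dns_sources(AFTER))   # []
```

On a live system, pass it the file contents, e.g. non_dns_sources(pathlib.Path("/etc/nsswitch.conf").read_text()).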

After recompiling SLES 10's glibc 2.4 with the patch applied,
getnameinfo(), and thus Postfix, behave as expected.

Would it be unreasonable to add a heads-up to the manpage? Definitely
affected are:

 * SLES 10 (including the recently released SP2) shipping glibc 2.4
 * Debian Etch shipping glibc 2.3
 * FreeBSD 7.0-RELEASE (not shipping any glibc but according to my tests
   broken as well)

I'll file the appropriate bug reports with Novell and Debian in the next
couple of days, but it will probably take years rather than months to
fix all the systems out there, so a small note in the manpage would
probably be a good idea, and/or maybe a small test program that can be
used to determine whether your system library is broken. If necessary,
I can provide an IP address whose reverse lookup always fails.
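Such a test program could look roughly like the sketch below. It assumes you supply an address whose reverse zone is known to answer SERVFAIL (none is given in this thread), and it treats EAI_AGAIN as correct and EAI_NONAME as the broken behaviour discussed here. classify_servfail and check_library are hypothetical names, not part of Postfix. As the Ubuntu case shows, a "broken" result can also come from the NSS configuration rather than from getnameinfo() itself.

```python
import socket
import sys
from typing import Optional


def classify_servfail(eai_code: Optional[int]) -> str:
    """Judge how a SERVFAIL reverse lookup was reported.

    eai_code is the gaierror errno from getnameinfo(), or None if the
    lookup unexpectedly succeeded.
    """
    if eai_code is None:
        return "inconclusive: lookup succeeded, the address did not SERVFAIL this time"
    if eai_code == socket.EAI_AGAIN:
        return "ok: SERVFAIL reported as a temporary failure (EAI_AGAIN)"
    if eai_code == socket.EAI_NONAME:
        return "broken: SERVFAIL reported as a permanent failure (EAI_NONAME)"
    return f"inconclusive: unexpected error code {eai_code}"


def check_library(addr: str) -> str:
    """Reverse-resolve addr with NI_NAMEREQD, as Postfix does, and classify the result."""
    try:
        socket.getnameinfo((addr, 0), socket.NI_NAMEREQD)
    except socket.gaierror as err:
        return classify_servfail(err.errno)
    return classify_servfail(None)


if __name__ == "__main__":
    # Pass one or more addresses whose reverse zone is known to answer SERVFAIL.
    for address in sys.argv[1:]:
        print(address, "->", check_library(address))
```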

Regards,
Bernhard