Understanding postscreen timeouts

Alex Regan
Hi,

I'm using postfix-2.10.3 on Fedora 20 and have configured postscreen with Spamhaus, Barracuda, and a few other DNSBLs. However, I occasionally receive the following timeout message:

May  1 17:15:01 mail01 postfix/postscreen[4429]: warning: dnsblog reply timeout 10s for swl.spamhaus.org

This appears to happen during periods of load, but also when the server is idle. I understand it's possible to increase the timeout, but I would think 10s is long enough, so I didn't want to start doing that. It also happens on multiple hosts on different, unrelated networks.

I'm also using a half-dozen RBLs, but they don't all always time out.

I'm using a local bind caching server on the hosts that are involved. Should I consider setting up rbldnsd for this instead? Or is that only for caching local RBLs?

What is the result of this timeout? Does postscreen/dnsblog retry, or is the attempt failed and the mail just passed on?

Here is the relevant postscreen info from my config. Please let me know if the full config is necessary.

postscreen_access_list = permit_mynetworks, cidr:/etc/postfix/postscreen_access.cidr
postscreen_blacklist_action = drop
postscreen_dnsbl_action = enforce
postscreen_dnsbl_reply_map = pcre:$config_directory/postscreen_dnsbl_reply_map.pcre
postscreen_dnsbl_sites = mykey.zen.dq.spamhaus.net*3
        b.barracudacentral.org*2
        bl.spameatingmonkey.net*2
        bl.spamcop.net
        dnsbl.sorbs.net
        psbl.surriel.com
        bl.mailspike.net
        swl.spamhaus.org*-4
        list.dnswl.org=127.[0..255].[0..255].0*-2
        list.dnswl.org=127.[0..255].[0..255].1*-3
        list.dnswl.org=127.[0..255].[0..255].[2..255]*-4
postscreen_dnsbl_threshold = 3
postscreen_greet_action = enforce
postscreen_whitelist_interfaces = static:all 172.XX.YY.160/32 64.XX.YY.0/24 67.XX.YY.0/24

Thanks so much,
Alex


Re: Understanding postscreen timeouts

Wietse Venema
Alex:
> I'm using postfix-2.10.3 with fedora20 and have configured postscreen with
> spamhaus, barracuda, and a few other DNSBLs. I'm however occasionally
> receiving the following timeout message:
>
> May  1 17:15:01 mail01 postfix/postscreen[4429]: warning: dnsblog reply
> timeout 10s for swl.spamhaus.org

This time limit has unfortunately escaped my attention.  It is not
yet configurable.

The warning message means that postscreen gives up waiting for the
DNS lookup result. This is a safety mechanism.

> I'm also using a half-dozen RBLs, but they don't all always timeout.

I see occasional timeouts on residential and co-located servers.
By default the resolver *system library* routines wait 5s before
retrying; this may be configurable in resolv.conf, but the
postscreen time limit is still hard-coded.

        Wietse
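
The "give up waiting" behaviour described above can be illustrated with a small Python sketch. This is only an illustration of a deadline around a lookup, not postscreen's implementation; `lookup_with_deadline`, `slow_lookup` and `fast_lookup` are hypothetical names, and the sub-second delays stand in for the 10s limit.

```python
import concurrent.futures
import time


def lookup_with_deadline(lookup, name, deadline_s):
    """Run lookup(name) in a worker thread and wait at most deadline_s seconds.

    Returns the lookup result, or None when the deadline passes first.
    """
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(lookup, name)
    try:
        return future.result(timeout=deadline_s)
    except concurrent.futures.TimeoutError:
        # The caller stops waiting; the worker thread still finishes on its own.
        return None
    finally:
        pool.shutdown(wait=False)


def slow_lookup(name):
    """Hypothetical stand-in for a DNSBL query that answers too late."""
    time.sleep(0.3)
    return "late reply"


def fast_lookup(name):
    """Hypothetical stand-in for a DNSBL query that answers promptly."""
    return "prompt reply"


if __name__ == "__main__":
    print(lookup_with_deadline(fast_lookup, "swl.spamhaus.org", 1.0))
    print(lookup_with_deadline(slow_lookup, "swl.spamhaus.org", 0.05))
```

What the caller does with a missing result (ignore it, retry, or log a warning) is a separate policy decision that this sketch leaves open.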

Re: Understanding postscreen timeouts

Alex Regan
Hi,

On Thu, May 1, 2014 at 5:38 PM, Wietse Venema <[hidden email]> wrote:
> Alex:
> > I'm using postfix-2.10.3 with fedora20 and have configured postscreen with
> > spamhaus, barracuda, and a few other DNSBLs. I'm however occasionally
> > receiving the following timeout message:
> >
> > May  1 17:15:01 mail01 postfix/postscreen[4429]: warning: dnsblog reply
> > timeout 10s for swl.spamhaus.org
>
> This time limit has unfortunately escaped my attention.  It is not
> yet configurable.
>
> The warning message means that postscreen gives up waiting for the
> DNS lookup result. This is a safety mechanism.
>
> > I'm also using a half-dozen RBLs, but they don't all always timeout.
>
> I see occasional timeouts on residential and co-located servers.
> By default the resolver *system library* routines wait 5s before
> retrying; this may be configurable in resolv.conf, but the
> postscreen time limit is still hard-coded.

These are both corporate 10 Mb/s dedicated links, and I don't think latency or bandwidth is a problem.

It actually appears swl.spamhaus.org is the main problem. It doesn't even resolve when I try to do it manually. This was a recommendation I used from this list some time ago. Has something changed? This is my current config:

postscreen_dnsbl_sites = mykey.zen.dq.spamhaus.net*3
        b.barracudacentral.org*2
        bl.spameatingmonkey.net*2
        bl.spamcop.net
        dnsbl.sorbs.net
        psbl.surriel.com
        bl.mailspike.net
        swl.spamhaus.org*-4
        list.dnswl.org=127.[0..255].[0..255].0*-2
        list.dnswl.org=127.[0..255].[0..255].1*-3
        list.dnswl.org=127.[0..255].[0..255].[2..255]*-4

I'm also curious what resolvers people are using for their mail servers? bind? Looking at my query graphs, it appears to be about 30 queries/sec on average for each host, just as a local caching server.

Thanks,
Alex


Re: Understanding postscreen timeouts

Stan Hoeppner
On 5/1/2014 8:15 PM, Alex wrote:
...
> These are both corporate 10mbs dedicated links and I don't think latency
> and/or bandwidth is a problem.

The problem, if network related, will be UDP packet loss somewhere in
the end-to-end path, not b/w or latency on the CPE link into the
provider's net.

> It actually appears swl.spamhaus.org is the main problem. It doesn't even
> resolve when I try to do it manually.

From here:

$ host 2.0.0.127.swl.spamhaus.org
2.0.0.127.swl.spamhaus.org has address 127.0.2.2

What response do you receive?
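
For anyone reproducing this test from a script: the name being queried is the client IPv4 address with its octets reversed, followed by the list's zone. A minimal Python sketch of that construction follows; `dnsbl_query_name` is a hypothetical helper name, and only IPv4 is handled.

```python
import ipaddress


def dnsbl_query_name(client_ip, zone):
    """Build the name queried for an IPv4 client: reversed octets + zone."""
    addr = ipaddress.IPv4Address(client_ip)  # raises ValueError on bad input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


if __name__ == "__main__":
    # 127.0.0.2 is the test address used in the host command above.
    print(dnsbl_query_name("127.0.0.2", "swl.spamhaus.org"))
```

The resulting name can be passed to `host` or `dig` exactly as in the command above.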

Due to your query volume you require paid service for Spamhaus Zen.  The
same terms apply to all Spamhaus services.  Your IPs may have been
blacklisted from DWL due to high query volume.  Contact Spamhaus.  If
your contract entitles you to all Spamhaus lists, the fix may be as
simple as changing the SWL hostname and adding your key.

> This was a recommendation I used from
> this list some time ago. Has something changed?

See above.

> postscreen_dnsbl_sites = mykey.zen.dq.spamhaus.net*3
>         b.barracudacentral.org*2
>         bl.spameatingmonkey.net*2
>         bl.spamcop.net
>         dnsbl.sorbs.net
>         psbl.surriel.com
>         bl.mailspike.net

With these 7 dnsbls you will have extreme overlap of listed IPs.  The
last 5 will gain you little to nothing and simply add latency to your
mail transactions, which is something you do not want in a high volume
environment.  I'd recommend you use Zen and BRBL, remove the rest, and
rely on SWL and dnswl for FP mitigation during SMTP.  You also run
SpamAssassin on all of these hosts, so there's no need to pile on dnsbl
queries at SMTP connect.

>         swl.spamhaus.org*-4
>         list.dnswl.org=127.[0..255].[0..255].0*-2
>         list.dnswl.org=127.[0..255].[0..255].1*-3
>         list.dnswl.org=127.[0..255].[0..255].[2..255]*-4

Consolidate these last 3 to something like:
        list.dnswl.org=127.0.[2..14].[2..3]*-4

To understand why, read "Return Codes" at:
http://dnswl.org/tech

> I'm also curious what resolvers people are using for their mail servers?
> bind? Looking at my query graphs, it appears to be about 30 queries/sec on
> average for each host, just as a local caching server.

That's ~2.6M queries/day/host.  Eliminating the 5 unnecessary dnsbl
queries will lower this considerably.  If you're not happy with bind,
check out:  http://doc.powerdns.com/html/built-in-recursor.html

If you have more than a handful of hosts doing 2.5M queries/day, you
should seriously consider building a couple of resolvers homed in
different networks and having the MX hosts query the pair.  This will
cut down considerably on the query load you're placing on your dns[b|w]l
servers, as resolver cache will be much more effective.
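
The per-day figure above is plain arithmetic on the 30 queries/sec average, sketched here in Python. The host count of 6 is a made-up value for illustration, not something stated in this thread.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def queries_per_day(queries_per_second):
    """Convert an average query rate into a daily total for one host."""
    return queries_per_second * SECONDS_PER_DAY


def fleet_queries_per_day(queries_per_second, host_count):
    """Daily total across several hosts that each run their own local cache."""
    return queries_per_day(queries_per_second) * host_count


if __name__ == "__main__":
    print(queries_per_day(30))            # one host at 30 queries/sec
    print(fleet_queries_per_day(30, 6))   # hypothetical fleet of 6 hosts
```

One host at 30 queries/sec comes to 2,592,000 queries per day, which is where the ~2.6M figure comes from.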

Cheers,

Stan

Re: Understanding postscreen timeouts

Tom Hendrikx
In reply to this post by Alex Regan
On 05/02/2014 03:15 AM, Alex wrote:

> Hi,
>
> On Thu, May 1, 2014 at 5:38 PM, Wietse Venema <[hidden email]> wrote:
>
>     Alex:
>     > I'm using postfix-2.10.3 with fedora20 and have configured postscreen with
>     > spamhaus, barracuda, and a few other DNSBLs. I'm however occasionally
>     > receiving the following timeout message:
>     >
>     > May  1 17:15:01 mail01 postfix/postscreen[4429]: warning: dnsblog reply
>     > timeout 10s for swl.spamhaus.org
>
>     This time limit has unfortunately escaped my attention.  It is not
>     yet configurable.
>
>     The warning message means that postscreen gives up waiting for the
>     DNS lookup result. This is a safety mechanism.
>
>     > I'm also using a half-dozen RBLs, but they don't all always timeout.
>
>     I see occasional timeouts on residential and co-located servers.
>     By default the resolver *system library* routines wait 5s before
>     retrying; this may be configurable in resolv.conf, but the
>     postscreen time limit is still hard-coded.
>
>
> These are both corporate 10mbs dedicated links and I don't think latency
> and/or bandwidth is a problem.
>
> It actually appears swl.spamhaus.org is the main problem. It doesn't even
> resolve when I try to do it manually. This was a recommendation I used from
> this list some time ago. Has something changed?

As a feed user of spamhaus, it's easy to see the amount of data that is
actually in the zones. Both DWL and SWL zones are empty, so the
whitelist experiments of spamhaus seem to be either 'on hold' or dead.
Feel free to drop the zones from your setup.

This won't fix dns lookup problems in general though.

Tom



Re: Understanding postscreen timeouts

Wietse Venema
In reply to this post by Stan Hoeppner
Stan Hoeppner:
> >         swl.spamhaus.org*-4
> >         list.dnswl.org=127.[0..255].[0..255].0*-2
> >         list.dnswl.org=127.[0..255].[0..255].1*-3
> >         list.dnswl.org=127.[0..255].[0..255].[2..255]*-4
>
> Consolidate these last 3 to something like:
> list.dnswl.org=127.0.[2..14].[2..3]*-4

These three will result in one list.dnswl.org query, just like the
consolidated one. There is no performance difference.

However, there is a correctness difference. The consolidated form
has the same weight -4 for all results, while the original form
has different weights.

        Wietse

postscreen_dnsbl_timeout parameter (was: Understanding postscreen timeouts)

Wietse Venema
In reply to this post by Wietse Venema
Wietse Venema:

> Alex:
> > I'm using postfix-2.10.3 with fedora20 and have configured postscreen with
> > spamhaus, barracuda, and a few other DNSBLs. I'm however occasionally
> > receiving the following timeout message:
> >
> > May  1 17:15:01 mail01 postfix/postscreen[4429]: warning: dnsblog reply
> > timeout 10s for swl.spamhaus.org
>
> This time limit has unfortunately escaped my attention.  It is not
> yet configurable.

Fixed in Postfix 2.12.

        Wietse

20140501

        Cleanup: postscreen_dnsbl_timeout parameter. Files:
        mantools/postlink, proto/postconf.proto, global/mail_params.h,
        postscreen/postscreen.c, postscreen/postscreen_dnsbl.c.

Re: Understanding postscreen timeouts

Stan Hoeppner
In reply to this post by Wietse Venema
On 5/2/2014 6:07 AM, Wietse Venema wrote:

> Stan Hoeppner:
>>>         swl.spamhaus.org*-4
>>>         list.dnswl.org=127.[0..255].[0..255].0*-2
>>>         list.dnswl.org=127.[0..255].[0..255].1*-3
>>>         list.dnswl.org=127.[0..255].[0..255].[2..255]*-4
>>
>> Consolidate these last 3 to something like:
>> list.dnswl.org=127.0.[2..14].[2..3]*-4
>
> These three will result in one list.dnswl.org query, just like the
> consolidated one. There is no performance difference.

Correct.  The reason for consolidating these is not to reduce queries.

> However, there is a correctness difference. The consolidated form
> has the same weight 4 for all results, while the original form
> has different weights.

The consolidated form gives no score to a 4th octet value of [0..1], but
gives -4 to [2..3].  This is the key difference.

Alex' form and weights are not correct.  And that is why I posted the
link to the return codes.  The second 'octet' is always zero, not a
range.  The 3rd octet has a range of 2-15, and the 4th octet a range of
0-3.  Specifying a range of 0-255 or 2-255 to cover "the future" may
have the opposite effect, resulting in potential disaster, depending on
how/if/when dnswl changes things.  Such wildcards should not be used.

A value of 15 in the 3rd octet means the sender is an Email Marketing
Provider.  Most people would never whitelist such senders.  Alex
currently does.  Most people would give no preference to a 4th octet
score of 0 which means "no trust".  Alex is giving -2.  And he is giving
-3 to a 4th octet score of 1, "low trust".  The recommended scale is
-0.1, -1.0, -10, -100, and this is how SpamAssassin handles dnswl
scoring.  Using a 4 point scale instead of 100, a 4th octet value of 0
or 1 should be given NO whitelisting preference at all, which is what my
consolidated example does.

Cheers,

Stan
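
The difference between the two forms discussed above can be checked with a small Python sketch that matches a reply address against the filter patterns and reports the resulting weight. It handles only literal octets and `[lo..hi]` ranges, the function names are hypothetical, and summing the weights of matching patterns is an assumption of the sketch (the patterns here are mutually exclusive, so at most one of the original three matches).

```python
import re

RANGE = re.compile(r"^\[(\d+)\.\.(\d+)\]$")


def split_pattern(pattern):
    """Split '127.0.[2..14].[2..3]' into four octet fields.

    A plain str.split('.') would break on the '..' inside a range.
    """
    fields = re.findall(r"\[[^\]]*\]|\d+", pattern)
    if len(fields) != 4:
        raise ValueError(f"expected four octet fields: {pattern!r}")
    return fields


def octet_matches(field, value):
    """True if value equals a literal field or falls inside a [lo..hi] field."""
    m = RANGE.match(field)
    if m:
        return int(m.group(1)) <= value <= int(m.group(2))
    return int(field) == value


def reply_matches(pattern, reply):
    """True if every octet of the reply address matches the pattern."""
    octets = [int(o) for o in reply.split(".")]
    return all(octet_matches(f, v) for f, v in zip(split_pattern(pattern), octets))


def score(rules, reply):
    """Sum the weights of every rule whose pattern matches the reply."""
    return sum(weight for pattern, weight in rules if reply_matches(pattern, reply))


# The two list.dnswl.org forms quoted in this thread.
ORIGINAL = [
    ("127.[0..255].[0..255].0", -2),
    ("127.[0..255].[0..255].1", -3),
    ("127.[0..255].[0..255].[2..255]", -4),
]
CONSOLIDATED = [("127.0.[2..14].[2..3]", -4)]

if __name__ == "__main__":
    for reply in ("127.0.15.0", "127.0.5.1", "127.0.5.2", "127.0.15.3"):
        print(reply, score(ORIGINAL, reply), score(CONSOLIDATED, reply))
```

With these inputs the original form whitelists all four replies, while the consolidated form gives a weight only to 127.0.5.2.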

Re: Understanding postscreen timeouts

Alex Regan
Hi,

On Fri, May 2, 2014 at 6:45 PM, Stan Hoeppner <[hidden email]> wrote:
> On 5/2/2014 6:07 AM, Wietse Venema wrote:
> > Stan Hoeppner:
> >>>         swl.spamhaus.org*-4
> >>>         list.dnswl.org=127.[0..255].[0..255].0*-2
> >>>         list.dnswl.org=127.[0..255].[0..255].1*-3
> >>>         list.dnswl.org=127.[0..255].[0..255].[2..255]*-4
> >>
> >> Consolidate these last 3 to something like:
> >>      list.dnswl.org=127.0.[2..14].[2..3]*-4
> >
> > These three will result in one list.dnswl.org query, just like the
> > consolidated one. There is no performance difference.
>
> Correct.  The reason for consolidating these is not to reduce queries.
>
> > However, there is a correctness difference. The consolidated form
> > has the same weight 4 for all results, while the original form
> > has different weights.
>
> The consolidated form gives no score to a 4th octet value of [0..1], but
> gives -4 to [2..3].  This is the key difference.
>
> Alex' form and weights are not correct.  And that is why I posted the
> link to the return codes.  The second 'octet' is always zero, not a
> range.  The 3rd octet has a range of 2-15, and the 4th octet a range of
> 0-3.  Specifying a range of 0-255 or 2-255 to cover "the future" may
> have the opposite effect, resulting in potential disaster, depending on
> how/if/when dnswl changes things.  Such wildcards should not be used.
>
> A value of 15 in the 3rd octet means the sender is an Email Marketing
> Provider.  Most people would never whitelist such senders.  Alex
> currently does.  Most people would give no preference to a 4th octet
> score of 0 which means "no trust".  Alex is giving -2.  And he is giving
> -3 to a 4th octet score of 1, "low trust".  The recommended scale is
> -0.1, -1.0, -10, -100, and this is how SpamAssassin handles dnswl
> scoring.  Using a 4 point scale instead of 100, a 4th octet value of 0
> or 1 should be given NO whitelisting preference at all, which is what my
> consolidated example does.

Somehow your first message to the list on this topic didn't make it to me. Had to read it in the archives. Anyway, thanks so much. My postscreen config was generated through a discussion on this list with rob0 some time ago, as well as his postscreen config (http://rob0.nodns4.us/howto/postfix/main.cf). Perhaps if he's reading, he can correct this.

I can't believe I've been whitelisting mass mailers. That's far from what I would want to be doing. In fact, I'm considering figuring out some spamassassin rules to better identify them based on the dnswl queries.

Regarding your DNS caching comments, thanks for this too. I hadn't realized there would be bandwidth savings by having one or two DNS servers that are queried on the network versus having a local cache on each mail server. I've always been a bind loyalist, but will consider PowerDNS if it doesn't improve.

I've already made the postscreen changes on the systems, and am already noticing fewer DNS queries.

I've also removed swl.spamhaus.org entirely, thanks to a conversation with spamhaus and comments from Tom Hendrikx about it being discontinued.

Thanks everyone!
Alex



Re: Understanding postscreen timeouts

/dev/rob0
On Fri, May 02, 2014 at 08:10:18PM -0400, Alex wrote:

> On Fri, May 2, 2014 at 6:45 PM, Stan Hoeppner
> <[hidden email]>wrote:
> > On 5/2/2014 6:07 AM, Wietse Venema wrote:
> > > Stan Hoeppner:
> > >>>         swl.spamhaus.org*-4
> > >>>         list.dnswl.org=127.[0..255].[0..255].0*-2
> > >>>         list.dnswl.org=127.[0..255].[0..255].1*-3
> > >>>         list.dnswl.org=127.[0..255].[0..255].[2..255]*-4
> > >>
> > >> Consolidate these last 3 to something like:
> > >>      list.dnswl.org=127.0.[2..14].[2..3]*-4
> > >
> > > These three will result in one list.dnswl.org query, just like
> > > the consolidated one. There is no performance difference.
> >
> > Correct.  The reason for consolidating these is not to reduce
> > queries.
> >
> > > However, there is a correctness difference. The consolidated
> > > form has the same weight 4 for all results, while the original
> > > form has different weights.
> >
> > The consolidated form gives no score to a 4th octet value of
> > [0..1], but gives -4 to [2..3].  This is the key difference.
> >
> > Alex' form and weights are not correct.  And that is why I posted
> > the link to the return codes.  The second 'octet' is always zero,
> > not a range.  The 3rd octet has a range of 2-15, and the 4th
> > octet a range of 0-3.  Specifying a range of 0-255 or 2-255 to
> > cover "the future" may have the opposite effect, resulting in
> > potential disaster, depending on how/if/when dnswl changes
> > things.  Such wildcards should not be used.

Good point. I thought of this, but did not bother to implement it
that way. Eventually I will change it.

> > A value of 15 in the 3rd octet means the sender is an Email
> > Marketing Provider.  Most people would never whitelist such
> > senders.  Alex currently does.  Most people would give no
> > preference to a 4th octet score of 0 which means "no trust".

Well, I whitelist mildly. Do note that this is a whitelist, under
management by people who, I suppose, don't like spam any more than
you or I.

A DNSWL.org return of 127.0.15.0 means an email marketer who is
nominally trying to limit spam (thus deserving a whitelist entry),
but who might not be doing that well.

A -1 score makes sense. It's not enough to override Zen nor a
grouping of other DNSBLs, but if that's the only result from
postscreen_dnsbl_sites, it's enough to bypass the after-220 checks.

> > Alex is giving -2.  And he is giving -3 to a 4th octet score of
> > 1, "low trust".  The recommended scale is -0.1, -1.0, -10, -100,
> > and this is how SpamAssassin handles dnswl scoring.

Yes, I think -1, -2 and -4 make sense. I lump 4th octet 2 and 3
together because I'm a 2. :) Also, a -4 is going to override any
borderline DNSBL score. If it doesn't, I expect something to give
somewhere. In my studies, I found very little overlap between the
DNSBLs and the DNSWLs.
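
How a -4 whitelist weight offsets blocklist hits can be sketched in Python with the weights and threshold from the configuration quoted earlier in this thread. The dictionary keys and function names are illustrative only, and the sketch assumes the combined score is the plain sum of the weights of all lists that return a hit, with the client blocked once that sum reaches postscreen_dnsbl_threshold.

```python
THRESHOLD = 3  # postscreen_dnsbl_threshold from the quoted config

# Weights from the quoted postscreen_dnsbl_sites; no "*N" suffix means weight 1.
WEIGHTS = {
    "mykey.zen.dq.spamhaus.net": 3,
    "b.barracudacentral.org": 2,
    "bl.spamcop.net": 1,
    "list.dnswl.org (4th octet 2..255)": -4,
}


def combined_score(hits):
    """Add up the weights of every list that returned a hit for the client."""
    return sum(WEIGHTS[name] for name in hits)


def is_blocked(hits):
    """The client is blocked when the combined score reaches the threshold."""
    return combined_score(hits) >= THRESHOLD


if __name__ == "__main__":
    cases = [
        ["mykey.zen.dq.spamhaus.net"],
        ["b.barracudacentral.org", "bl.spamcop.net"],
        ["mykey.zen.dq.spamhaus.net", "list.dnswl.org (4th octet 2..255)"],
        ["mykey.zen.dq.spamhaus.net", "b.barracudacentral.org",
         "list.dnswl.org (4th octet 2..255)"],
    ]
    for hits in cases:
        print(combined_score(hits), is_blocked(hits), hits)
```

With these weights a single Zen hit, or Barracuda plus SpamCop, reaches the threshold, while adding the -4 whitelist hit pulls even Zen plus Barracuda back under it.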

> > Using a 4 point scale instead of 100, a 4th octet value of
> > 0 or 1 should be given NO whitelisting preference at all,
> > which is what my consolidated example does.

But I don't agree with that. Scoring at the content scanning stage
differs from scoring in postscreen. DNSWL.org assumes that their
trust level "none" sites are not actually making money from spam. I
can't speak for Mathias, but I am pretty sure that he would delist
ANY known spammer.

> Somehow your first message to the list on this topic didn't make it
> to me. Had to read it in the archives. Anyway, thanks so much. My
> postscreen config was generated through a discussion on this list
> with rob0 some time ago, as well as his postscreen config (
> http://rob0.nodns4.us/howto/postfix/main.cf). Perhaps if he's
> reading, he can correct this.

Hiya! Yes, I remember. BTW, the better link to share is the HTML
page, http://rob0.nodns4.us/postscreen.html , which has all the
explanations and warnings.

> I can't believe I've been whitelisting mass mailers. That's far
> from what I would want to be doing. In fact, I'm considering
> figuring out some spamassassin rules to better identify them based
> on the dnswl queries.

If you want to be adventurous (and to violate the DNSWL.org spirit)
nothing stops you from using 127.0.15.0 with a positive score in
postscreen ... or even as a reject_rbl_client in smtpd!

I figure these are at worst the gray hats. And why bother giving
delays with the after-220 tests they will pass anyway? So yes, my
policy here was considered and deliberate. But looking back, I'll
agree that a -1 would make more sense than -2.

Stan probably tends to be more aggressive than I am. There's no
right/wrong to that, it's a choice.

> Regarding your DNS caching comments, thanks for this too. I hadn't
> realized there would be bandwidth savings by having one or two DNS
> servers that are queried on the network versus having a local cache
> on each mail server. I've always been a bind loyalist, but will
> consider the powerDNS program if it doesn't improve.

I've always been a BIND loyalist too. Now I'm paid to be a BIND
loyalist. I have nothing against the competition, certainly I can't
say anything bad about them.

But I can assure you that if you know ways in which BIND needs to
improve, ISC wants to hear from you.

Bigger doesn't always mean better, this I grant (just look at
Microsoft!). But in the case of BIND it means that an enormous
worldwide userbase is assisting ISC in continually improving BIND.

I don't mind questioning my loyalties from time to time, but I
wouldn't blindly jump ship from software I know and trust unless
there was a very good reason.

> I've already made the postscreen changes on the systems, and
> already noticing fewer DNS queries.
>
> I've also removed swl.spamhaus.org entirely, thanks to a
> conversation with spamhaus and comments from Tom Hendrikx about
> it being discontinued.

Yep, I will be doing the same. Unfortunately I probably won't get
around to updating my web page very soon. Note also that I used
dnsbl.ahbl.org in postscreen; by the beginning of 2015 that will
become disastrous, as they are planning to put a wildcard in the
zone.
--
  http://rob0.nodns4.us/
  Offlist GMX mail is seen only if "/dev/rob0" is in the Subject: