Messages slow to leave active queue

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

Messages slow to leave active queue

Mark Goodge
I'm hoping someone can help point me in the right direction for the
solution to this one. I'm in the process of adding a new server to an
existing mail cluster, but for some reason I'm getting really slow
throughput. The problem seems to be delays in the active queue - here's
a sample from qshape:

               T  5 10 20 40  80 160 320 640 1280 1280+
       TOTAL 253  0  0  0  0 252   1   0   0    0     0
example.com  18  0  0  0  0  18   0   0   0    0     0
example.net  16  0  0  0  0  16   0   0   0    0     0

I've snipped the stats for the rest of the domains, but the odd thing
here is that there isn't any single domain which is being delayed - all
of the domains show a similar delay profile, and there isn't more than a
handful of messages on the system for any one domain.

I've read the online documentation at
http://www.postfix.org/QSHAPE_README.htm, and this suggests that mail
building up in the active queue is caused by problems delivering it
off-system to the next recipient. But I'm pretty sure that's not the
case here - I'm not reaching the concurrency limit, and the destination
machines are accepting mail with no problems (there's no equivalent
delay when mail is relayed through one of the existing machines in the
cluster).

Here's the output from postconf -n for the new server:

command_directory = /usr/local/sbin
config_directory = /usr/local/etc/postfix
content_filter = smtp-amavis:[127.0.0.1]:10024
daemon_directory = /usr/local/libexec/postfix
data_directory = /var/db/postfix
debug_peer_level = 2
disable_vrfy_command = yes
html_directory = no
mail_owner = postfix
mailbox_size_limit = 1024000000
mailq_path = /usr/local/bin/mailq
manpage_directory = /usr/local/man
message_size_limit = 40960000
newaliases_path = /usr/local/bin/newaliases
queue_directory = /var/spool/postfix
readme_directory = no
relay_domains = $transport_maps
sample_directory = /usr/local/etc/postfix
sendmail_path = /usr/local/sbin/sendmail
setgid_group = maildrop
smtpd_client_event_limit_exceptions = 127.0.0.0/8
smtpd_discard_ehlo_keywords = silent-discard, dsn
smtpd_recipient_restrictions = check_recipient_access
mysql:/usr/local/etc/postfix/recipient_smtpd_restrictions.cf
check_recipient_access mysql:/usr/local/etc/postfix/mx_filter.cf
permit_mynetworks        reject_non_fqdn_recipient
reject_unauth_destination        check_policy_service inet:127.0.0.1:2501
smtpd_restriction_classes = reject_nullsender        reject_unwanted_sender
smtpd_sender_restrictions = reject_non_fqdn_hostname
reject_non_fqdn_sender        reject_unwanted_sender
transport_maps = mysql:/usr/local/etc/postfix/transport.cf
unknown_local_recipient_reject_code = 550
virtual_mailbox_limit = 1024000000

And, by way of comparison, this is from one of the existing cluster
(which isn't having any problems with throughout):

command_directory = /usr/local/sbin
config_directory = /usr/local/etc/postfix
content_filter = smtp-amavis:[127.0.0.1]:10024
daemon_directory = /usr/local/libexec/postfix
data_directory = /var/db/postfix
debug_peer_level = 2
disable_vrfy_command = yes
html_directory = no
mail_owner = postfix
mailbox_size_limit = 1024000000
mailq_path = /usr/local/bin/mailq
manpage_directory = /usr/local/man
message_size_limit = 40960000
newaliases_path = /usr/local/bin/newaliases
queue_directory = /var/spool/postfix
readme_directory = no
relay_domains = $transport_maps
sample_directory = /usr/local/etc/postfix
sendmail_path = /usr/local/sbin/sendmail
setgid_group = maildrop
smtpd_client_connection_count_limit = 5
smtpd_client_connection_rate_limit = 60
smtpd_client_event_limit_exceptions = 127.0.0.0/8
smtpd_discard_ehlo_keywords = silent-discard, dsn
smtpd_recipient_limit = 100
smtpd_recipient_restrictions = check_recipient_access
mysql:/usr/local/etc/postfix/recipient_smtpd_restrictions.cf
check_recipient_access mysql:/usr/local/etc/postfix/mx_filter.cf
permit_mynetworks        reject_non_fqdn_recipient
reject_unauth_destination        check_policy_service inet:127.0.0.1:2501
smtpd_restriction_classes = reject_nullsender   reject_unwanted_sender
smtpd_sender_restrictions = reject_non_fqdn_hostname
reject_non_fqdn_sender      reject_unwanted_sender
transport_maps = mysql:/usr/local/etc/postfix/transport.cf
unknown_local_recipient_reject_code = 550
virtual_mailbox_limit = 1024000000

Both machines are running Postfix version 2.5.1 with Mysql 5.0.51a. The
only other significant difference is that the older machine is running
FreeBSD 6.3, while the new one is running FreeBSD 7.0.

Apart from the different OS version and (I really, really hope that's
not the cause of the problem!), I can't see anything that could possibly
  be a reason for the difference in throughput. Unless that is, I'm
overlooking something obvious - which is why I'm asking here!

Any suggestions?

Mark
--
http://mark.goodge.co.uk - my pointless blog
http://www.good-stuff.co.uk - my less pointless stuff
Reply | Threaded
Open this post in threaded view
|

Re: Messages slow to leave active queue

Wietse Venema
Mark Goodge:
> I'm hoping someone can help point me in the right direction for the
> solution to this one. I'm in the process of adding a new server to an
> existing mail cluster, but for some reason I'm getting really slow
> throughput. The problem seems to be delays in the active queue - here's
> a sample from qshape:

Do you know if it is building up before or after the filter.

        Wietse
Reply | Threaded
Open this post in threaded view
|

Re: Messages slow to leave active queue

Noel Jones-2
In reply to this post by Mark Goodge
Mark Goodge wrote:

> I'm hoping someone can help point me in the right direction for the
> solution to this one. I'm in the process of adding a new server to an
> existing mail cluster, but for some reason I'm getting really slow
> throughput. The problem seems to be delays in the active queue - here's
> a sample from qshape:
>
>               T  5 10 20 40  80 160 320 640 1280 1280+
>       TOTAL 253  0  0  0  0 252   1   0   0    0     0
> example.com  18  0  0  0  0  18   0   0   0    0     0
> example.net  16  0  0  0  0  16   0   0   0    0     0

I strongly suspect that your amavisd-new is the bottleneck.
Examine your logs and the output of the amavisd-nanny program.

Amavisd will log timing breakdowns at log_level 2 or higher;
that can be useful to see where it's spending its time.  Slow
DNS can cause SpamAssassin to take longer than expected to
process mail.

amavisd-nanny will show how many amavisd-new threads are
available and how many are being used - maybe you don't have
enough configured or they aren't all being used for some reason.

I believe there was a patch for qshape posted a while back
that changes it to show mail waiting for the content_filter,
rather than the final destination.  Search the archives...

Or maybe just a boring old networking problem, such as a
duplex mismatch or flaky cable, that's dramatically slowing
down transmission.

Pick one message and follow the logs for that message from
submission, through the content_filter, to final delivery.
That should help narrow it down.

--
Noel Jones
Reply | Threaded
Open this post in threaded view
|

Re: Messages slow to leave active queue

Mark Goodge


Noel Jones wrote:

> Mark Goodge wrote:
>> I'm hoping someone can help point me in the right direction for the
>> solution to this one. I'm in the process of adding a new server to an
>> existing mail cluster, but for some reason I'm getting really slow
>> throughput. The problem seems to be delays in the active queue -
>> here's a sample from qshape:
>>
>>               T  5 10 20 40  80 160 320 640 1280 1280+
>>       TOTAL 253  0  0  0  0 252   1   0   0    0     0
>> example.com  18  0  0  0  0  18   0   0   0    0     0
>> example.net  16  0  0  0  0  16   0   0   0    0     0
>
> I strongly suspect that your amavisd-new is the bottleneck.
> Examine your logs and the output of the amavisd-nanny program.

Well, disabling the content filter does improve performance, but it
still isn't as good as it ought to be. So I still have a little bit of a
mystery.

> Amavisd will log timing breakdowns at log_level 2 or higher; that can be
> useful to see where it's spending its time.  Slow DNS can cause
> SpamAssassin to take longer than expected to process mail.
>
> amavisd-nanny will show how many amavisd-new threads are available and
> how many are being used - maybe you don't have enough configured or they
> aren't all being used for some reason.
>
> I believe there was a patch for qshape posted a while back that changes
> it to show mail waiting for the content_filter, rather than the final
> destination.  Search the archives...

Ah, now that could be very useful. Thanks for the suggestion.

> Or maybe just a boring old networking problem, such as a duplex mismatch
> or flaky cable, that's dramatically slowing down transmission.

Trust me, we've already tried checking every possible bit of hardware in
the chain ;-)

Mark
--
http://mark.goodge.co.uk - my pointless blog
http://www.good-stuff.co.uk - my less pointless stuff