Huge active queue and system idle, not delivering

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
39 messages Options
12
Reply | Threaded
Open this post in threaded view
|

Huge active queue and system idle, not delivering

Patrick Chemla
Hi,

I am running Postfix 2.5.6 on a Fedora 11 Linux system on a hardware based Intel  I5/750 Quad Core, 8 Gb memory, 160Gb SSD hard disk.

Incoming messages are entering very fast (500 smtp processes declared) and the active queue is actually of 2 millions messages waiting for delivery.

The delivery, for all messages should go through a farm of 30 MX servers from domain localpc2105.com, on load balancing through DNS resolution. DNS server is of course local. All 30 MX servers are running qmail. All of them are more than 90% idle. Before I set up my postfix server, email were sent directly to the qmail servers, and qmail was running at full CPU. So I am sure that qmail can handle much more faster. I have set up the postfix server to load balance the load between all the 30 qmail servers to avoid situation where some were running at full charge and others were not working.

Postfix server, 30 Qmail servers, DNS are on the same 1Gb LAN.

I don't need any messages or clients or recipients control at this stage. No anti spam, no anti virus. All this is already done by qmail servers, despite my plan is to replace all qmails by postfix.

CPU is more than 85% idle on my postfix I5/750 box, but the outbound queue is very very slow.

It seems that something refrain qmgr to work at full range, despite the parameters

here is my main.cf file:

queue_directory = /var/spool/postfix
command_directory = /usr/sbin
daemon_directory = /usr/libexec/postfix
data_directory = /var/lib/postfix
mail_owner = postfix
myhostname = postfix.proacti5.net
mydomain = localpc2105.com
inet_interfaces = all
mydestination = $myhostname, localhost.$mydomain, localhost
unknown_local_recipient_reject_code = 550
mynetworks = 172.27.27.0/24, 10.0.0.0/24, 127.0.0.0/24
relayhost = $mydomain
alias_maps = hash:/etc/aliases
alias_database = hash:/etc/aliases
local_destination_recipient_limit = 500
local_destination_concurrency_limit = 50
debug_peer_level = 8
debug_peer_list = orange.fr
debugger_command =
     PATH=/bin:/usr/bin:/usr/local/bin:/usr/X11R6/bin
     ddd $daemon_directory/$process_name $process_id & sleep 5
sendmail_path = /usr/sbin/sendmail.postfix
newaliases_path = /usr/bin/newaliases.postfix
mailq_path = /usr/bin/mailq.postfix
setgid_group = postdrop
html_directory = no
manpage_directory = /usr/share/man
sample_directory = /usr/share/doc/postfix-2.5.6/samples
readme_directory = /usr/share/doc/postfix-2.5.6/README_FILES
inet_protocols = all
default_process_limit = 1000
initial_destination_concurrency = 100
transport_initial_destination_concurrency = 100
default_destination_concurrency_failed_cohort_limit = 10
default_destination_recipient_limit = 200
transport_destination_recipient_limit = 100
default_delivery_slot_cost = 30
default_minimum_delivery_slots = 30
default_delivery_slot_discount = 100
smtpd_peername_lookup = no
default_recipient_limit = 2000000
mailbox_size_limit = 5120000000
qmgr_message_active_limit = 2000000
qmgr_message_recipient_limit = 2000000
default_destination_concurrency_limit = 500
lmtp_destination_concurrency_limit = $default_destination_concurrency_limit
relay_destination_concurrency_limit = $default_destination_concurrency_limit
smtp_destination_concurrency_limit = $default_destination_concurrency_limit
max_use = 1000
mime_nesting_limit = 100
qmgr_fudge_factor = 200
queue_file_attribute_count_limit = 250
smtpd_history_flush_threshold = 100
smtpd_junk_command_limit = 100
smtp_connect_timeout = 10s
smtp_data_done_timeout = 10s
smtp_mail_timeout = 5s


Here is my master.cf file:


service type private
(yes)
unpriv
(yes)
chroot
(yes)
wakeup
(never)
maxproc
(100)
command + args
smtp inet n - n - - smtpd
pickup fifo n - n 60 1 pickup
cleanup unix n - n - 0 cleanup
qmgr fifo n - n 30 1 qmgr
tlsmgr unix - - n 1000? 1 tlsmgr
rewrite unix - - n - - trivial-rewrite
bounce unix - - n - 0 bounce
defer unix - - n - 0 bounce
trace unix - - n - 0 bounce
verify unix - - n - 1 verify
flush unix n - n 1000? 0 flush
proxymap unix - - n - - proxymap
proxywrite unix - - n - 1 proxymap
smtp unix - - n - - smtp
relay unix
-o smtp_fallback_relay=
- - n - - smtp
showq unix n - n - - showq
error unix - - n - - error
retry unix - - n - - error
discard unix - - n - - discard
local unix - n n - - local
virtual unix - n n - - virtual
lmtp unix - - n - - lmtp
anvil unix - - n - 1 anvil
scache unix - - n - 1 scache

I tried many combinations to speed up the delivery. Nothing help up to now.

I have look up in tons of threads in many places but haven't found a real solution.

I just found that Postfix could send 1 million emails per hour when I send less than a half million in 24 hours.

Please help
Patrick

Reply | Threaded
Open this post in threaded view
|

Re: Huge active queue and system idle, not delivering

Wietse Venema
Patrick Chemla:

> Hi,
>
> I am running Postfix 2.5.6 on a Fedora 11 Linux system on a hardware
> based Intel  I5/750 Quad Core, 8 Gb memory, 160Gb SSD hard disk.
>
> Incoming messages are entering very fast (500 smtp processes declared)
> and the active queue is actually of 2 millions messages waiting for
> delivery.
>
> The delivery, for all messages should go through a farm of 30 MX servers
> from domain localpc2105.com, on load balancing through DNS resolution.
> DNS server is of course local. All 30 MX servers are running qmail. All
> of them are more than 90% idle. Before I set up my postfix server, email
> were sent directly to the qmail servers, and qmail was running at full
> CPU. So I am sure that qmail can handle much more faster. I have set up
> the postfix server to load balance the load between all the 30 qmail
> servers to avoid situation where some were running at full charge and
> others were not working.

http://www.postfix.org/DEBUG_README.html#logging

        Wietse
Reply | Threaded
Open this post in threaded view
|

Re: Huge active queue and system idle, not delivering

Barney Desmond
In reply to this post by Patrick Chemla
2010/1/8 Patrick Chemla <[hidden email]>
> Incoming messages are entering very fast (500 smtp processes declared) and
> the active queue is actually of 2 millions messages waiting for delivery.
> <snip>
> here is my main.cf file:

That's some very thorough information, you've provided plenty of
context and clear description, which is great. While I lack sufficient
knowledge to provide thoughts on the bottlenecking, I *do* expect that
people will want to see the output of `postconf -n`, instead of your
main.cf (to ensure we see what postfix actually sees and uses).

Can you clarify what you mean by "500 smtp processes declared"? A
sample output from qshape also wouldn't go astray either
(http://www.postfix.org/qshape.1.html). You're provided some
proportional figures (percentages), but some solid throughput numbers
would be good too. Eg. "We're injecting 2 million messages to the
postfix box, we expect to enqueue them in X hrs, but it takes Y hrs,
and they're only leaving the postfix box at Z messages/sec". I see you
said "I just found that Postfix could send 1 million emails per hour
when I send less than a half million in 24 hours", but I can't make
sense of that, sorry.
Reply | Threaded
Open this post in threaded view
|

Re: Huge active queue and system idle, not delivering

Patrick Chemla
Le 07/01/2010 20:03, Barney Desmond a écrit :

> 2010/1/8 Patrick Chemla<[hidden email]>
>    
>> Incoming messages are entering very fast (500 smtp processes declared) and
>> the active queue is actually of 2 millions messages waiting for delivery.
>> <snip>
>> here is my main.cf file:
>>      
> That's some very thorough information, you've provided plenty of
> context and clear description, which is great. While I lack sufficient
> knowledge to provide thoughts on the bottlenecking, I *do* expect that
> people will want to see the output of `postconf -n`, instead of your
> main.cf (to ensure we see what postfix actually sees and uses).
>
>    
Here is postconf -n
alias_database = hash:/etc/aliases
alias_maps = hash:/etc/aliases
command_directory = /usr/sbin
config_directory = /etc/postfix
daemon_directory = /usr/libexec/postfix
data_directory = /var/lib/postfix
debug_peer_level = 8
debug_peer_list = orange.fr
default_delivery_slot_cost = 30
default_delivery_slot_discount = 100
default_destination_concurrency_failed_cohort_limit = 10
default_destination_concurrency_limit = 500
default_destination_recipient_limit = 200
default_minimum_delivery_slots = 30
default_process_limit = 1000
default_recipient_limit = 2000000
html_directory = no
inet_interfaces = all
inet_protocols = all
initial_destination_concurrency = 100
lmtp_destination_concurrency_limit = $default_destination_concurrency_limit
local_destination_concurrency_limit = 50
local_destination_recipient_limit = 500
mail_owner = postfix
mailbox_size_limit = 5120000000
mailq_path = /usr/bin/mailq.postfix
manpage_directory = /usr/share/man
max_use = 1000
mime_nesting_limit = 100
mydestination = $myhostname, localhost.$mydomain, localhost
mydomain = localpc2105.com
myhostname = postfix.proacti5.net
mynetworks = 172.27.27.0/24, 10.0.0.0/24, 127.0.0.0/24
newaliases_path = /usr/bin/newaliases.postfix
qmgr_fudge_factor = 200
qmgr_message_active_limit = 2000000
qmgr_message_recipient_limit = 2000000
queue_directory = /var/spool/postfix
queue_file_attribute_count_limit = 250
readme_directory = /usr/share/doc/postfix-2.5.6/README_FILES
relay_destination_concurrency_limit = $default_destination_concurrency_limit
relayhost = $mydomain
sample_directory = /usr/share/doc/postfix-2.5.6/samples
sendmail_path = /usr/sbin/sendmail.postfix
setgid_group = postdrop
smtp_connect_timeout = 10s
smtp_data_done_timeout = 10s
smtp_destination_concurrency_limit = $default_destination_concurrency_limit
smtp_mail_timeout = 5s
smtpd_history_flush_threshold = 100
smtpd_junk_command_limit = 100
smtpd_peername_lookup = no
unknown_local_recipient_reject_code = 550


> Can you clarify what you mean by "500 smtp processes declared"? A
> sample output from qshape also wouldn't go astray either
> (http://www.postfix.org/qshape.1.html).
Here is qshape:                      T  5 10 20 40 80  160   320   640
1280 1280+
                          TOTAL 133000  0  0  0  0  0 2470 40538 80844
7167  1981
                     wanadoo.fr  61955  0  0  0  0  0 2469 26830 31340
1260    56
                      orange.fr   4171  0  0  0  0  0    0  1176  2144  
540   311
                      skynet.be   3286  0  0  0  0  0    0     1  
3284    0     1
                   aliceadsl.fr   3259  0  0  0  0  0    0    54  3169  
25    11
                        aol.com   3150  0  0  0  0  0    0  1545  1524  
40    41
                        free.fr   2138  0  0  0  0  0    0   453  1561  
89    35
                         sfr.fr    840  0  0  0  0  0    0    23  
816    1     0
                     hotmail.fr    679  0  0  0  0  0    0   150   420  
12    97
                     telenet.be    658  0  0  0  0  0    0     0  
658    0     0
                      gmail.com    358  0  0  0  0  0    0   157   145  
11    45
                    hotmail.com    325  0  0  0  0  0    0    44   220  
20    41
                        neuf.fr    252  0  0  0  0  0    0    62   176  
14     0
                     9online.fr    250  0  0  0  0  0    0     6  
244    0     0
                    cegetel.net    195  0  0  0  0  0    0    26  
155    4    10
                    laposte.net    183  0  0  0  0  0    0    51    93  
15    24
                       swing.be    141  0  0  0  0  0    0     2  
139    0     0
                   9business.fr    111  0  0  0  0  0    0    23    
85    3     0
                     sonepar.fr    107  0  0  0  0  0    0    33    
72    2     0
                         axa.fr    103  0  0  0  0  0    0    30    
67    1     5

most of the messages stay in the queue for hours.


> You're provided some
> proportional figures (percentages), but some solid throughput numbers
> would be good too. Eg. "We're injecting 2 million messages to the
> postfix box, we expect to enqueue them in X hrs, but it takes Y hrs,
> and they're only leaving the postfix box at Z messages/sec". I see you
> said "I just found that Postfix could send 1 million emails per hour
> when I send less than a half million in 24 hours", but I can't make
> sense of that, sorry.
>    
I have to inject 2 to 4 millions emails to the postfix box in 24 hours,
and I expect to deliver within the same delay.
Actually, I can't deliver more than 500,000 per 24h hours.

But the CPU of the box is idle more than 80%. It is clear that it is not
a matter of CPU, nor memory, nor disk. Something in the number of
processes/users/simultaneous tasks is blocking.



Reply | Threaded
Open this post in threaded view
|

Re: Huge active queue and system idle, not delivering

Patrick Chemla
In reply to this post by Wietse Venema
Le 07/01/2010 20:00, Wietse Venema a écrit :

> Patrick Chemla:
>    
>> Hi,
>>
>> I am running Postfix 2.5.6 on a Fedora 11 Linux system on a hardware
>> based Intel  I5/750 Quad Core, 8 Gb memory, 160Gb SSD hard disk.
>>
>> Incoming messages are entering very fast (500 smtp processes declared)
>> and the active queue is actually of 2 millions messages waiting for
>> delivery.
>>
>> The delivery, for all messages should go through a farm of 30 MX servers
>> from domain localpc2105.com, on load balancing through DNS resolution.
>> DNS server is of course local. All 30 MX servers are running qmail. All
>> of them are more than 90% idle. Before I set up my postfix server, email
>> were sent directly to the qmail servers, and qmail was running at full
>> CPU. So I am sure that qmail can handle much more faster. I have set up
>> the postfix server to load balance the load between all the 30 qmail
>> servers to avoid situation where some were running at full charge and
>> others were not working.
>>      
> http://www.postfix.org/DEBUG_README.html#logging
>
> Wietse
>    

Here the logs:

Jan  6 23:12:48 postfix postfix/qmgr[31260]: warning: to turn off these
warnings specify: qmgr_clog_warn_time = 0
Jan  6 23:19:39 postfix postfix/qmgr[31260]: warning: mail for
localpc2105.com is using up 461335 of 461335 active queue entries
Jan  6 23:19:39 postfix postfix/qmgr[31260]: warning: you may need to
increase the main.cf smtp_destination_concurrency_limit from 100
Jan  6 23:19:39 postfix postfix/qmgr[31260]: warning: please avoid
flushing the whole queue when you have
Jan  6 23:19:39 postfix postfix/qmgr[31260]: warning: lots of deferred
mail, that is bad for performance
Jan  6 23:19:39 postfix postfix/qmgr[31260]: warning: to turn off these
warnings specify: qmgr_clog_warn_time = 0
Jan  6 23:24:51 postfix postfix/qmgr[31260]: warning: mail for
localpc2105.com is using up 461086 of 461086 active queue entries
Jan  6 23:24:51 postfix postfix/qmgr[31260]: warning: you may need to
increase the main.cf smtp_destination_concurrency_limit from 100
Jan  6 23:24:51 postfix postfix/qmgr[31260]: warning: please avoid
flushing the whole queue when you have
Jan  6 23:24:51 postfix postfix/qmgr[31260]: warning: lots of deferred
mail, that is bad for performance
Jan  6 23:24:51 postfix postfix/qmgr[31260]: warning: to turn off these
warnings specify: qmgr_clog_warn_time = 0
Jan  6 23:29:51 postfix postfix/qmgr[31260]: warning: mail for
localpc2105.com is using up 460872 of 460872 active queue entries
Jan  6 23:29:51 postfix postfix/qmgr[31260]: warning: you may need to
increase the main.cf smtp_destination_concurrency_limit from 100
Jan  6 23:29:51 postfix postfix/qmgr[31260]: warning: please avoid
flushing the whole queue when you have
Jan  6 23:29:51 postfix postfix/qmgr[31260]: warning: lots of deferred
mail, that is bad for performance
Jan  6 23:29:51 postfix postfix/qmgr[31260]: warning: to turn off these
warnings specify: qmgr_clog_warn_time = 0
Jan  6 23:35:51 postfix postfix/qmgr[31260]: warning: mail for
localpc2105.com is using up 460025 of 460025 active queue entries
Jan  6 23:35:51 postfix postfix/qmgr[31260]: warning: you may need to
increase the main.cf smtp_destination_concurrency_limit from 100
Jan  6 23:35:51 postfix postfix/qmgr[31260]: warning: please avoid
flushing the whole queue when you have
Jan  6 23:35:51 postfix postfix/qmgr[31260]: warning: lots of deferred
mail, that is bad for performance
Jan  6 23:35:51 postfix postfix/qmgr[31260]: warning: to turn off these
warnings specify: qmgr_clog_warn_time = 0
Jan  6 23:40:51 postfix postfix/qmgr[31260]: warning: mail for
localpc2105.com is using up 460283 of 460283 active queue entries
Jan  6 23:40:51 postfix postfix/qmgr[31260]: warning: you may need to
increase the main.cf smtp_destination_concurrency_limit from 100
Jan  6 23:40:51 postfix postfix/qmgr[31260]: warning: please avoid
flushing the whole queue when you have
Jan  6 23:40:51 postfix postfix/qmgr[31260]: warning: lots of deferred
mail, that is bad for performance
Jan  6 23:40:51 postfix postfix/qmgr[31260]: warning: to turn off these
warnings specify: qmgr_clog_warn_time = 0
Jan  6 23:47:21 postfix postfix/qmgr[31260]: warning: mail for
localpc2105.com is using up 459714 of 459714 active queue entries
Jan  6 23:47:21 postfix postfix/qmgr[31260]: warning: you may need to
increase the main.cf smtp_destination_concurrency_limit from 100
Jan  6 23:47:21 postfix postfix/qmgr[31260]: warning: please avoid
flushing the whole queue when you have
Jan  6 23:47:21 postfix postfix/qmgr[31260]: warning: lots of deferred
mail, that is bad for performance
Jan  6 23:47:21 postfix postfix/qmgr[31260]: warning: to turn off these
warnings specify: qmgr_clog_warn_time = 0
Jan  6 23:52:21 postfix postfix/qmgr[31260]: warning: mail for
localpc2105.com is using up 459491 of 459491 active queue entries
Jan  6 23:52:21 postfix postfix/qmgr[31260]: warning: you may need to
increase the main.cf smtp_destination_concurrency_limit from 100
Jan  6 23:52:21 postfix postfix/qmgr[31260]: warning: please avoid
flushing the whole queue when you have
Jan  6 23:52:21 postfix postfix/qmgr[31260]: warning: lots of deferred
mail, that is bad for performance
Jan  6 23:52:21 postfix postfix/qmgr[31260]: warning: to turn off these
warnings specify: qmgr_clog_warn_time = 0


Patrick

Reply | Threaded
Open this post in threaded view
|

Re: Huge active queue and system idle, not delivering

Victor Duchovni
In reply to this post by Patrick Chemla
On Thu, Jan 07, 2010 at 07:43:55PM +0200, Patrick Chemla wrote:

> CPU is more than 85% idle on my postfix I5/750 box, but the outbound queue
> is very very slow.

    Throughput == Concurrency / Latency

What destination are most of the messages in the queue going to?

What is the associated transport?

Are you using any content filters?

What is the destination concurrency limit for that transport?

What is the delivery latency to that transport? Show a/b/c/d data
averaged (mean, median, stddev) over a bunch of log entries.

> It seems that something refrain qmgr to work at full range, despite the
> parameters

Most of your parameter tweaks are counter-productive. Do not tweak
anything other than the destination concurrency limit for a transport
that delivers to a high capacity destination you control, say:

        # The 200 is not a golden value, start at 50 and raise only
        # if throughput improves as a result...
        #
        relay_destination_concurrency_limit = 200

> myhostname = postfix.proacti5.net
> mydomain = localpc2105.com
> inet_interfaces = all
> mydestination = $myhostname, localhost.$mydomain, localhost
> unknown_local_recipient_reject_code = 550
> mynetworks = 172.27.27.0/24, 10.0.0.0/24, 127.0.0.0/24
> relayhost = $mydomain

With relayhost set, all remote mail goes to the MX hosts for $mydomain,
so in this case, you can also raise:

        # The 200 is not a golden value, start at 50 and raise only
        # if throughput improves as a result...
        #
        smtp_destination_concurrency_limit = 200

if necessary.

> local_destination_recipient_limit = 500

Terrible idea.

> local_destination_concurrency_limit = 50

Terrible idea.

> debug_peer_level = 8

Absurd.

> debug_peer_list = orange.fr

I hope very little mail goes there...

> default_process_limit = 1000

Raise just the master.cf limits for the smtpd(8) and smtp(8)
services. You don't need 1000 of everything.

> initial_destination_concurrency = 100

Too high.

> transport_initial_destination_concurrency = 100

You misunderstood the docs, this is useless.

> default_destination_concurrency_failed_cohort_limit = 10

Should not be necessary.

> default_destination_recipient_limit = 200

OK.

> transport_destination_recipient_limit = 100

You misunderstood the docs, this is useless.

> default_delivery_slot_cost = 30
> default_minimum_delivery_slots = 30
> default_delivery_slot_discount = 100
> qmgr_fudge_factor = 200

Don't mess with the nqmgr tunables, they are too subtle for mortals.

> smtpd_peername_lookup = no

When output is starved, why make the input even faster?

> default_recipient_limit = 2000000
> qmgr_message_active_limit = 2000000
> qmgr_message_recipient_limit = 2000000

The Postfix queue does not scale to arbitrarily large sizes,
at some point, there is more to do than available capacity to
process the backlog. 2 million active messages may be OK for
a mass-mail engine that fires up periodically, and works as fast
as it can, but it is terrible for a mail forwarding relay. Which
use-case are you in?

> mailbox_size_limit = 5120000000

Why does this machine have any mailboxes at all? Isn't it a relay?
What software performs well with 5GB mailboxes?

> default_destination_concurrency_limit = 500

Better to specify "smtp", "relay" or both, but not "default".

> lmtp_destination_concurrency_limit = $default_destination_concurrency_limit
> smtp_destination_concurrency_limit = $default_destination_concurrency_limit
> relay_destination_concurrency_limit = $default_destination_concurrency_limit
> mime_nesting_limit = 100

These are default settings, don't add them to main.cf

> max_use = 1000

Fine.

> queue_file_attribute_count_limit = 250
> smtpd_history_flush_threshold = 100

Why???

> smtpd_junk_command_limit = 100

Why so generous to the input side?

> smtp_connect_timeout = 10s

Reasonable for a large nearby  MX pool, you can even use 1s if you want.

> smtp_data_done_timeout = 10s

Really not a good idea.

> smtp_mail_timeout = 5s

A bit aggressive...

> smtp inet n - n - - smtpd

Tune the process limit here

> qmgr fifo n - n 30 1 qmgr

Why re-scan the incoming queue every 30 seconds? The default is fine.

> smtp unix - - n - - smtp

Adjust the process limit here to the right number of smtp(8)
delivery agents.

> relay unix
> -o smtp_fallback_relay= - - n - - smtp

Adjust this process limit if you service any relay domains.

> I tried many combinations to speed up the delivery. Nothing help up to now.

LOGS!!!

> I just found that Postfix could send 1 million emails per hour when I send
> less than a half million in 24 hours.

LOGS!!!

--
        Viktor.

Disclaimer: off-list followups get on-list replies or get ignored.
Please do not ignore the "Reply-To" header.

To unsubscribe from the postfix-users list, visit
http://www.postfix.org/lists.html or click the link below:
<mailto:[hidden email]?body=unsubscribe%20postfix-users>

If my response solves your problem, the best way to thank me is to not
send an "it worked, thanks" follow-up. If you must respond, please put
"It worked, thanks" in the "Subject" so I can delete these quickly.
Reply | Threaded
Open this post in threaded view
|

Re: Huge active queue and system idle, not delivering

Victor Duchovni
In reply to this post by Patrick Chemla
On Thu, Jan 07, 2010 at 08:29:44PM +0200, Patrick Chemla wrote:

> Here the logs:

This is just the qmgr(8) warnings about a clogged queue. Other than
telling us that all the mail is going to "localpc2105.com", this
is not very useful. Where are the logs from smtp(8)?

What transport is "localpc2105.com" destined for? Any earlier
logging about actual delivery attempts for this destination?

--
        Viktor.

Disclaimer: off-list followups get on-list replies or get ignored.
Please do not ignore the "Reply-To" header.

To unsubscribe from the postfix-users list, visit
http://www.postfix.org/lists.html or click the link below:
<mailto:[hidden email]?body=unsubscribe%20postfix-users>

If my response solves your problem, the best way to thank me is to not
send an "it worked, thanks" follow-up. If you must respond, please put
"It worked, thanks" in the "Subject" so I can delete these quickly.
Reply | Threaded
Open this post in threaded view
|

Re: Huge active queue and system idle, not delivering

Stef Caunter
In reply to this post by Patrick Chemla
On Thu, Jan 7, 2010 at 1:25 PM, Patrick Chemla
<[hidden email]> wrote:

>> said "I just found that Postfix could send 1 million emails per hour
>> when I send less than a half million in 24 hours", but I can't make
>> sense of that, sorry.
>>
>
> I have to inject 2 to 4 millions emails to the postfix box in 24 hours, and
> I expect to deliver within the same delay.
> Actually, I can't deliver more than 500,000 per 24h hours.

It could be viewed that half a million delivered in 24 hours is fine.
Are you signing the mail? This can help with delivery rates to the
large webmailer mx destinations.

Stef

> But the CPU of the box is idle more than 80%. It is clear that it is not a
> matter of CPU, nor memory, nor disk. Something in the number of
> processes/users/simultaneous tasks is blocking.
>
>
>
>
Reply | Threaded
Open this post in threaded view
|

Re: Huge active queue and system idle, not delivering

Victor Duchovni
On Thu, Jan 07, 2010 at 04:47:14PM -0500, Stefan Caunter wrote:

> >
> > I have to inject 2 to 4 millions emails to the postfix box in 24 hours, and
> > I expect to deliver within the same delay.
> > Actually, I can't deliver more than 500,000 per 24h hours.
>
> It could be viewed that half a million delivered in 24 hours is fine.

No, it is too slow, when there is no content inspection involved,
especially with a nearby farm of relayhosts.

> Are you signing the mail? This can help with delivery rates to the
> large webmailer mx destinations.

This is unrelated to the OP's problem.

--
        Viktor.

Disclaimer: off-list followups get on-list replies or get ignored.
Please do not ignore the "Reply-To" header.

To unsubscribe from the postfix-users list, visit
http://www.postfix.org/lists.html or click the link below:
<mailto:[hidden email]?body=unsubscribe%20postfix-users>

If my response solves your problem, the best way to thank me is to not
send an "it worked, thanks" follow-up. If you must respond, please put
"It worked, thanks" in the "Subject" so I can delete these quickly.
Reply | Threaded
Open this post in threaded view
|

Re: Huge active queue and system idle, not delivering

Ralf Hildebrandt
In reply to this post by Stef Caunter
* Stefan Caunter <[hidden email]>:

> It could be viewed that half a million delivered in 24 hours is fine.
> Are you signing the mail? This can help with delivery rates to the
> large webmailer mx destinations.

There are many things to consider:

* DKIM signing - which is the prerequisite for getting into feedback
  loops at major email providers
* get into the feedback loops at major email providers
* SPF
* good reputation (e.g. SenderBase, senderscore)

--
Ralf Hildebrandt
  Geschäftsbereich IT | Abteilung Netzwerk
  Charité - Universitätsmedizin Berlin
  Campus Benjamin Franklin
  Hindenburgdamm 30 | D-12203 Berlin
  Tel. +49 30 450 570 155 | Fax: +49 30 450 570 962
  [hidden email] | http://www.charite.de
           
Reply | Threaded
Open this post in threaded view
|

Re: Huge active queue and system idle, not delivering

Victor Duchovni
On Thu, Jan 07, 2010 at 10:54:15PM +0100, Ralf Hildebrandt wrote:

> > It could be viewed that half a million delivered in 24 hours is fine.
> > Are you signing the mail? This can help with delivery rates to the
> > large webmailer mx destinations.
>
> There are many things to consider:
>
> * DKIM signing - which is the prerequisite for getting into feedback
>   loops at major email providers
> * get into the feedback loops at major email providers
> * SPF
> * good reputation (e.g. SenderBase, senderscore)

None of these apply to the OP's problem. He is sending mail to a pool
of 30 qmail hosts.

--
        Viktor.

Disclaimer: off-list followups get on-list replies or get ignored.
Please do not ignore the "Reply-To" header.

To unsubscribe from the postfix-users list, visit
http://www.postfix.org/lists.html or click the link below:
<mailto:[hidden email]?body=unsubscribe%20postfix-users>

If my response solves your problem, the best way to thank me is to not
send an "it worked, thanks" follow-up. If you must respond, please put
"It worked, thanks" in the "Subject" so I can delete these quickly.
Reply | Threaded
Open this post in threaded view
|

Re: Huge active queue and system idle, not delivering

Patrick Chemla
In reply to this post by Stef Caunter
Le 07/01/2010 23:47, Stefan Caunter a écrit :

> On Thu, Jan 7, 2010 at 1:25 PM, Patrick Chemla
> <[hidden email]>  wrote:
>
>    
>>> said "I just found that Postfix could send 1 million emails per hour
>>> when I send less than a half million in 24 hours", but I can't make
>>> sense of that, sorry.
>>>
>>>        
>> I have to inject 2 to 4 millions emails to the postfix box in 24 hours, and
>> I expect to deliver within the same delay.
>> Actually, I can't deliver more than 500,000 per 24h hours.
>>      
> It could be viewed that half a million delivered in 24 hours is fine.
> Are you signing the mail? This can help with delivery rates to the
> large webmailer mx destinations.
>
> Stef
>
>    

Half a million is 4 times lower than what we have done with qmail
servers. Email are signed, but not from Postfix. Postfix must only relay
mails from clients to local MXs. These local MXs will assume deliveries
to the outside. Mail queue should be on these MXs, because they are
dependant on final destinations.
>> But the CPU of the box is idle more than 80%. It is clear that it is not a
>> matter of CPU, nor memory, nor disk. Something in the number of
>> processes/users/simultaneous tasks is blocking.
>>
>>
>>
>>
>>      

Reply | Threaded
Open this post in threaded view
|

Re: Huge active queue and system idle, not delivering

Patrick Chemla
In reply to this post by Victor Duchovni
Le 07/01/2010 20:37, Victor Duchovni a écrit :

> On Thu, Jan 07, 2010 at 08:29:44PM +0200, Patrick Chemla wrote:
>
>    
>> Here the logs:
>>      
> This is just the qmgr(8) warnings about a clogged queue. Other than
> telling us that all the mail is going to "localpc2105.com", this
> is not very useful. Where are the logs from smtp(8)?
>
> What transport is "localpc2105.com" destined for? Any earlier
> logging about actual delivery attempts for this destination?
>
>    

Victor, thank you for your interest.

Daily logs are huge.

Here is a sample of deliveries:
Jan  7 22:02:57 postfix postfix/qmgr[26441]: 5B91F873F6: removed
Jan  7 22:02:57 postfix postfix/smtp[27180]: 375DDD5923:
to=<[hidden email]>, relay=a139.localpc2105.com[10.0.0.139]:25,
conn_use=59, delay=61550, delays=17019/44435/96/0.17, dsn=2.0.0,
status=sent (250 ok 1262894577 qp 12113)
Jan  7 22:02:57 postfix postfix/qmgr[26441]: 375DDD5923: removed
Jan  7 22:02:58 postfix postfix/smtp[27070]: 7F0F2943B3:
to=<[hidden email]>, relay=a70.localpc2105.com[10.0.0.70]:25,
conn_use=10, delay=73795, delays=29264/44481/50/0.21, dsn=2.0.0,
status=sent (250 ok 1262894577 qp 23067)
Jan  7 22:02:58 postfix postfix/qmgr[26441]: 7F0F2943B3: removed
Jan  7 22:02:58 postfix postfix/smtp[27050]: 32BB182182:
to=<[hidden email]>,
relay=a139.localpc2105.com[10.0.0.139]:25, conn_use=48, delay=73799,
delays=29268/44466/65/0.28, dsn=2.0.0, status=sent (250 ok 1262894578 qp
12121)
Jan  7 22:02:58 postfix postfix/qmgr[26441]: 32BB182182: removed
Jan  7 22:02:58 postfix postfix/smtp[26758]: 577D6C7F7D:
to=<[hidden email]>,
relay=a139.localpc2105.com[10.0.0.139]:25, conn_use=60, delay=68451,
delays=23920/44481/50/0.29, dsn=2.0.0, status=sent (250 ok 1262894578 qp
12122)
Jan  7 22:02:58 postfix postfix/qmgr[26441]: 577D6C7F7D: removed
Jan  7 22:02:58 postfix postfix/smtp[26935]: CDCE074F53:
to=<[hidden email]>,
relay=a139.localpc2105.com[10.0.0.139]:25, conn_use=49, delay=104597,
delays=60065/44421/110/0.3, dsn=2.0.0, status=sent (250 ok 1262894578 qp
12135)
Jan  7 22:02:58 postfix postfix/qmgr[26441]: CDCE074F53: removed
Jan  7 22:02:58 postfix postfix/smtp[26708]: 4B0B6E77FD:
to=<[hidden email]>,
relay=a139.localpc2105.com[10.0.0.139]:25, conn_use=61, delay=46137,
delays=1606/44461/70/0.31, dsn=2.0.0, status=sent (250 ok 1262894578 qp
12136)
Jan  7 22:02:58 postfix postfix/qmgr[26441]: 4B0B6E77FD: removed
Jan  7 22:02:58 postfix postfix/smtp[26794]: D2CB5DC84C:
to=<[hidden email]>,
relay=a70.localpc2105.com[10.0.0.70]:25, conn_use=11, delay=58160,
delays=13628/44481/50/0.23, dsn=2.0.0, status=sent (250 ok 1262894578 qp
23076)
Jan  7 22:02:58 postfix postfix/qmgr[26441]: D2CB5DC84C: removed
Jan  7 22:02:58 postfix postfix/smtp[26968]: 1A651E17E0:
to=<[hidden email]>, relay=a74.localpc2105.com[10.0.0.74]:25,
conn_use=2, delay=54426, delays=9894/44462/69/0.27, dsn=2.0.0,
status=sent (250 ok 1262894578 qp 7411)
Jan  7 22:02:58 postfix postfix/qmgr[26441]: 1A651E17E0: removed
Jan  7 22:02:58 postfix postfix/smtp[27037]: 4CCC486B55:
to=<[hidden email]>,
relay=a139.localpc2105.com[10.0.0.139]:25, conn_use=50, delay=45538,
delays=1005/44407/125/0.17, dsn=2.0.0, status=sent (250 ok 1262894578 qp
12150)
Jan  7 22:02:58 postfix postfix/qmgr[26441]: 4CCC486B55: removed
Jan  7 22:02:58 postfix postfix/smtp[27188]: D130997201:
to=<[hidden email]>, relay=a74.localpc2105.com[10.0.0.74]:25,
conn_use=2, delay=71536, delays=27004/44448/84/0.28, dsn=2.0.0,
status=sent (250 ok 1262894578 qp 7412)
Jan  7 22:02:58 postfix postfix/qmgr[26441]: D130997201: removed
Jan  7 22:02:59 postfix postfix/smtp[27033]: 6BD743906A:
to=<[hidden email]>, relay=a139.localpc2105.com[10.0.0.139]:25,
conn_use=62, delay=81473, delays=36941/44467/65/0.24, dsn=2.0.0,
status=sent (250 ok 1262894579 qp 12157)
Jan  7 22:02:59 postfix postfix/qmgr[26441]: 6BD743906A: removed
Jan  7 22:02:59 postfix postfix/smtp[26793]: 84947C14B2:
to=<[hidden email]>, relay=a70.localpc2105.com[10.0.0.70]:25,
conn_use=12, delay=69401, delays=24868/44469/63/0.2, dsn=2.0.0,
status=sent (250 ok 1262894578 qp 23084)
Jan  7 22:02:59 postfix postfix/qmgr[26441]: 84947C14B2: removed
Jan  7 22:02:59 postfix postfix/smtp[26737]: 6023552F52:
to=<[hidden email]>,
relay=a139.localpc2105.com[10.0.0.139]:25, conn_use=51, delay=96132,
delays=51599/44448/84/0.3, dsn=2.0.0, status=sent (250 ok 1262894579 qp
12158)
Jan  7 22:02:59 postfix postfix/qmgr[26441]: 6023552F52: removed
Jan  7 22:02:59 postfix postfix/smtp[27134]: connect to
a132.localpc2105.com[10.0.0.132]:25: Connection timed out
Jan  7 22:02:59 postfix postfix/smtp[26717]: 96A447C426:
to=<[hidden email]>,
relay=a139.localpc2105.com[10.0.0.139]:25, conn_use=63, delay=103800,
delays=59267/44433/99/0.27, dsn=2.0.0, status=sent (250 ok 1262894579 qp
12166)
Jan  7 22:02:59 postfix postfix/qmgr[26441]: 96A447C426: removed
Jan  7 22:02:59 postfix postfix/smtp[26727]: connect to
a133.localpc2105.com[10.0.0.133]:25: Connection timed out
Jan  7 22:02:59 postfix postfix/smtp[27204]: 485A39C9E8:
to=<[hidden email]>,
relay=a139.localpc2105.com[10.0.0.139]:25, conn_use=52, delay=69946,
delays=25414/44492/40/0.29, dsn=2.0.0, status=sent (250 ok 1262894579 qp
12172)
Jan  7 22:02:59 postfix postfix/qmgr[26441]: 485A39C9E8: removed
Jan  7 22:02:59 postfix postfix/smtp[26727]: E86E9D9F79:
to=<[hidden email]>, relay=a67.localpc2105.com[10.0.0.67]:25,
delay=59696, delays=15163/44433/100/0.17, dsn=2.0.0, status=sent (250 ok
1262894579 qp 3991)
Jan  7 22:02:59 postfix postfix/qmgr[26441]: E86E9D9F79: removed
Jan  7 22:02:59 postfix postfix/smtp[26924]: C6905ACBA3:
to=<[hidden email]>,
relay=a139.localpc2105.com[10.0.0.139]:25, conn_use=64, delay=96901,
delays=52368/44474/58/0.24, dsn=2.0.0, status=sent (250 ok 1262894579 qp
12179)
Jan  7 22:02:59 postfix postfix/qmgr[26441]: C6905ACBA3: removed

Everything seems to be OK, emails are relayed to my qmails servers.

But if you have a look on these logs, related to 2 messages, you will
see that messages where posted on the postix a 2:37AM and 9:09 AM and
delivered at 22:03 !!!!  20 hours in the queue, with no errors, no
warnings....


Jan  7 02:37:13 postfix postfix/smtpd[20640]: 485A39C9E8:
client=unknown[10.0.0.241]
Jan  7 02:37:13 postfix postfix/cleanup[20642]: 485A39C9E8:
message-id=<[hidden email]>
Jan  7 02:37:13 postfix postfix/qmgr[31260]: 485A39C9E8:
from=<[hidden email]>, size=6136, nrcpt=1 (queue active)
Jan  7 05:28:03 postfix postfix/smtpd[3663]: E86E9D9F79:
client=unknown[10.0.0.241]
Jan  7 05:28:03 postfix postfix/cleanup[3955]: E86E9D9F79:
message-id=<[hidden email]>
Jan  7 05:28:04 postfix postfix/qmgr[31260]: E86E9D9F79:
from=<[hidden email]>, size=11140, nrcpt=1 (queue active)
Jan  7 08:56:31 postfix postfix/qmgr[21787]: E86E9D9F79:
from=<[hidden email]>, size=11140, nrcpt=1 (queue active)
Jan  7 08:56:31 postfix postfix/qmgr[21787]: 485A39C9E8:
from=<[hidden email]>, size=6136, nrcpt=1 (queue active)
Jan  7 09:09:56 postfix postfix/qmgr[23046]: E86E9D9F79:
from=<[hidden email]>, size=11140, nrcpt=1 (queue active)
Jan  7 09:09:57 postfix postfix/qmgr[23046]: 485A39C9E8:
from=<[hidden email]>, size=6136, nrcpt=1 (queue active)
Jan  7 09:40:46 postfix poJan  7 22:02:57 postfix postfix/qmgr[26441]:
5B91F873F6: removed
Jan  7 22:02:57 postfix postfix/smtp[27180]: 375DDD5923:
to=<[hidden email]>, relay=a139.localpc2105.com[10.0.0.139]:25,
conn_use=59, delay=61550, delays=17019/44435/96/0.17, dsn=2.0.0,
status=sent (250 ok 1262894577 qp 12113)
Jan  7 22:02:57 postfix postfix/qmgr[26441]: 375DDD5923: removed
Jan  7 22:02:58 postfix postfix/smtp[27070]: 7F0F2943B3:
to=<[hidden email]>, relay=a70.localpc2105.com[10.0.0.70]:25,
conn_use=10, delay=73795, delays=29264/44481/50/0.21, dsn=2.0.0,
status=sent (250 ok 1262894577 qp 23067)
Jan  7 22:02:58 postfix postfix/qmgr[26441]: 7F0F2943B3: removed
Jan  7 22:02:58 postfix postfix/smtp[27050]: 32BB182182:
to=<[hidden email]>,
relay=a139.localpc2105.com[10.0.0.139]:25, conn_use=48, delay=73799,
delays=29268/44466/65/0.28, dsn=2.0.0, status=sent (250 ok 1262894578 qp
12121)
Jan  7 22:02:58 postfix postfix/qmgr[26441]: 32BB182182: removed
Jan  7 22:02:58 postfix postfix/smtp[26758]: 577D6C7F7D:
to=<[hidden email]>,
relay=a139.localpc2105.com[10.0.0.139]:25, conn_use=60, delay=68451,
delays=23920/44481/50/0.29, dsn=2.0.0, status=sent (250 ok 1262894578 qp
12122)
Jan  7 22:02:58 postfix postfix/qmgr[26441]: 577D6C7F7D: removed
Jan  7 22:02:58 postfix postfix/smtp[26935]: CDCE074F53:
to=<[hidden email]>,
relay=a139.localpc2105.com[10.0.0.139]:25, conn_use=49, delay=104597,
delays=60065/44421/110/0.3, dsn=2.0.0, status=sent (250 ok 1262894578 qp
12135)
Jan  7 22:02:58 postfix postfix/qmgr[26441]: CDCE074F53: removed
Jan  7 22:02:58 postfix postfix/smtp[26708]: 4B0B6E77FD:
to=<[hidden email]>,
relay=a139.localpc2105.com[10.0.0.139]:25, conn_use=61, delay=46137,
delays=1606/44461/70/0.31, dsn=2.0.0, status=sent (250 ok 1262894578 qp
12136)
Jan  7 22:02:58 postfix postfix/qmgr[26441]: 4B0B6E77FD: removed
Jan  7 22:02:58 postfix postfix/smtp[26794]: D2CB5DC84C:
to=<[hidden email]>,
relay=a70.localpc2105.com[10.0.0.70]:25, conn_use=11, delay=58160,
delays=13628/44481/50/0.23, dsn=2.0.0, status=sent (250 ok 1262894578 qp
23076)
Jan  7 22:02:58 postfix postfix/qmgr[26441]: D2CB5DC84C: removed
Jan  7 22:02:58 postfix postfix/smtp[26968]: 1A651E17E0:
to=<[hidden email]>, relay=a74.localpc2105.com[10.0.0.74]:25,
conn_use=2, delay=54426, delays=9894/44462/69/0.27, dsn=2.0.0,
status=sent (250 ok 1262894578 qp 7411)
Jan  7 22:02:58 postfix postfix/qmgr[26441]: 1A651E17E0: removed
Jan  7 22:02:58 postfix postfix/smtp[27037]: 4CCC486B55:
to=<[hidden email]>,
relay=a139.localpc2105.com[10.0.0.139]:25, conn_use=50, delay=45538,
delays=1005/44407/125/0.17, dsn=2.0.0, status=sent (250 ok 1262894578 qp
12150)
Jan  7 22:02:58 postfix postfix/qmgr[26441]: 4CCC486B55: removed
Jan  7 22:02:58 postfix postfix/smtp[27188]: D130997201:
to=<[hidden email]>, relay=a74.localpc2105.com[10.0.0.74]:25,
conn_use=2, delay=71536, delays=27004/44448/84/0.28, dsn=2.0.0,
status=sent (250 ok 1262894578 qp 7412)
Jan  7 22:02:58 postfix postfix/qmgr[26441]: D130997201: removed
Jan  7 22:02:59 postfix postfix/smtp[27033]: 6BD743906A:
to=<[hidden email]>, relay=a139.localpc2105.com[10.0.0.139]:25,
conn_use=62, delay=81473, delays=36941/44467/65/0.24, dsn=2.0.0,
status=sent (250 ok 1262894579 qp 12157)
Jan  7 22:02:59 postfix postfix/qmgr[26441]: 6BD743906A: removed
Jan  7 22:02:59 postfix postfix/smtp[26793]: 84947C14B2:
to=<[hidden email]>, relay=a70.localpc2105.com[10.0.0.70]:25,
conn_use=12, delay=69401, delays=24868/44469/63/0.2, dsn=2.0.0,
status=sent (250 ok 1262894578 qp 23084)
Jan  7 22:02:59 postfix postfix/qmgr[26441]: 84947C14B2: removed
Jan  7 22:02:59 postfix postfix/smtp[26737]: 6023552F52:
to=<[hidden email]>,
relay=a139.localpc2105.com[10.0.0.139]:25, conn_use=51, delay=96132,
delays=51599/44448/84/0.3, dsn=2.0.0, status=sent (250 ok 1262894579 qp
12158)
Jan  7 22:02:59 postfix postfix/qmgr[26441]: 6023552F52: removed
Jan  7 22:02:59 postfix postfix/smtp[27134]: connect to
a132.localpc2105.com[10.0.0.132]:25: Connection timed out
Jan  7 22:02:59 postfix postfix/smtp[26717]: 96A447C426:
to=<[hidden email]>,
relay=a139.localpc2105.com[10.0.0.139]:25, conn_use=63, delay=103800,
delays=59267/44433/99/0.27, dsn=2.0.0, status=sent (250 ok 1262894579 qp
12166)
Jan  7 22:02:59 postfix postfix/qmgr[26441]: 96A447C426: removed
Jan  7 22:02:59 postfix postfix/smtp[26727]: connect to
a133.localpc2105.com[10.0.0.133]:25: Connection timed out
Jan  7 22:02:59 postfix postfix/smtp[27204]: 485A39C9E8:
to=<[hidden email]>,
relay=a139.localpc2105.com[10.0.0.139]:25, conn_use=52, delay=69946,
delays=25414/44492/40/0.29, dsn=2.0.0, status=sent (250 ok 1262894579 qp
12172)
Jan  7 22:02:59 postfix postfix/qmgr[26441]: 485A39C9E8: removed
Jan  7 22:02:59 postfix postfix/smtp[26727]: E86E9D9F79:
to=<[hidden email]>, relay=a67.localpc2105.com[10.0.0.67]:25,
delay=59696, delays=15163/44433/100/0.17, dsn=2.0.0, status=sent (250 ok
1262894579 qp 3991)
Jan  7 22:02:59 postfix postfix/qmgr[26441]: E86E9D9F79: removed
Jan  7 22:02:59 postfix postfix/smtp[26924]: C6905ACBA3:
to=<[hidden email]>,
relay=a139.localpc2105.com[10.0.0.139]:25, conn_use=64, delay=96901,
delays=52368/44474/58/0.24, dsn=2.0.0, status=sent (250 ok 1262894579 qp
12179)
Jan  7 22:02:59 postfix postfix/qmgr[26441]: C6905ACBA3: removed
stfix/qmgr[26441]: E86E9D9F79: from=<[hidden email]>,
size=11140, nrcpt=1 (queue active)
Jan  7 09:40:47 postfix postfix/qmgr[26441]: 485A39C9E8:
from=<[hidden email]>, size=6136, nrcpt=1 (queue active)
Jan  7 22:02:59 postfix postfix/smtp[27204]: 485A39C9E8:
to=<[hidden email]>,
relay=a139.localpc2105.com[10.0.0.139]:25, conn_use=52, delay=69946,
delays=25414/44492/40/0.29, dsn=2.0.0, status=sent (250 ok 1262894579 qp
12172)
Jan  7 22:02:59 postfix postfix/qmgr[26441]: 485A39C9E8: removed
Jan  7 22:02:59 postfix postfix/smtp[26727]: E86E9D9F79:
to=<[hidden email]>, relay=a67.localpc2105.com[10.0.0.67]:25,
delay=59696, delays=15163/44433/100/0.17, dsn=2.0.0, status=sent (250 ok
1262894579 qp 3991)
Jan  7 22:02:59 postfix postfix/qmgr[26441]: E86E9D9F79: removed

It is very strange.

Patrick

Reply | Threaded
Open this post in threaded view
|

Re: Huge active queue and system idle, not delivering

Victor Duchovni
On Fri, Jan 08, 2010 at 12:30:34AM +0200, Patrick Chemla wrote:

> Jan  7 22:02:57 postfix postfix/qmgr[26441]: 5B91F873F6: removed
> Jan  7 22:02:57 postfix postfix/smtp[27180]: 375DDD5923:
> to=<[hidden email]>, relay=a139.localpc2105.com[10.0.0.139]:25,
> conn_use=59, delay=61550, delays=17019/44435/96/0.17, dsn=2.0.0,
> status=sent (250 ok 1262894577 qp 12113)

This recipient does not match the destination that is clogging the
queue. Is the queue clogged with postmaster notices. I never enable
any postmaster notices, they don't scale.

        notify_classes =

This said, the 96 seconds of connection setup latency is an obvious and
severe problem. Why on earth does it take 96 seconds to complete a HELO
handshake with "a139.localpcc2105.com"? You are not going to get much
mail out if each delivery takes 96 seconds...

Is your Postfix server's IP address resolvable on the qmail systems?
Are they doing some sort of pre-banner delay? ...


> Jan  7 22:02:58 postfix postfix/smtp[27070]: 7F0F2943B3:
> to=<[hidden email]>, relay=a70.localpc2105.com[10.0.0.70]:25,
> conn_use=10, delay=73795, delays=29264/44481/50/0.21, dsn=2.0.0,
> status=sent (250 ok 1262894577 qp 23067)

Once again, 50 seconds is severely crippled.

> Jan  7 22:02:58 postfix postfix/smtp[27050]: 32BB182182:
> to=<[hidden email]>,
> relay=a139.localpc2105.com[10.0.0.139]:25, conn_use=48, delay=73799,
> delays=29268/44466/65/0.28, dsn=2.0.0, status=sent (250 ok 1262894578 qp
> 12121)

This is enough. Fix this.

Where are the deliveries to the clogged destination???

--
        Viktor.

Disclaimer: off-list followups get on-list replies or get ignored.
Please do not ignore the "Reply-To" header.

To unsubscribe from the postfix-users list, visit
http://www.postfix.org/lists.html or click the link below:
<mailto:[hidden email]?body=unsubscribe%20postfix-users>

If my response solves your problem, the best way to thank me is to not
send an "it worked, thanks" follow-up. If you must respond, please put
"It worked, thanks" in the "Subject" so I can delete these quickly.
Reply | Threaded
Open this post in threaded view
|

Re: Huge active queue and system idle, not delivering

Wietse Venema
In reply to this post by Patrick Chemla
Patrick Chemla:
> >> But the CPU of the box is idle more than 80%. It is clear that it is not a
> >> matter of CPU, nor memory, nor disk. Something in the number of
> >> processes/users/simultaneous tasks is blocking.

Indeed, the symptom of blocking is in the third field of
the Postfix "delays" logging.

       The format of the "delays=a/b/c/d" logging is as follows:

       o      a = time from message arrival to last active queue entry

       o      b = time from last active queue entry to connection setup

       o      c = time in connection setup, including DNS, EHLO and TLS

       o      d = time in message transmission

In your case, it takes a minute or more to set up the connection
including DNS lookup and EHLO handshake. That is holding up your mail.

- Check if the qmail servers are responsive (telnet hostname 25).

- Check if your Postfix needs a /var/spool/postfix/etc/resolv.conf
  file, and if that file is consistent with /etc/resolv.conf. If
  Postfix needs /var/spool/postfix/etc/resolv.conf and the file
  is missong or contains a bogus server that will add time to
  your deliveries.

- If they aren't, increase the concurrency on the qmail side.

        Wietse
Reply | Threaded
Open this post in threaded view
|

Re: Huge active queue and system idle, not delivering

Patrick Chemla
Le 08/01/2010 03:03, Wietse Venema a écrit :

> Patrick Chemla:
>    
>>>> But the CPU of the box is idle more than 80%. It is clear that it is not a
>>>> matter of CPU, nor memory, nor disk. Something in the number of
>>>> processes/users/simultaneous tasks is blocking.
>>>>          
> Indeed, the symptom of blocking is in the third field of
> the Postfix "delays" logging.
>
>         The format of the "delays=a/b/c/d" logging is as follows:
>
>         o      a = time from message arrival to last active queue entry
>
>         o      b = time from last active queue entry to connection setup
>
>         o      c = time in connection setup, including DNS, EHLO and TLS
>
>         o      d = time in message transmission
>
> In your case, it takes a minute or more to set up the connection
> including DNS lookup and EHLO handshake. That is holding up your mail.
>
> - Check if the qmail servers are responsive (telnet hostname 25).
>
>    
qmail are responsive. I made some arrangements to my DNS. DNS is better
now, but the connexion is still very slow. I saw this morning c=285.
> - Check if your Postfix needs a /var/spool/postfix/etc/resolv.conf
>    file, and if that file is consistent with /etc/resolv.conf. If
>    Postfix needs /var/spool/postfix/etc/resolv.conf and the file
>    is missong or contains a bogus server that will add time to
>    your deliveries.
>
>    
Hi Wietse,
How do I know if " Postfix needs a /var/spool/postfix/etc/resolv.conf"
directory /var/spool/postfix/etc doesn't exist.


> - If they aren't, increase the concurrency on the qmail side.
>
>    
conccurency =100. It's already a large number. I can increase it.
> Wietse
>    
Thanks
Patrick

Reply | Threaded
Open this post in threaded view
|

Re: Huge active queue and system idle, not delivering

Patrick Chemla
In reply to this post by Victor Duchovni
Le 08/01/2010 00:43, Victor Duchovni a écrit :

> On Fri, Jan 08, 2010 at 12:30:34AM +0200, Patrick Chemla wrote:
>
>    
>> Jan  7 22:02:57 postfix postfix/qmgr[26441]: 5B91F873F6: removed
>> Jan  7 22:02:57 postfix postfix/smtp[27180]: 375DDD5923:
>> to=<[hidden email]>, relay=a139.localpc2105.com[10.0.0.139]:25,
>> conn_use=59, delay=61550, delays=17019/44435/96/0.17, dsn=2.0.0,
>> status=sent (250 ok 1262894577 qp 12113)
>>      
> This recipient does not match the destination that is clogging the
> queue. Is the queue clogged with postmaster notices. I never enable
> any postmaster notices, they don't scale.
>
> notify_classes =
>
>    
done, no change.
> This said, the 96 seconds of connection setup latency is an obvious and
> severe problem. Why on earth does it take 96 seconds to complete a HELO
> handshake with "a139.localpcc2105.com"? You are not going to get much
> mail out if each delivery takes 96 seconds...
>
> Is your Postfix server's IP address resolvable on the qmail systems?
>    
Should it be? qmail accept all RELAY CLIENT from local network.
> Are they doing some sort of pre-banner delay? ...
>
>
>    
When I do  telnet a139.localpc2105.com 25, I get immediate response.
>> Jan  7 22:02:58 postfix postfix/smtp[27070]: 7F0F2943B3:
>> to=<[hidden email]>, relay=a70.localpc2105.com[10.0.0.70]:25,
>> conn_use=10, delay=73795, delays=29264/44481/50/0.21, dsn=2.0.0,
>> status=sent (250 ok 1262894577 qp 23067)
>>      
> Once again, 50 seconds is severely crippled.
>
>    
When I telnet a70.localpc2105.com 25 I get an immediate response.

I have checked my local DNS. There were some troubles, and I made some
improvements. I have now 2 local caching DNS respawning fast. All qmail
servers addresses are in the postfix /etc/hosts to avoid Ip lookup.
I have checked qmails servers, nothing has changed since they were able
to have a queue of 200,000 messages, but they have now a few hundreds only.
I have calculated average times to complete HELO. All qmails are in the
same kind of value around 2 minutes. Not any one is better than others.
Again, each was handling a queue of hundreds thousands before I set up
the postfix relay to load balance.
I really don't have a clue. I don't know where to look.
>> Jan  7 22:02:58 postfix postfix/smtp[27050]: 32BB182182:
>> to=<[hidden email]>,
>> relay=a139.localpc2105.com[10.0.0.139]:25, conn_use=48, delay=73799,
>> delays=29268/44466/65/0.28, dsn=2.0.0, status=sent (250 ok 1262894578 qp
>> 12121)
>>      
> This is enough. Fix this.
>
>    
How I can fix it if it works fine through telnet?
> Where are the deliveries to the clogged destination???
>
>    
Sorry, I don't understand this question. Please be clear.

Patrick

Reply | Threaded
Open this post in threaded view
|

Re: Huge active queue and system idle, not delivering

Wietse Venema
Patrick Chemla:
[ Charset ISO-8859-1 unsupported, converting... ]
> Le 08/01/2010 00:43, Victor Duchovni a ?crit :
> > On Fri, Jan 08, 2010 at 12:30:34AM +0200, Patrick Chemla wrote:
> >
> >    
> >> Jan  7 22:02:57 postfix postfix/qmgr[26441]: 5B91F873F6: removed
> >> Jan  7 22:02:57 postfix postfix/smtp[27180]: 375DDD5923:
> >> to=<[hidden email]>, relay=a139.localpc2105.com[10.0.0.139]:25,
> >> conn_use=59, delay=61550, delays=17019/44435/96/0.17, dsn=2.0.0,
     ^^^^^^^^^^^

Note that this connection has been reused multiple times (see below
for what this means in Postfix).

Why does it take 69 seconds to initialize a reused SMTP connection?

What happens when you set

    smtp_connection_cache_on_demand=no

in main.cf (and do "postfix reload")?

If this makes a difference, then

a) you have a problem with smtp<->scache communication.

b) qmail does not like RSET commands

c) Your machine is running low on memory and swapping out the scache
process.

d) something else.

        Wietse

Under high load, smtp(8) processes give their open connections to
scache(8). Later, they ask scache(8) for an open connection to a
specific destination.  Once an smtp(8) client retrieves an open
connection, it sends RSET tothe remote server and waits for a 250
reply (i.e. the server is still happy). According to the logfile
record this lookup/rset/reply sequence is taking 96 seconds.
Reply | Threaded
Open this post in threaded view
|

Re: Huge active queue and system idle, not delivering

Stan Hoeppner
In reply to this post by Patrick Chemla

On Fri, 08 Jan 2010 15:24:25 +0200, Patrick Chemla

> When I telnet a70.localpc2105.com 25 I get an immediate response.

I assume you are telnet'ing from the Postfix server with the queue delay
problem.  At this point, after you receive the 220, type:

ehlo your.postfix-server.tld <enter>

and time the delay of the 250 responses.  Continue to do a complete manual
mail transaction through telnet, and time each smtp command completion
(wall clock is fine).  Post the results here please.

--
Stan
Reply | Threaded
Open this post in threaded view
|

Re: Huge active queue and system idle, not delivering

Wietse Venema
In reply to this post by Wietse Venema
Wietse Venema:

> Patrick Chemla:
> > Le 08/01/2010 00:43, Victor Duchovni a ?crit :
> > > On Fri, Jan 08, 2010 at 12:30:34AM +0200, Patrick Chemla wrote:
> > >
> > >    
> > >> Jan  7 22:02:57 postfix postfix/qmgr[26441]: 5B91F873F6: removed
> > >> Jan  7 22:02:57 postfix postfix/smtp[27180]: 375DDD5923:
> > >> to=<[hidden email]>, relay=a139.localpc2105.com[10.0.0.139]:25,
> > >> conn_use=59, delay=61550, delays=17019/44435/96/0.17, dsn=2.0.0,
>      ^^^^^^^^^^^
>
> Note that this connection has been reused multiple times (see below
> for what this means in Postfix).
>
> Why does it take 69 seconds to initialize a reused SMTP connection?
>
> What happens when you set
>
>     smtp_connection_cache_on_demand=no
>
> in main.cf (and do "postfix reload")?
>
> If this makes a difference, then

Check your qmail configuration for "tarpit" options. There exist
qmail patches that will slow down the qmail SMTP server when the
client sends lots of email, or lots of recipients.

        Wietse

> a) you have a problem with smtp<->scache communication.
>
> b) qmail does not like RSET commands
>
> c) Your machine is running low on memory and swapping out the scache
> process.
>
> d) something else.
>
> Wietse
>
> Under high load, smtp(8) processes give their open connections to
> scache(8). Later, they ask scache(8) for an open connection to a
> specific destination.  Once an smtp(8) client retrieves an open
> connection, it sends RSET tothe remote server and waits for a 250
> reply (i.e. the server is still happy). According to the logfile
> record this lookup/rset/reply sequence is taking 96 seconds.
>
>

12