Hi,
I am running Postfix 2.5.6 on a Fedora 11 Linux system on a hardware based Intel I5/750 Quad Core, 8 Gb memory, 160Gb SSD hard disk. Incoming messages are entering very fast (500 smtp processes declared) and the active queue is actually of 2 millions messages waiting for delivery. The delivery, for all messages should go through a farm of 30 MX servers from domain localpc2105.com, on load balancing through DNS resolution. DNS server is of course local. All 30 MX servers are running qmail. All of them are more than 90% idle. Before I set up my postfix server, email were sent directly to the qmail servers, and qmail was running at full CPU. So I am sure that qmail can handle much more faster. I have set up the postfix server to load balance the load between all the 30 qmail servers to avoid situation where some were running at full charge and others were not working. Postfix server, 30 Qmail servers, DNS are on the same 1Gb LAN. I don't need any messages or clients or recipients control at this stage. No anti spam, no anti virus. All this is already done by qmail servers, despite my plan is to replace all qmails by postfix. CPU is more than 85% idle on my postfix I5/750 box, but the outbound queue is very very slow. It seems that something refrain qmgr to work at full range, despite the parameters here is my main.cf file: queue_directory = /var/spool/postfix command_directory = /usr/sbin daemon_directory = /usr/libexec/postfix data_directory = /var/lib/postfix mail_owner = postfix myhostname = postfix.proacti5.net mydomain = localpc2105.com inet_interfaces = all mydestination = $myhostname, localhost.$mydomain, localhost unknown_local_recipient_reject_code = 550 mynetworks = 172.27.27.0/24, 10.0.0.0/24, 127.0.0.0/24 relayhost = $mydomain alias_maps = hash:/etc/aliases alias_database = hash:/etc/aliases local_destination_recipient_limit = 500 local_destination_concurrency_limit = 50 debug_peer_level = 8 debug_peer_list = orange.fr debugger_command = PATH=/bin:/usr/bin:/usr/local/bin:/usr/X11R6/bin ddd $daemon_directory/$process_name $process_id & sleep 5 sendmail_path = /usr/sbin/sendmail.postfix newaliases_path = /usr/bin/newaliases.postfix mailq_path = /usr/bin/mailq.postfix setgid_group = postdrop html_directory = no manpage_directory = /usr/share/man sample_directory = /usr/share/doc/postfix-2.5.6/samples readme_directory = /usr/share/doc/postfix-2.5.6/README_FILES inet_protocols = all default_process_limit = 1000 initial_destination_concurrency = 100 transport_initial_destination_concurrency = 100 default_destination_concurrency_failed_cohort_limit = 10 default_destination_recipient_limit = 200 transport_destination_recipient_limit = 100 default_delivery_slot_cost = 30 default_minimum_delivery_slots = 30 default_delivery_slot_discount = 100 smtpd_peername_lookup = no default_recipient_limit = 2000000 mailbox_size_limit = 5120000000 qmgr_message_active_limit = 2000000 qmgr_message_recipient_limit = 2000000 default_destination_concurrency_limit = 500 lmtp_destination_concurrency_limit = $default_destination_concurrency_limit relay_destination_concurrency_limit = $default_destination_concurrency_limit smtp_destination_concurrency_limit = $default_destination_concurrency_limit max_use = 1000 mime_nesting_limit = 100 qmgr_fudge_factor = 200 queue_file_attribute_count_limit = 250 smtpd_history_flush_threshold = 100 smtpd_junk_command_limit = 100 smtp_connect_timeout = 10s smtp_data_done_timeout = 10s smtp_mail_timeout = 5s Here is my master.cf file:
I tried many combinations to speed up the delivery. Nothing help up to now. I have look up in tons of threads in many places but haven't found a real solution. I just found that Postfix could send 1 million emails per hour when I send less than a half million in 24 hours. Please help Patrick |
Patrick Chemla:
> Hi, > > I am running Postfix 2.5.6 on a Fedora 11 Linux system on a hardware > based Intel I5/750 Quad Core, 8 Gb memory, 160Gb SSD hard disk. > > Incoming messages are entering very fast (500 smtp processes declared) > and the active queue is actually of 2 millions messages waiting for > delivery. > > The delivery, for all messages should go through a farm of 30 MX servers > from domain localpc2105.com, on load balancing through DNS resolution. > DNS server is of course local. All 30 MX servers are running qmail. All > of them are more than 90% idle. Before I set up my postfix server, email > were sent directly to the qmail servers, and qmail was running at full > CPU. So I am sure that qmail can handle much more faster. I have set up > the postfix server to load balance the load between all the 30 qmail > servers to avoid situation where some were running at full charge and > others were not working. http://www.postfix.org/DEBUG_README.html#logging Wietse |
In reply to this post by Patrick Chemla
2010/1/8 Patrick Chemla <[hidden email]>
> Incoming messages are entering very fast (500 smtp processes declared) and > the active queue is actually of 2 millions messages waiting for delivery. > <snip> > here is my main.cf file: That's some very thorough information, you've provided plenty of context and clear description, which is great. While I lack sufficient knowledge to provide thoughts on the bottlenecking, I *do* expect that people will want to see the output of `postconf -n`, instead of your main.cf (to ensure we see what postfix actually sees and uses). Can you clarify what you mean by "500 smtp processes declared"? A sample output from qshape also wouldn't go astray either (http://www.postfix.org/qshape.1.html). You're provided some proportional figures (percentages), but some solid throughput numbers would be good too. Eg. "We're injecting 2 million messages to the postfix box, we expect to enqueue them in X hrs, but it takes Y hrs, and they're only leaving the postfix box at Z messages/sec". I see you said "I just found that Postfix could send 1 million emails per hour when I send less than a half million in 24 hours", but I can't make sense of that, sorry. |
Le 07/01/2010 20:03, Barney Desmond a écrit :
> 2010/1/8 Patrick Chemla<[hidden email]> > >> Incoming messages are entering very fast (500 smtp processes declared) and >> the active queue is actually of 2 millions messages waiting for delivery. >> <snip> >> here is my main.cf file: >> > That's some very thorough information, you've provided plenty of > context and clear description, which is great. While I lack sufficient > knowledge to provide thoughts on the bottlenecking, I *do* expect that > people will want to see the output of `postconf -n`, instead of your > main.cf (to ensure we see what postfix actually sees and uses). > > alias_database = hash:/etc/aliases alias_maps = hash:/etc/aliases command_directory = /usr/sbin config_directory = /etc/postfix daemon_directory = /usr/libexec/postfix data_directory = /var/lib/postfix debug_peer_level = 8 debug_peer_list = orange.fr default_delivery_slot_cost = 30 default_delivery_slot_discount = 100 default_destination_concurrency_failed_cohort_limit = 10 default_destination_concurrency_limit = 500 default_destination_recipient_limit = 200 default_minimum_delivery_slots = 30 default_process_limit = 1000 default_recipient_limit = 2000000 html_directory = no inet_interfaces = all inet_protocols = all initial_destination_concurrency = 100 lmtp_destination_concurrency_limit = $default_destination_concurrency_limit local_destination_concurrency_limit = 50 local_destination_recipient_limit = 500 mail_owner = postfix mailbox_size_limit = 5120000000 mailq_path = /usr/bin/mailq.postfix manpage_directory = /usr/share/man max_use = 1000 mime_nesting_limit = 100 mydestination = $myhostname, localhost.$mydomain, localhost mydomain = localpc2105.com myhostname = postfix.proacti5.net mynetworks = 172.27.27.0/24, 10.0.0.0/24, 127.0.0.0/24 newaliases_path = /usr/bin/newaliases.postfix qmgr_fudge_factor = 200 qmgr_message_active_limit = 2000000 qmgr_message_recipient_limit = 2000000 queue_directory = /var/spool/postfix queue_file_attribute_count_limit = 250 readme_directory = /usr/share/doc/postfix-2.5.6/README_FILES relay_destination_concurrency_limit = $default_destination_concurrency_limit relayhost = $mydomain sample_directory = /usr/share/doc/postfix-2.5.6/samples sendmail_path = /usr/sbin/sendmail.postfix setgid_group = postdrop smtp_connect_timeout = 10s smtp_data_done_timeout = 10s smtp_destination_concurrency_limit = $default_destination_concurrency_limit smtp_mail_timeout = 5s smtpd_history_flush_threshold = 100 smtpd_junk_command_limit = 100 smtpd_peername_lookup = no unknown_local_recipient_reject_code = 550 > Can you clarify what you mean by "500 smtp processes declared"? A > sample output from qshape also wouldn't go astray either > (http://www.postfix.org/qshape.1.html). Here is qshape: T 5 10 20 40 80 160 320 640 1280 1280+ TOTAL 133000 0 0 0 0 0 2470 40538 80844 7167 1981 wanadoo.fr 61955 0 0 0 0 0 2469 26830 31340 1260 56 orange.fr 4171 0 0 0 0 0 0 1176 2144 540 311 skynet.be 3286 0 0 0 0 0 0 1 3284 0 1 aliceadsl.fr 3259 0 0 0 0 0 0 54 3169 25 11 aol.com 3150 0 0 0 0 0 0 1545 1524 40 41 free.fr 2138 0 0 0 0 0 0 453 1561 89 35 sfr.fr 840 0 0 0 0 0 0 23 816 1 0 hotmail.fr 679 0 0 0 0 0 0 150 420 12 97 telenet.be 658 0 0 0 0 0 0 0 658 0 0 gmail.com 358 0 0 0 0 0 0 157 145 11 45 hotmail.com 325 0 0 0 0 0 0 44 220 20 41 neuf.fr 252 0 0 0 0 0 0 62 176 14 0 9online.fr 250 0 0 0 0 0 0 6 244 0 0 cegetel.net 195 0 0 0 0 0 0 26 155 4 10 laposte.net 183 0 0 0 0 0 0 51 93 15 24 swing.be 141 0 0 0 0 0 0 2 139 0 0 9business.fr 111 0 0 0 0 0 0 23 85 3 0 sonepar.fr 107 0 0 0 0 0 0 33 72 2 0 axa.fr 103 0 0 0 0 0 0 30 67 1 5 most of the messages stay in the queue for hours. > You're provided some > proportional figures (percentages), but some solid throughput numbers > would be good too. Eg. "We're injecting 2 million messages to the > postfix box, we expect to enqueue them in X hrs, but it takes Y hrs, > and they're only leaving the postfix box at Z messages/sec". I see you > said "I just found that Postfix could send 1 million emails per hour > when I send less than a half million in 24 hours", but I can't make > sense of that, sorry. > I have to inject 2 to 4 millions emails to the postfix box in 24 hours, and I expect to deliver within the same delay. Actually, I can't deliver more than 500,000 per 24h hours. But the CPU of the box is idle more than 80%. It is clear that it is not a matter of CPU, nor memory, nor disk. Something in the number of processes/users/simultaneous tasks is blocking. |
In reply to this post by Wietse Venema
Le 07/01/2010 20:00, Wietse Venema a écrit :
> Patrick Chemla: > >> Hi, >> >> I am running Postfix 2.5.6 on a Fedora 11 Linux system on a hardware >> based Intel I5/750 Quad Core, 8 Gb memory, 160Gb SSD hard disk. >> >> Incoming messages are entering very fast (500 smtp processes declared) >> and the active queue is actually of 2 millions messages waiting for >> delivery. >> >> The delivery, for all messages should go through a farm of 30 MX servers >> from domain localpc2105.com, on load balancing through DNS resolution. >> DNS server is of course local. All 30 MX servers are running qmail. All >> of them are more than 90% idle. Before I set up my postfix server, email >> were sent directly to the qmail servers, and qmail was running at full >> CPU. So I am sure that qmail can handle much more faster. I have set up >> the postfix server to load balance the load between all the 30 qmail >> servers to avoid situation where some were running at full charge and >> others were not working. >> > http://www.postfix.org/DEBUG_README.html#logging > > Wietse > Here the logs: Jan 6 23:12:48 postfix postfix/qmgr[31260]: warning: to turn off these warnings specify: qmgr_clog_warn_time = 0 Jan 6 23:19:39 postfix postfix/qmgr[31260]: warning: mail for localpc2105.com is using up 461335 of 461335 active queue entries Jan 6 23:19:39 postfix postfix/qmgr[31260]: warning: you may need to increase the main.cf smtp_destination_concurrency_limit from 100 Jan 6 23:19:39 postfix postfix/qmgr[31260]: warning: please avoid flushing the whole queue when you have Jan 6 23:19:39 postfix postfix/qmgr[31260]: warning: lots of deferred mail, that is bad for performance Jan 6 23:19:39 postfix postfix/qmgr[31260]: warning: to turn off these warnings specify: qmgr_clog_warn_time = 0 Jan 6 23:24:51 postfix postfix/qmgr[31260]: warning: mail for localpc2105.com is using up 461086 of 461086 active queue entries Jan 6 23:24:51 postfix postfix/qmgr[31260]: warning: you may need to increase the main.cf smtp_destination_concurrency_limit from 100 Jan 6 23:24:51 postfix postfix/qmgr[31260]: warning: please avoid flushing the whole queue when you have Jan 6 23:24:51 postfix postfix/qmgr[31260]: warning: lots of deferred mail, that is bad for performance Jan 6 23:24:51 postfix postfix/qmgr[31260]: warning: to turn off these warnings specify: qmgr_clog_warn_time = 0 Jan 6 23:29:51 postfix postfix/qmgr[31260]: warning: mail for localpc2105.com is using up 460872 of 460872 active queue entries Jan 6 23:29:51 postfix postfix/qmgr[31260]: warning: you may need to increase the main.cf smtp_destination_concurrency_limit from 100 Jan 6 23:29:51 postfix postfix/qmgr[31260]: warning: please avoid flushing the whole queue when you have Jan 6 23:29:51 postfix postfix/qmgr[31260]: warning: lots of deferred mail, that is bad for performance Jan 6 23:29:51 postfix postfix/qmgr[31260]: warning: to turn off these warnings specify: qmgr_clog_warn_time = 0 Jan 6 23:35:51 postfix postfix/qmgr[31260]: warning: mail for localpc2105.com is using up 460025 of 460025 active queue entries Jan 6 23:35:51 postfix postfix/qmgr[31260]: warning: you may need to increase the main.cf smtp_destination_concurrency_limit from 100 Jan 6 23:35:51 postfix postfix/qmgr[31260]: warning: please avoid flushing the whole queue when you have Jan 6 23:35:51 postfix postfix/qmgr[31260]: warning: lots of deferred mail, that is bad for performance Jan 6 23:35:51 postfix postfix/qmgr[31260]: warning: to turn off these warnings specify: qmgr_clog_warn_time = 0 Jan 6 23:40:51 postfix postfix/qmgr[31260]: warning: mail for localpc2105.com is using up 460283 of 460283 active queue entries Jan 6 23:40:51 postfix postfix/qmgr[31260]: warning: you may need to increase the main.cf smtp_destination_concurrency_limit from 100 Jan 6 23:40:51 postfix postfix/qmgr[31260]: warning: please avoid flushing the whole queue when you have Jan 6 23:40:51 postfix postfix/qmgr[31260]: warning: lots of deferred mail, that is bad for performance Jan 6 23:40:51 postfix postfix/qmgr[31260]: warning: to turn off these warnings specify: qmgr_clog_warn_time = 0 Jan 6 23:47:21 postfix postfix/qmgr[31260]: warning: mail for localpc2105.com is using up 459714 of 459714 active queue entries Jan 6 23:47:21 postfix postfix/qmgr[31260]: warning: you may need to increase the main.cf smtp_destination_concurrency_limit from 100 Jan 6 23:47:21 postfix postfix/qmgr[31260]: warning: please avoid flushing the whole queue when you have Jan 6 23:47:21 postfix postfix/qmgr[31260]: warning: lots of deferred mail, that is bad for performance Jan 6 23:47:21 postfix postfix/qmgr[31260]: warning: to turn off these warnings specify: qmgr_clog_warn_time = 0 Jan 6 23:52:21 postfix postfix/qmgr[31260]: warning: mail for localpc2105.com is using up 459491 of 459491 active queue entries Jan 6 23:52:21 postfix postfix/qmgr[31260]: warning: you may need to increase the main.cf smtp_destination_concurrency_limit from 100 Jan 6 23:52:21 postfix postfix/qmgr[31260]: warning: please avoid flushing the whole queue when you have Jan 6 23:52:21 postfix postfix/qmgr[31260]: warning: lots of deferred mail, that is bad for performance Jan 6 23:52:21 postfix postfix/qmgr[31260]: warning: to turn off these warnings specify: qmgr_clog_warn_time = 0 Patrick |
In reply to this post by Patrick Chemla
On Thu, Jan 07, 2010 at 07:43:55PM +0200, Patrick Chemla wrote:
> CPU is more than 85% idle on my postfix I5/750 box, but the outbound queue > is very very slow. Throughput == Concurrency / Latency What destination are most of the messages in the queue going to? What is the associated transport? Are you using any content filters? What is the destination concurrency limit for that transport? What is the delivery latency to that transport? Show a/b/c/d data averaged (mean, median, stddev) over a bunch of log entries. > It seems that something refrain qmgr to work at full range, despite the > parameters Most of your parameter tweaks are counter-productive. Do not tweak anything other than the destination concurrency limit for a transport that delivers to a high capacity destination you control, say: # The 200 is not a golden value, start at 50 and raise only # if throughput improves as a result... # relay_destination_concurrency_limit = 200 > myhostname = postfix.proacti5.net > mydomain = localpc2105.com > inet_interfaces = all > mydestination = $myhostname, localhost.$mydomain, localhost > unknown_local_recipient_reject_code = 550 > mynetworks = 172.27.27.0/24, 10.0.0.0/24, 127.0.0.0/24 > relayhost = $mydomain With relayhost set, all remote mail goes to the MX hosts for $mydomain, so in this case, you can also raise: # The 200 is not a golden value, start at 50 and raise only # if throughput improves as a result... # smtp_destination_concurrency_limit = 200 if necessary. > local_destination_recipient_limit = 500 Terrible idea. > local_destination_concurrency_limit = 50 Terrible idea. > debug_peer_level = 8 Absurd. > debug_peer_list = orange.fr I hope very little mail goes there... > default_process_limit = 1000 Raise just the master.cf limits for the smtpd(8) and smtp(8) services. You don't need 1000 of everything. > initial_destination_concurrency = 100 Too high. > transport_initial_destination_concurrency = 100 You misunderstood the docs, this is useless. > default_destination_concurrency_failed_cohort_limit = 10 Should not be necessary. > default_destination_recipient_limit = 200 OK. > transport_destination_recipient_limit = 100 You misunderstood the docs, this is useless. > default_delivery_slot_cost = 30 > default_minimum_delivery_slots = 30 > default_delivery_slot_discount = 100 > qmgr_fudge_factor = 200 Don't mess with the nqmgr tunables, they are too subtle for mortals. > smtpd_peername_lookup = no When output is starved, why make the input even faster? > default_recipient_limit = 2000000 > qmgr_message_active_limit = 2000000 > qmgr_message_recipient_limit = 2000000 The Postfix queue does not scale to arbitrarily large sizes, at some point, there is more to do than available capacity to process the backlog. 2 million active messages may be OK for a mass-mail engine that fires up periodically, and works as fast as it can, but it is terrible for a mail forwarding relay. Which use-case are you in? > mailbox_size_limit = 5120000000 Why does this machine have any mailboxes at all? Isn't it a relay? What software performs well with 5GB mailboxes? > default_destination_concurrency_limit = 500 Better to specify "smtp", "relay" or both, but not "default". > lmtp_destination_concurrency_limit = $default_destination_concurrency_limit > smtp_destination_concurrency_limit = $default_destination_concurrency_limit > relay_destination_concurrency_limit = $default_destination_concurrency_limit > mime_nesting_limit = 100 These are default settings, don't add them to main.cf > max_use = 1000 Fine. > queue_file_attribute_count_limit = 250 > smtpd_history_flush_threshold = 100 Why??? > smtpd_junk_command_limit = 100 Why so generous to the input side? > smtp_connect_timeout = 10s Reasonable for a large nearby MX pool, you can even use 1s if you want. > smtp_data_done_timeout = 10s Really not a good idea. > smtp_mail_timeout = 5s A bit aggressive... > smtp inet n - n - - smtpd Tune the process limit here > qmgr fifo n - n 30 1 qmgr Why re-scan the incoming queue every 30 seconds? The default is fine. > smtp unix - - n - - smtp Adjust the process limit here to the right number of smtp(8) delivery agents. > relay unix > -o smtp_fallback_relay= - - n - - smtp Adjust this process limit if you service any relay domains. > I tried many combinations to speed up the delivery. Nothing help up to now. LOGS!!! > I just found that Postfix could send 1 million emails per hour when I send > less than a half million in 24 hours. LOGS!!! -- Viktor. Disclaimer: off-list followups get on-list replies or get ignored. Please do not ignore the "Reply-To" header. To unsubscribe from the postfix-users list, visit http://www.postfix.org/lists.html or click the link below: <mailto:[hidden email]?body=unsubscribe%20postfix-users> If my response solves your problem, the best way to thank me is to not send an "it worked, thanks" follow-up. If you must respond, please put "It worked, thanks" in the "Subject" so I can delete these quickly. |
In reply to this post by Patrick Chemla
On Thu, Jan 07, 2010 at 08:29:44PM +0200, Patrick Chemla wrote:
> Here the logs: This is just the qmgr(8) warnings about a clogged queue. Other than telling us that all the mail is going to "localpc2105.com", this is not very useful. Where are the logs from smtp(8)? What transport is "localpc2105.com" destined for? Any earlier logging about actual delivery attempts for this destination? -- Viktor. Disclaimer: off-list followups get on-list replies or get ignored. Please do not ignore the "Reply-To" header. To unsubscribe from the postfix-users list, visit http://www.postfix.org/lists.html or click the link below: <mailto:[hidden email]?body=unsubscribe%20postfix-users> If my response solves your problem, the best way to thank me is to not send an "it worked, thanks" follow-up. If you must respond, please put "It worked, thanks" in the "Subject" so I can delete these quickly. |
In reply to this post by Patrick Chemla
On Thu, Jan 7, 2010 at 1:25 PM, Patrick Chemla
<[hidden email]> wrote: >> said "I just found that Postfix could send 1 million emails per hour >> when I send less than a half million in 24 hours", but I can't make >> sense of that, sorry. >> > > I have to inject 2 to 4 millions emails to the postfix box in 24 hours, and > I expect to deliver within the same delay. > Actually, I can't deliver more than 500,000 per 24h hours. It could be viewed that half a million delivered in 24 hours is fine. Are you signing the mail? This can help with delivery rates to the large webmailer mx destinations. Stef > But the CPU of the box is idle more than 80%. It is clear that it is not a > matter of CPU, nor memory, nor disk. Something in the number of > processes/users/simultaneous tasks is blocking. > > > > |
On Thu, Jan 07, 2010 at 04:47:14PM -0500, Stefan Caunter wrote:
> > > > I have to inject 2 to 4 millions emails to the postfix box in 24 hours, and > > I expect to deliver within the same delay. > > Actually, I can't deliver more than 500,000 per 24h hours. > > It could be viewed that half a million delivered in 24 hours is fine. No, it is too slow, when there is no content inspection involved, especially with a nearby farm of relayhosts. > Are you signing the mail? This can help with delivery rates to the > large webmailer mx destinations. This is unrelated to the OP's problem. -- Viktor. Disclaimer: off-list followups get on-list replies or get ignored. Please do not ignore the "Reply-To" header. To unsubscribe from the postfix-users list, visit http://www.postfix.org/lists.html or click the link below: <mailto:[hidden email]?body=unsubscribe%20postfix-users> If my response solves your problem, the best way to thank me is to not send an "it worked, thanks" follow-up. If you must respond, please put "It worked, thanks" in the "Subject" so I can delete these quickly. |
In reply to this post by Stef Caunter
* Stefan Caunter <[hidden email]>:
> It could be viewed that half a million delivered in 24 hours is fine. > Are you signing the mail? This can help with delivery rates to the > large webmailer mx destinations. There are many things to consider: * DKIM signing - which is the prerequisite for getting into feedback loops at major email providers * get into the feedback loops at major email providers * SPF * good reputation (e.g. SenderBase, senderscore) -- Ralf Hildebrandt Geschäftsbereich IT | Abteilung Netzwerk Charité - Universitätsmedizin Berlin Campus Benjamin Franklin Hindenburgdamm 30 | D-12203 Berlin Tel. +49 30 450 570 155 | Fax: +49 30 450 570 962 [hidden email] | http://www.charite.de |
On Thu, Jan 07, 2010 at 10:54:15PM +0100, Ralf Hildebrandt wrote:
> > It could be viewed that half a million delivered in 24 hours is fine. > > Are you signing the mail? This can help with delivery rates to the > > large webmailer mx destinations. > > There are many things to consider: > > * DKIM signing - which is the prerequisite for getting into feedback > loops at major email providers > * get into the feedback loops at major email providers > * SPF > * good reputation (e.g. SenderBase, senderscore) None of these apply to the OP's problem. He is sending mail to a pool of 30 qmail hosts. -- Viktor. Disclaimer: off-list followups get on-list replies or get ignored. Please do not ignore the "Reply-To" header. To unsubscribe from the postfix-users list, visit http://www.postfix.org/lists.html or click the link below: <mailto:[hidden email]?body=unsubscribe%20postfix-users> If my response solves your problem, the best way to thank me is to not send an "it worked, thanks" follow-up. If you must respond, please put "It worked, thanks" in the "Subject" so I can delete these quickly. |
In reply to this post by Stef Caunter
Le 07/01/2010 23:47, Stefan Caunter a écrit :
> On Thu, Jan 7, 2010 at 1:25 PM, Patrick Chemla > <[hidden email]> wrote: > > >>> said "I just found that Postfix could send 1 million emails per hour >>> when I send less than a half million in 24 hours", but I can't make >>> sense of that, sorry. >>> >>> >> I have to inject 2 to 4 millions emails to the postfix box in 24 hours, and >> I expect to deliver within the same delay. >> Actually, I can't deliver more than 500,000 per 24h hours. >> > It could be viewed that half a million delivered in 24 hours is fine. > Are you signing the mail? This can help with delivery rates to the > large webmailer mx destinations. > > Stef > > Half a million is 4 times lower than what we have done with qmail servers. Email are signed, but not from Postfix. Postfix must only relay mails from clients to local MXs. These local MXs will assume deliveries to the outside. Mail queue should be on these MXs, because they are dependant on final destinations. >> But the CPU of the box is idle more than 80%. It is clear that it is not a >> matter of CPU, nor memory, nor disk. Something in the number of >> processes/users/simultaneous tasks is blocking. >> >> >> >> >> |
In reply to this post by Victor Duchovni
Le 07/01/2010 20:37, Victor Duchovni a écrit :
> On Thu, Jan 07, 2010 at 08:29:44PM +0200, Patrick Chemla wrote: > > >> Here the logs: >> > This is just the qmgr(8) warnings about a clogged queue. Other than > telling us that all the mail is going to "localpc2105.com", this > is not very useful. Where are the logs from smtp(8)? > > What transport is "localpc2105.com" destined for? Any earlier > logging about actual delivery attempts for this destination? > > Victor, thank you for your interest. Daily logs are huge. Here is a sample of deliveries: Jan 7 22:02:57 postfix postfix/qmgr[26441]: 5B91F873F6: removed Jan 7 22:02:57 postfix postfix/smtp[27180]: 375DDD5923: to=<[hidden email]>, relay=a139.localpc2105.com[10.0.0.139]:25, conn_use=59, delay=61550, delays=17019/44435/96/0.17, dsn=2.0.0, status=sent (250 ok 1262894577 qp 12113) Jan 7 22:02:57 postfix postfix/qmgr[26441]: 375DDD5923: removed Jan 7 22:02:58 postfix postfix/smtp[27070]: 7F0F2943B3: to=<[hidden email]>, relay=a70.localpc2105.com[10.0.0.70]:25, conn_use=10, delay=73795, delays=29264/44481/50/0.21, dsn=2.0.0, status=sent (250 ok 1262894577 qp 23067) Jan 7 22:02:58 postfix postfix/qmgr[26441]: 7F0F2943B3: removed Jan 7 22:02:58 postfix postfix/smtp[27050]: 32BB182182: to=<[hidden email]>, relay=a139.localpc2105.com[10.0.0.139]:25, conn_use=48, delay=73799, delays=29268/44466/65/0.28, dsn=2.0.0, status=sent (250 ok 1262894578 qp 12121) Jan 7 22:02:58 postfix postfix/qmgr[26441]: 32BB182182: removed Jan 7 22:02:58 postfix postfix/smtp[26758]: 577D6C7F7D: to=<[hidden email]>, relay=a139.localpc2105.com[10.0.0.139]:25, conn_use=60, delay=68451, delays=23920/44481/50/0.29, dsn=2.0.0, status=sent (250 ok 1262894578 qp 12122) Jan 7 22:02:58 postfix postfix/qmgr[26441]: 577D6C7F7D: removed Jan 7 22:02:58 postfix postfix/smtp[26935]: CDCE074F53: to=<[hidden email]>, relay=a139.localpc2105.com[10.0.0.139]:25, conn_use=49, delay=104597, delays=60065/44421/110/0.3, dsn=2.0.0, status=sent (250 ok 1262894578 qp 12135) Jan 7 22:02:58 postfix postfix/qmgr[26441]: CDCE074F53: removed Jan 7 22:02:58 postfix postfix/smtp[26708]: 4B0B6E77FD: to=<[hidden email]>, relay=a139.localpc2105.com[10.0.0.139]:25, conn_use=61, delay=46137, delays=1606/44461/70/0.31, dsn=2.0.0, status=sent (250 ok 1262894578 qp 12136) Jan 7 22:02:58 postfix postfix/qmgr[26441]: 4B0B6E77FD: removed Jan 7 22:02:58 postfix postfix/smtp[26794]: D2CB5DC84C: to=<[hidden email]>, relay=a70.localpc2105.com[10.0.0.70]:25, conn_use=11, delay=58160, delays=13628/44481/50/0.23, dsn=2.0.0, status=sent (250 ok 1262894578 qp 23076) Jan 7 22:02:58 postfix postfix/qmgr[26441]: D2CB5DC84C: removed Jan 7 22:02:58 postfix postfix/smtp[26968]: 1A651E17E0: to=<[hidden email]>, relay=a74.localpc2105.com[10.0.0.74]:25, conn_use=2, delay=54426, delays=9894/44462/69/0.27, dsn=2.0.0, status=sent (250 ok 1262894578 qp 7411) Jan 7 22:02:58 postfix postfix/qmgr[26441]: 1A651E17E0: removed Jan 7 22:02:58 postfix postfix/smtp[27037]: 4CCC486B55: to=<[hidden email]>, relay=a139.localpc2105.com[10.0.0.139]:25, conn_use=50, delay=45538, delays=1005/44407/125/0.17, dsn=2.0.0, status=sent (250 ok 1262894578 qp 12150) Jan 7 22:02:58 postfix postfix/qmgr[26441]: 4CCC486B55: removed Jan 7 22:02:58 postfix postfix/smtp[27188]: D130997201: to=<[hidden email]>, relay=a74.localpc2105.com[10.0.0.74]:25, conn_use=2, delay=71536, delays=27004/44448/84/0.28, dsn=2.0.0, status=sent (250 ok 1262894578 qp 7412) Jan 7 22:02:58 postfix postfix/qmgr[26441]: D130997201: removed Jan 7 22:02:59 postfix postfix/smtp[27033]: 6BD743906A: to=<[hidden email]>, relay=a139.localpc2105.com[10.0.0.139]:25, conn_use=62, delay=81473, delays=36941/44467/65/0.24, dsn=2.0.0, status=sent (250 ok 1262894579 qp 12157) Jan 7 22:02:59 postfix postfix/qmgr[26441]: 6BD743906A: removed Jan 7 22:02:59 postfix postfix/smtp[26793]: 84947C14B2: to=<[hidden email]>, relay=a70.localpc2105.com[10.0.0.70]:25, conn_use=12, delay=69401, delays=24868/44469/63/0.2, dsn=2.0.0, status=sent (250 ok 1262894578 qp 23084) Jan 7 22:02:59 postfix postfix/qmgr[26441]: 84947C14B2: removed Jan 7 22:02:59 postfix postfix/smtp[26737]: 6023552F52: to=<[hidden email]>, relay=a139.localpc2105.com[10.0.0.139]:25, conn_use=51, delay=96132, delays=51599/44448/84/0.3, dsn=2.0.0, status=sent (250 ok 1262894579 qp 12158) Jan 7 22:02:59 postfix postfix/qmgr[26441]: 6023552F52: removed Jan 7 22:02:59 postfix postfix/smtp[27134]: connect to a132.localpc2105.com[10.0.0.132]:25: Connection timed out Jan 7 22:02:59 postfix postfix/smtp[26717]: 96A447C426: to=<[hidden email]>, relay=a139.localpc2105.com[10.0.0.139]:25, conn_use=63, delay=103800, delays=59267/44433/99/0.27, dsn=2.0.0, status=sent (250 ok 1262894579 qp 12166) Jan 7 22:02:59 postfix postfix/qmgr[26441]: 96A447C426: removed Jan 7 22:02:59 postfix postfix/smtp[26727]: connect to a133.localpc2105.com[10.0.0.133]:25: Connection timed out Jan 7 22:02:59 postfix postfix/smtp[27204]: 485A39C9E8: to=<[hidden email]>, relay=a139.localpc2105.com[10.0.0.139]:25, conn_use=52, delay=69946, delays=25414/44492/40/0.29, dsn=2.0.0, status=sent (250 ok 1262894579 qp 12172) Jan 7 22:02:59 postfix postfix/qmgr[26441]: 485A39C9E8: removed Jan 7 22:02:59 postfix postfix/smtp[26727]: E86E9D9F79: to=<[hidden email]>, relay=a67.localpc2105.com[10.0.0.67]:25, delay=59696, delays=15163/44433/100/0.17, dsn=2.0.0, status=sent (250 ok 1262894579 qp 3991) Jan 7 22:02:59 postfix postfix/qmgr[26441]: E86E9D9F79: removed Jan 7 22:02:59 postfix postfix/smtp[26924]: C6905ACBA3: to=<[hidden email]>, relay=a139.localpc2105.com[10.0.0.139]:25, conn_use=64, delay=96901, delays=52368/44474/58/0.24, dsn=2.0.0, status=sent (250 ok 1262894579 qp 12179) Jan 7 22:02:59 postfix postfix/qmgr[26441]: C6905ACBA3: removed Everything seems to be OK, emails are relayed to my qmails servers. But if you have a look on these logs, related to 2 messages, you will see that messages where posted on the postix a 2:37AM and 9:09 AM and delivered at 22:03 !!!! 20 hours in the queue, with no errors, no warnings.... Jan 7 02:37:13 postfix postfix/smtpd[20640]: 485A39C9E8: client=unknown[10.0.0.241] Jan 7 02:37:13 postfix postfix/cleanup[20642]: 485A39C9E8: message-id=<[hidden email]> Jan 7 02:37:13 postfix postfix/qmgr[31260]: 485A39C9E8: from=<[hidden email]>, size=6136, nrcpt=1 (queue active) Jan 7 05:28:03 postfix postfix/smtpd[3663]: E86E9D9F79: client=unknown[10.0.0.241] Jan 7 05:28:03 postfix postfix/cleanup[3955]: E86E9D9F79: message-id=<[hidden email]> Jan 7 05:28:04 postfix postfix/qmgr[31260]: E86E9D9F79: from=<[hidden email]>, size=11140, nrcpt=1 (queue active) Jan 7 08:56:31 postfix postfix/qmgr[21787]: E86E9D9F79: from=<[hidden email]>, size=11140, nrcpt=1 (queue active) Jan 7 08:56:31 postfix postfix/qmgr[21787]: 485A39C9E8: from=<[hidden email]>, size=6136, nrcpt=1 (queue active) Jan 7 09:09:56 postfix postfix/qmgr[23046]: E86E9D9F79: from=<[hidden email]>, size=11140, nrcpt=1 (queue active) Jan 7 09:09:57 postfix postfix/qmgr[23046]: 485A39C9E8: from=<[hidden email]>, size=6136, nrcpt=1 (queue active) Jan 7 09:40:46 postfix poJan 7 22:02:57 postfix postfix/qmgr[26441]: 5B91F873F6: removed Jan 7 22:02:57 postfix postfix/smtp[27180]: 375DDD5923: to=<[hidden email]>, relay=a139.localpc2105.com[10.0.0.139]:25, conn_use=59, delay=61550, delays=17019/44435/96/0.17, dsn=2.0.0, status=sent (250 ok 1262894577 qp 12113) Jan 7 22:02:57 postfix postfix/qmgr[26441]: 375DDD5923: removed Jan 7 22:02:58 postfix postfix/smtp[27070]: 7F0F2943B3: to=<[hidden email]>, relay=a70.localpc2105.com[10.0.0.70]:25, conn_use=10, delay=73795, delays=29264/44481/50/0.21, dsn=2.0.0, status=sent (250 ok 1262894577 qp 23067) Jan 7 22:02:58 postfix postfix/qmgr[26441]: 7F0F2943B3: removed Jan 7 22:02:58 postfix postfix/smtp[27050]: 32BB182182: to=<[hidden email]>, relay=a139.localpc2105.com[10.0.0.139]:25, conn_use=48, delay=73799, delays=29268/44466/65/0.28, dsn=2.0.0, status=sent (250 ok 1262894578 qp 12121) Jan 7 22:02:58 postfix postfix/qmgr[26441]: 32BB182182: removed Jan 7 22:02:58 postfix postfix/smtp[26758]: 577D6C7F7D: to=<[hidden email]>, relay=a139.localpc2105.com[10.0.0.139]:25, conn_use=60, delay=68451, delays=23920/44481/50/0.29, dsn=2.0.0, status=sent (250 ok 1262894578 qp 12122) Jan 7 22:02:58 postfix postfix/qmgr[26441]: 577D6C7F7D: removed Jan 7 22:02:58 postfix postfix/smtp[26935]: CDCE074F53: to=<[hidden email]>, relay=a139.localpc2105.com[10.0.0.139]:25, conn_use=49, delay=104597, delays=60065/44421/110/0.3, dsn=2.0.0, status=sent (250 ok 1262894578 qp 12135) Jan 7 22:02:58 postfix postfix/qmgr[26441]: CDCE074F53: removed Jan 7 22:02:58 postfix postfix/smtp[26708]: 4B0B6E77FD: to=<[hidden email]>, relay=a139.localpc2105.com[10.0.0.139]:25, conn_use=61, delay=46137, delays=1606/44461/70/0.31, dsn=2.0.0, status=sent (250 ok 1262894578 qp 12136) Jan 7 22:02:58 postfix postfix/qmgr[26441]: 4B0B6E77FD: removed Jan 7 22:02:58 postfix postfix/smtp[26794]: D2CB5DC84C: to=<[hidden email]>, relay=a70.localpc2105.com[10.0.0.70]:25, conn_use=11, delay=58160, delays=13628/44481/50/0.23, dsn=2.0.0, status=sent (250 ok 1262894578 qp 23076) Jan 7 22:02:58 postfix postfix/qmgr[26441]: D2CB5DC84C: removed Jan 7 22:02:58 postfix postfix/smtp[26968]: 1A651E17E0: to=<[hidden email]>, relay=a74.localpc2105.com[10.0.0.74]:25, conn_use=2, delay=54426, delays=9894/44462/69/0.27, dsn=2.0.0, status=sent (250 ok 1262894578 qp 7411) Jan 7 22:02:58 postfix postfix/qmgr[26441]: 1A651E17E0: removed Jan 7 22:02:58 postfix postfix/smtp[27037]: 4CCC486B55: to=<[hidden email]>, relay=a139.localpc2105.com[10.0.0.139]:25, conn_use=50, delay=45538, delays=1005/44407/125/0.17, dsn=2.0.0, status=sent (250 ok 1262894578 qp 12150) Jan 7 22:02:58 postfix postfix/qmgr[26441]: 4CCC486B55: removed Jan 7 22:02:58 postfix postfix/smtp[27188]: D130997201: to=<[hidden email]>, relay=a74.localpc2105.com[10.0.0.74]:25, conn_use=2, delay=71536, delays=27004/44448/84/0.28, dsn=2.0.0, status=sent (250 ok 1262894578 qp 7412) Jan 7 22:02:58 postfix postfix/qmgr[26441]: D130997201: removed Jan 7 22:02:59 postfix postfix/smtp[27033]: 6BD743906A: to=<[hidden email]>, relay=a139.localpc2105.com[10.0.0.139]:25, conn_use=62, delay=81473, delays=36941/44467/65/0.24, dsn=2.0.0, status=sent (250 ok 1262894579 qp 12157) Jan 7 22:02:59 postfix postfix/qmgr[26441]: 6BD743906A: removed Jan 7 22:02:59 postfix postfix/smtp[26793]: 84947C14B2: to=<[hidden email]>, relay=a70.localpc2105.com[10.0.0.70]:25, conn_use=12, delay=69401, delays=24868/44469/63/0.2, dsn=2.0.0, status=sent (250 ok 1262894578 qp 23084) Jan 7 22:02:59 postfix postfix/qmgr[26441]: 84947C14B2: removed Jan 7 22:02:59 postfix postfix/smtp[26737]: 6023552F52: to=<[hidden email]>, relay=a139.localpc2105.com[10.0.0.139]:25, conn_use=51, delay=96132, delays=51599/44448/84/0.3, dsn=2.0.0, status=sent (250 ok 1262894579 qp 12158) Jan 7 22:02:59 postfix postfix/qmgr[26441]: 6023552F52: removed Jan 7 22:02:59 postfix postfix/smtp[27134]: connect to a132.localpc2105.com[10.0.0.132]:25: Connection timed out Jan 7 22:02:59 postfix postfix/smtp[26717]: 96A447C426: to=<[hidden email]>, relay=a139.localpc2105.com[10.0.0.139]:25, conn_use=63, delay=103800, delays=59267/44433/99/0.27, dsn=2.0.0, status=sent (250 ok 1262894579 qp 12166) Jan 7 22:02:59 postfix postfix/qmgr[26441]: 96A447C426: removed Jan 7 22:02:59 postfix postfix/smtp[26727]: connect to a133.localpc2105.com[10.0.0.133]:25: Connection timed out Jan 7 22:02:59 postfix postfix/smtp[27204]: 485A39C9E8: to=<[hidden email]>, relay=a139.localpc2105.com[10.0.0.139]:25, conn_use=52, delay=69946, delays=25414/44492/40/0.29, dsn=2.0.0, status=sent (250 ok 1262894579 qp 12172) Jan 7 22:02:59 postfix postfix/qmgr[26441]: 485A39C9E8: removed Jan 7 22:02:59 postfix postfix/smtp[26727]: E86E9D9F79: to=<[hidden email]>, relay=a67.localpc2105.com[10.0.0.67]:25, delay=59696, delays=15163/44433/100/0.17, dsn=2.0.0, status=sent (250 ok 1262894579 qp 3991) Jan 7 22:02:59 postfix postfix/qmgr[26441]: E86E9D9F79: removed Jan 7 22:02:59 postfix postfix/smtp[26924]: C6905ACBA3: to=<[hidden email]>, relay=a139.localpc2105.com[10.0.0.139]:25, conn_use=64, delay=96901, delays=52368/44474/58/0.24, dsn=2.0.0, status=sent (250 ok 1262894579 qp 12179) Jan 7 22:02:59 postfix postfix/qmgr[26441]: C6905ACBA3: removed stfix/qmgr[26441]: E86E9D9F79: from=<[hidden email]>, size=11140, nrcpt=1 (queue active) Jan 7 09:40:47 postfix postfix/qmgr[26441]: 485A39C9E8: from=<[hidden email]>, size=6136, nrcpt=1 (queue active) Jan 7 22:02:59 postfix postfix/smtp[27204]: 485A39C9E8: to=<[hidden email]>, relay=a139.localpc2105.com[10.0.0.139]:25, conn_use=52, delay=69946, delays=25414/44492/40/0.29, dsn=2.0.0, status=sent (250 ok 1262894579 qp 12172) Jan 7 22:02:59 postfix postfix/qmgr[26441]: 485A39C9E8: removed Jan 7 22:02:59 postfix postfix/smtp[26727]: E86E9D9F79: to=<[hidden email]>, relay=a67.localpc2105.com[10.0.0.67]:25, delay=59696, delays=15163/44433/100/0.17, dsn=2.0.0, status=sent (250 ok 1262894579 qp 3991) Jan 7 22:02:59 postfix postfix/qmgr[26441]: E86E9D9F79: removed It is very strange. Patrick |
On Fri, Jan 08, 2010 at 12:30:34AM +0200, Patrick Chemla wrote:
> Jan 7 22:02:57 postfix postfix/qmgr[26441]: 5B91F873F6: removed > Jan 7 22:02:57 postfix postfix/smtp[27180]: 375DDD5923: > to=<[hidden email]>, relay=a139.localpc2105.com[10.0.0.139]:25, > conn_use=59, delay=61550, delays=17019/44435/96/0.17, dsn=2.0.0, > status=sent (250 ok 1262894577 qp 12113) This recipient does not match the destination that is clogging the queue. Is the queue clogged with postmaster notices. I never enable any postmaster notices, they don't scale. notify_classes = This said, the 96 seconds of connection setup latency is an obvious and severe problem. Why on earth does it take 96 seconds to complete a HELO handshake with "a139.localpcc2105.com"? You are not going to get much mail out if each delivery takes 96 seconds... Is your Postfix server's IP address resolvable on the qmail systems? Are they doing some sort of pre-banner delay? ... > Jan 7 22:02:58 postfix postfix/smtp[27070]: 7F0F2943B3: > to=<[hidden email]>, relay=a70.localpc2105.com[10.0.0.70]:25, > conn_use=10, delay=73795, delays=29264/44481/50/0.21, dsn=2.0.0, > status=sent (250 ok 1262894577 qp 23067) Once again, 50 seconds is severely crippled. > Jan 7 22:02:58 postfix postfix/smtp[27050]: 32BB182182: > to=<[hidden email]>, > relay=a139.localpc2105.com[10.0.0.139]:25, conn_use=48, delay=73799, > delays=29268/44466/65/0.28, dsn=2.0.0, status=sent (250 ok 1262894578 qp > 12121) This is enough. Fix this. Where are the deliveries to the clogged destination??? -- Viktor. Disclaimer: off-list followups get on-list replies or get ignored. Please do not ignore the "Reply-To" header. To unsubscribe from the postfix-users list, visit http://www.postfix.org/lists.html or click the link below: <mailto:[hidden email]?body=unsubscribe%20postfix-users> If my response solves your problem, the best way to thank me is to not send an "it worked, thanks" follow-up. If you must respond, please put "It worked, thanks" in the "Subject" so I can delete these quickly. |
In reply to this post by Patrick Chemla
Patrick Chemla:
> >> But the CPU of the box is idle more than 80%. It is clear that it is not a > >> matter of CPU, nor memory, nor disk. Something in the number of > >> processes/users/simultaneous tasks is blocking. Indeed, the symptom of blocking is in the third field of the Postfix "delays" logging. The format of the "delays=a/b/c/d" logging is as follows: o a = time from message arrival to last active queue entry o b = time from last active queue entry to connection setup o c = time in connection setup, including DNS, EHLO and TLS o d = time in message transmission In your case, it takes a minute or more to set up the connection including DNS lookup and EHLO handshake. That is holding up your mail. - Check if the qmail servers are responsive (telnet hostname 25). - Check if your Postfix needs a /var/spool/postfix/etc/resolv.conf file, and if that file is consistent with /etc/resolv.conf. If Postfix needs /var/spool/postfix/etc/resolv.conf and the file is missong or contains a bogus server that will add time to your deliveries. - If they aren't, increase the concurrency on the qmail side. Wietse |
Le 08/01/2010 03:03, Wietse Venema a écrit :
> Patrick Chemla: > >>>> But the CPU of the box is idle more than 80%. It is clear that it is not a >>>> matter of CPU, nor memory, nor disk. Something in the number of >>>> processes/users/simultaneous tasks is blocking. >>>> > Indeed, the symptom of blocking is in the third field of > the Postfix "delays" logging. > > The format of the "delays=a/b/c/d" logging is as follows: > > o a = time from message arrival to last active queue entry > > o b = time from last active queue entry to connection setup > > o c = time in connection setup, including DNS, EHLO and TLS > > o d = time in message transmission > > In your case, it takes a minute or more to set up the connection > including DNS lookup and EHLO handshake. That is holding up your mail. > > - Check if the qmail servers are responsive (telnet hostname 25). > > now, but the connexion is still very slow. I saw this morning c=285. > - Check if your Postfix needs a /var/spool/postfix/etc/resolv.conf > file, and if that file is consistent with /etc/resolv.conf. If > Postfix needs /var/spool/postfix/etc/resolv.conf and the file > is missong or contains a bogus server that will add time to > your deliveries. > > Hi Wietse, How do I know if " Postfix needs a /var/spool/postfix/etc/resolv.conf" directory /var/spool/postfix/etc doesn't exist. > - If they aren't, increase the concurrency on the qmail side. > > conccurency =100. It's already a large number. I can increase it. > Wietse > Thanks Patrick |
In reply to this post by Victor Duchovni
Le 08/01/2010 00:43, Victor Duchovni a écrit :
> On Fri, Jan 08, 2010 at 12:30:34AM +0200, Patrick Chemla wrote: > > >> Jan 7 22:02:57 postfix postfix/qmgr[26441]: 5B91F873F6: removed >> Jan 7 22:02:57 postfix postfix/smtp[27180]: 375DDD5923: >> to=<[hidden email]>, relay=a139.localpc2105.com[10.0.0.139]:25, >> conn_use=59, delay=61550, delays=17019/44435/96/0.17, dsn=2.0.0, >> status=sent (250 ok 1262894577 qp 12113) >> > This recipient does not match the destination that is clogging the > queue. Is the queue clogged with postmaster notices. I never enable > any postmaster notices, they don't scale. > > notify_classes = > > > This said, the 96 seconds of connection setup latency is an obvious and > severe problem. Why on earth does it take 96 seconds to complete a HELO > handshake with "a139.localpcc2105.com"? You are not going to get much > mail out if each delivery takes 96 seconds... > > Is your Postfix server's IP address resolvable on the qmail systems? > Should it be? qmail accept all RELAY CLIENT from local network. > Are they doing some sort of pre-banner delay? ... > > > When I do telnet a139.localpc2105.com 25, I get immediate response. >> Jan 7 22:02:58 postfix postfix/smtp[27070]: 7F0F2943B3: >> to=<[hidden email]>, relay=a70.localpc2105.com[10.0.0.70]:25, >> conn_use=10, delay=73795, delays=29264/44481/50/0.21, dsn=2.0.0, >> status=sent (250 ok 1262894577 qp 23067) >> > Once again, 50 seconds is severely crippled. > > When I telnet a70.localpc2105.com 25 I get an immediate response. I have checked my local DNS. There were some troubles, and I made some improvements. I have now 2 local caching DNS respawning fast. All qmail servers addresses are in the postfix /etc/hosts to avoid Ip lookup. I have checked qmails servers, nothing has changed since they were able to have a queue of 200,000 messages, but they have now a few hundreds only. I have calculated average times to complete HELO. All qmails are in the same kind of value around 2 minutes. Not any one is better than others. Again, each was handling a queue of hundreds thousands before I set up the postfix relay to load balance. I really don't have a clue. I don't know where to look. >> Jan 7 22:02:58 postfix postfix/smtp[27050]: 32BB182182: >> to=<[hidden email]>, >> relay=a139.localpc2105.com[10.0.0.139]:25, conn_use=48, delay=73799, >> delays=29268/44466/65/0.28, dsn=2.0.0, status=sent (250 ok 1262894578 qp >> 12121) >> > This is enough. Fix this. > > How I can fix it if it works fine through telnet? > Where are the deliveries to the clogged destination??? > > Sorry, I don't understand this question. Please be clear. Patrick |
Patrick Chemla:
[ Charset ISO-8859-1 unsupported, converting... ] > Le 08/01/2010 00:43, Victor Duchovni a ?crit : > > On Fri, Jan 08, 2010 at 12:30:34AM +0200, Patrick Chemla wrote: > > > > > >> Jan 7 22:02:57 postfix postfix/qmgr[26441]: 5B91F873F6: removed > >> Jan 7 22:02:57 postfix postfix/smtp[27180]: 375DDD5923: > >> to=<[hidden email]>, relay=a139.localpc2105.com[10.0.0.139]:25, > >> conn_use=59, delay=61550, delays=17019/44435/96/0.17, dsn=2.0.0, ^^^^^^^^^^^ Note that this connection has been reused multiple times (see below for what this means in Postfix). Why does it take 69 seconds to initialize a reused SMTP connection? What happens when you set smtp_connection_cache_on_demand=no in main.cf (and do "postfix reload")? If this makes a difference, then a) you have a problem with smtp<->scache communication. b) qmail does not like RSET commands c) Your machine is running low on memory and swapping out the scache process. d) something else. Wietse Under high load, smtp(8) processes give their open connections to scache(8). Later, they ask scache(8) for an open connection to a specific destination. Once an smtp(8) client retrieves an open connection, it sends RSET tothe remote server and waits for a 250 reply (i.e. the server is still happy). According to the logfile record this lookup/rset/reply sequence is taking 96 seconds. |
In reply to this post by Patrick Chemla
On Fri, 08 Jan 2010 15:24:25 +0200, Patrick Chemla > When I telnet a70.localpc2105.com 25 I get an immediate response. I assume you are telnet'ing from the Postfix server with the queue delay problem. At this point, after you receive the 220, type: ehlo your.postfix-server.tld <enter> and time the delay of the 250 responses. Continue to do a complete manual mail transaction through telnet, and time each smtp command completion (wall clock is fine). Post the results here please. -- Stan |
In reply to this post by Wietse Venema
Wietse Venema:
> Patrick Chemla: > > Le 08/01/2010 00:43, Victor Duchovni a ?crit : > > > On Fri, Jan 08, 2010 at 12:30:34AM +0200, Patrick Chemla wrote: > > > > > > > > >> Jan 7 22:02:57 postfix postfix/qmgr[26441]: 5B91F873F6: removed > > >> Jan 7 22:02:57 postfix postfix/smtp[27180]: 375DDD5923: > > >> to=<[hidden email]>, relay=a139.localpc2105.com[10.0.0.139]:25, > > >> conn_use=59, delay=61550, delays=17019/44435/96/0.17, dsn=2.0.0, > ^^^^^^^^^^^ > > Note that this connection has been reused multiple times (see below > for what this means in Postfix). > > Why does it take 69 seconds to initialize a reused SMTP connection? > > What happens when you set > > smtp_connection_cache_on_demand=no > > in main.cf (and do "postfix reload")? > > If this makes a difference, then Check your qmail configuration for "tarpit" options. There exist qmail patches that will slow down the qmail SMTP server when the client sends lots of email, or lots of recipients. Wietse > a) you have a problem with smtp<->scache communication. > > b) qmail does not like RSET commands > > c) Your machine is running low on memory and swapping out the scache > process. > > d) something else. > > Wietse > > Under high load, smtp(8) processes give their open connections to > scache(8). Later, they ask scache(8) for an open connection to a > specific destination. Once an smtp(8) client retrieves an open > connection, it sends RSET tothe remote server and waits for a 250 > reply (i.e. the server is still happy). According to the logfile > record this lookup/rset/reply sequence is taking 96 seconds. > > |
Free forum by Nabble | Edit this page |