Slow delivery of mails reg.

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
7 messages Options
Reply | Threaded
Open this post in threaded view
|

Slow delivery of mails reg.

Anant Athavale

Hi,

We have a internal relay configured in our setup.  The relay host had stopped accepting mails as the system was hanging and we could resolve it only after about 8 hrs.  Meanwhile the queue built up and there were about 80,000 mails in queue to relay to the relay host.  Once the system was made up, mails started delivering to the relay host at a very slow rate. It took almost 10 hours to clear the queue.

What I could understand by googling was, postfix detected the relay host to be dead and due to the parameters like minimal_backoff_time and maximal_backoff_time, the delivery rate to the relay host was slow. 

I am not sure, whether this understanding is correct. 

What is the correct way to come out of this situation and deliver the mails faster after recovery of relay host.

Waiting for your valuable suggestions.

Anant S Athavale



गोपनीयता नोटिसः यह ई-मेल संदेश, किसी भी संलग्नक के साथ, अभिप्रेत प्राप्तकार(रों) के एकमात्र उपयोग के लिए है और इसमें गोपनीयता और विशेषाधिकार सूचना होगी। किसी प्रकार का अप्राधिकृत पुनरीक्षण, उपयोग, खुलासा या परिचालन निषेध है। यदि आप अभिप्रेत प्राप्तकार नहीं हैं तो, कृपया भेजने वाले से उत्तर ई-मेल द्वारा संपर्क करें और मूल संदेश की सभी प्रतियों को नष्ट कर दें।


Confidentiality Notice: This e-mail message, including any attachments, is for the sole use of the intended recipient(s) and may contain confidential and privileged information. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply e-mail and destroy all copies of the original message.


Reply | Threaded
Open this post in threaded view
|

Re: Slow delivery of mails reg.

Viktor Dukhovni
On Wed, Nov 29, 2017 at 09:20:21AM +0530, [hidden email] wrote:

> We have a internal relay configured in our setup.  The relay host hadstopped
> accepting mails as the system was hanging and we could resolve
> it only after about 8 hrs.  Meanwhile the queue built up and there
> were about 80,000 mails in queue to relay to the relay host.  Once the
> system was made up, mails started delivering to the relay host at a
> very slow rate. It took almost 10 hours to clear the queue.

This is almost certainly because the relay host could not process
the resumed mail flow faster.  Another possibility is that there
are residual DNS issues that slow down each delivery.

> What I could understand by googling was, postfix detected the relay
> host to be dead and due to the parameters like minimal_backoff_time
> and maximal_backoff_time, the delivery rate to the relay host was slow. 

This is wrong.  The dead host detection state is cached only for a short
time, and cannot account for slow delivery once the host is back up.

> What is the correct way to come out of this situation and deliver the
> mails faster after recovery of relay host.

Make sure the relay host can process email quickly, find and resolve
all DNS-related issues.

To understand why delivery was slow you need to analyze your logs
and examine the "delays=" log entries to see whether the delays
in connecting to the relay host or waiting for deliveries to
complete.

--
        Viktor.
Reply | Threaded
Open this post in threaded view
|

Re: Slow delivery of mails reg.

Anant Athavale

----- Message from Viktor Dukhovni <[hidden email]> ---------
    Date: Wed, 29 Nov 2017 04:47:35 +0000
    From: Viktor Dukhovni <[hidden email]>
Reply-To: [hidden email]
Subject: Re: Slow delivery of mails reg.
      To: [hidden email]

On Wed, Nov 29, 2017 at 09:20:21AM +0530, [hidden email] wrote:

We have a internal relay configured in our setup.  The relay host hadstopped
accepting mails as the system was hanging and we could resolve
it only after about 8 hrs.  Meanwhile the queue built up and there
were about 80,000 mails in queue to relay to the relay host.  Once the
system was made up, mails started delivering to the relay host at a
very slow rate. It took almost 10 hours to clear the queue.

This is almost certainly because the relay host could not process
the resumed mail flow faster.  Another possibility is that there
are residual DNS issues that slow down each delivery.

What I could understand by googling was, postfix detected the relay
host to be dead and due to the parameters like minimal_backoff_time
and maximal_backoff_time, the delivery rate to the relay host was slow. 

This is wrong.  The dead host detection state is cached only for a short
time, and cannot account for slow delivery once the host is back up.

What is the correct way to come out of this situation and deliver the
mails faster after recovery of relay host.

Make sure the relay host can process email quickly, find and resolve
all DNS-related issues.

To understand why delivery was slow you need to analyze your logs
and examine the "delays=" log entries to see whether the delays
in connecting to the relay host or waiting for deliveries to
complete.




--        Viktor.

The log entries with delays is as below.

relay=x.x.x.x[x.x.x.x]:25, conn_use=3, delay=32027, delays=31986/31/0/10
relay=x.x.x.x[x.x.x.x]:25,conn_use=5, delay=27740, delays=27690/40/0/10
relay=x.x.x.x[x.x.x.x]:25,conn_use=6, delay=13069, delays=13010/49/0/10
relay=x.x.x.x[x.x.x.x]:25,conn_use=7, delay=21476, delays=21406/60/0/10
relay=x.x.x.x[x.x.x.x]:25,conn_use=2, delay=33553, delays=33474/69/0/10
relay=x.x.x.x[x.x.x.x]:25,conn_use=12, delay=14591, delays=14493/88/0/10
relay=x.x.x.x[x.x.x.x]:25,conn_use=4, delay=25953, delays=25807/136/0/10
relay=x.x.x.x[x.x.x.x]:25,conn_use=6, delay=27339, delays=25970/1359/0/10,
relay=x.x.x.x[x.x.x.x]:25,conn_use=16, delay=7608, delays=4510/3088/0/10
relay=x.x.x.x[x.x.x.x]:25,conn_use=19, delay=7457, delays=2593/4854/0/10
relay=x.x.x.x[x.x.x.x]:25,conn_use=20, delay=38734, delays=33864/4860/0/10
relay=x.x.x.x[x.x.x.x]:25,conn_use=7, delay=11656, delays=5728/5918/0/10
relay=x.x.x.x[x.x.x.x]:25,conn_use=19, delay=40339, delays=31320/9009/0/10
relay=x.x.x.x[x.x.x.x]:25,conn_use=13, delay=10820, delays=441/10369/0/10
relay=x.x.x.x[x.x.x.x]:25,conn_use=10, delay=39517, delays=29103/10404/0/10
relay=x.x.x.x[x.x.x.x]:25,conn_use=28, delay=43790, delays=33444/10336/0/10

the above log entries are at different times on the same day.

Please share your analysis for further improving our setup.

Regadrs,
ANANT.


----- End message from Viktor Dukhovni <[hidden email]> -----

Anant S Athavale



गोपनीयता नोटिसः यह ई-मेल संदेश, किसी भी संलग्नक के साथ, अभिप्रेत प्राप्तकार(रों) के एकमात्र उपयोग के लिए है और इसमें गोपनीयता और विशेषाधिकार सूचना होगी। किसी प्रकार का अप्राधिकृत पुनरीक्षण, उपयोग, खुलासा या परिचालन निषेध है। यदि आप अभिप्रेत प्राप्तकार नहीं हैं तो, कृपया भेजने वाले से उत्तर ई-मेल द्वारा संपर्क करें और मूल संदेश की सभी प्रतियों को नष्ट कर दें।


Confidentiality Notice: This e-mail message, including any attachments, is for the sole use of the intended recipient(s) and may contain confidential and privileged information. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply e-mail and destroy all copies of the original message.


Reply | Threaded
Open this post in threaded view
|

Re: Slow delivery of mails reg.

Viktor Dukhovni
On Wed, Nov 29, 2017 at 12:04:14PM +0530, [hidden email] wrote:

> > To understand why delivery was slow you need to analyze your logs
> > and examine the "delays=" log entries to see whether the delays
> > in connecting to the relay host or waiting for deliveries to
> > complete.
> >
> > --        Viktor.
>
> The log entries with delays is as below.
>
> relay=x.x.x.x[x.x.x.x]:25, conn_use=3, delay=32027, delays=31986/31/0/10

Well, the main issue here is the consistent 10s delay to complete
a delivery after 0s connection latency.  I would guess that's some
sort of timeout on the relay end resolving the IP address of the
connecting client.  With 10s to complete each delivery, your
throughput is not surprisingly quite low.

> relay=x.x.x.x[x.x.x.x]:25,conn_use=5, delay=27740, delays=27690/40/0/10
> relay=x.x.x.x[x.x.x.x]:25,conn_use=6, delay=13069, delays=13010/49/0/10
> relay=x.x.x.x[x.x.x.x]:25,conn_use=7, delay=21476, delays=21406/60/0/10
> relay=x.x.x.x[x.x.x.x]:25,conn_use=2, delay=33553, delays=33474/69/0/10
> relay=x.x.x.x[x.x.x.x]:25,conn_use=12, delay=14591, delays=14493/88/0/10
> relay=x.x.x.x[x.x.x.x]:25,conn_use=4, delay=25953, delays=25807/136/0/10
> relay=x.x.x.x[x.x.x.x]:25,conn_use=6, delay=27339, delays=25970/1359/0/10,
> relay=x.x.x.x[x.x.x.x]:25,conn_use=16, delay=7608, delays=4510/3088/0/10
> relay=x.x.x.x[x.x.x.x]:25,conn_use=19, delay=7457, delays=2593/4854/0/10
> relay=x.x.x.x[x.x.x.x]:25,conn_use=20, delay=38734, delays=33864/4860/0/10
> relay=x.x.x.x[x.x.x.x]:25,conn_use=7, delay=11656, delays=5728/5918/0/10
> relay=x.x.x.x[x.x.x.x]:25,conn_use=19, delay=40339, delays=31320/9009/0/10
> relay=x.x.x.x[x.x.x.x]:25,conn_use=13, delay=10820, delays=441/10369/0/10
> relay=x.x.x.x[x.x.x.x]:25,conn_use=10, delay=39517, delays=29103/10404/0/10
> relay=x.x.x.x[x.x.x.x]:25,conn_use=28, delay=43790, delays=33444/10336/0/10
>
> the above log entries are at different times on the same day.
>
> Please share your analysis for further improving our setup.

Fix the downstream delivery latency.  The relay is either having
trouble with DNS resolution or is actively misconfigured to impose
an artificial delay on client connections, perhaps some sort of
anti-spam measure that could make some sense on a host getting
mail directly from the Internet, but is a bad idea for internal
relays.

--
        Viktor.
Reply | Threaded
Open this post in threaded view
|

Re: Slow delivery of mails reg.

Anant Athavale

----- Message from Viktor Dukhovni <[hidden email]> ---------
    Date: Wed, 29 Nov 2017 07:08:13 +0000
    From: Viktor Dukhovni <[hidden email]>
Reply-To: [hidden email]
Subject: Re: Slow delivery of mails reg.
      To: [hidden email]

On Wed, Nov 29, 2017 at 12:04:14PM +0530, [hidden email] wrote:

To understand why delivery was slow you need to analyze your logs
and examine the "delays=" log entries to see whether the delays
in connecting to the relay host or waiting for deliveries to
complete.

--        Viktor.

The log entries with delays is as below.

relay=x.x.x.x[x.x.x.x]:25, conn_use=3, delay=32027, delays=31986/31/0/10

Well, the main issue here is the consistent 10s delay to complete
a delivery after 0s connection latency.  I would guess that's some
sort of timeout on the relay end resolving the IP address of the
connecting client.  With 10s to complete each delivery, your
throughput is not surprisingly quite low.

relay=x.x.x.x[x.x.x.x]:25,conn_use=5, delay=27740, delays=27690/40/0/10
relay=x.x.x.x[x.x.x.x]:25,conn_use=6, delay=13069, delays=13010/49/0/10
relay=x.x.x.x[x.x.x.x]:25,conn_use=7, delay=21476, delays=21406/60/0/10
relay=x.x.x.x[x.x.x.x]:25,conn_use=2, delay=33553, delays=33474/69/0/10
relay=x.x.x.x[x.x.x.x]:25,conn_use=12, delay=14591, delays=14493/88/0/10
relay=x.x.x.x[x.x.x.x]:25,conn_use=4, delay=25953, delays=25807/136/0/10
relay=x.x.x.x[x.x.x.x]:25,conn_use=6, delay=27339, delays=25970/1359/0/10,
relay=x.x.x.x[x.x.x.x]:25,conn_use=16, delay=7608, delays=4510/3088/0/10
relay=x.x.x.x[x.x.x.x]:25,conn_use=19, delay=7457, delays=2593/4854/0/10
relay=x.x.x.x[x.x.x.x]:25,conn_use=20, delay=38734, delays=33864/4860/0/10
relay=x.x.x.x[x.x.x.x]:25,conn_use=7, delay=11656, delays=5728/5918/0/10
relay=x.x.x.x[x.x.x.x]:25,conn_use=19, delay=40339, delays=31320/9009/0/10
relay=x.x.x.x[x.x.x.x]:25,conn_use=13, delay=10820, delays=441/10369/0/10
relay=x.x.x.x[x.x.x.x]:25,conn_use=10, delay=39517, delays=29103/10404/0/10
relay=x.x.x.x[x.x.x.x]:25,conn_use=28, delay=43790, delays=33444/10336/0/10

the above log entries are at different times on the same day.

Please share your analysis for further improving our setup.

Fix the downstream delivery latency.  The relay is either having
trouble with DNS resolution or is actively misconfigured to impose
an artificial delay on client connections, perhaps some sort of
anti-spam measure that could make some sense on a host getting
mail directly from the Internet, but is a bad idea for internal
relays.

--        Viktor.


Thank you so much for the debugging.  We have a PIX sitting between client and relay host and what I could see as part of default configuration that, smtp_pix_workaround_delay_time is 10s.  Could PIX be a factor in delivery latency?

I don't see 10s delay during normal operations, but I do see "enabling PIX workarounds: disable_esmtp delay_dotcrlf for" lines in the logs for the relay in question.

Do you think, PIX could be the reason?

Regards,
ANANT





----- End message from Viktor Dukhovni <[hidden email]> -----

Anant S Athavale



गोपनीयता नोटिसः यह ई-मेल संदेश, किसी भी संलग्नक के साथ, अभिप्रेत प्राप्तकार(रों) के एकमात्र उपयोग के लिए है और इसमें गोपनीयता और विशेषाधिकार सूचना होगी। किसी प्रकार का अप्राधिकृत पुनरीक्षण, उपयोग, खुलासा या परिचालन निषेध है। यदि आप अभिप्रेत प्राप्तकार नहीं हैं तो, कृपया भेजने वाले से उत्तर ई-मेल द्वारा संपर्क करें और मूल संदेश की सभी प्रतियों को नष्ट कर दें।


Confidentiality Notice: This e-mail message, including any attachments, is for the sole use of the intended recipient(s) and may contain confidential and privileged information. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply e-mail and destroy all copies of the original message.


Reply | Threaded
Open this post in threaded view
|

Re: Slow delivery of mails reg.

Wietse Venema
[hidden email]:

> >> relay=x.x.x.x[x.x.x.x]:25,conn_use=5, delay=27740, delays=27690/40/0/10
> >> relay=x.x.x.x[x.x.x.x]:25,conn_use=6, delay=13069, delays=13010/49/0/10
> >> relay=x.x.x.x[x.x.x.x]:25,conn_use=7, delay=21476, delays=21406/60/0/10
> >> relay=x.x.x.x[x.x.x.x]:25,conn_use=2, delay=33553, delays=33474/69/0/10
> >> relay=x.x.x.x[x.x.x.x]:25,conn_use=12, delay=14591, delays=14493/88/0/10
> >> relay=x.x.x.x[x.x.x.x]:25,conn_use=4, delay=25953, delays=25807/136/0/10
> >> relay=x.x.x.x[x.x.x.x]:25,conn_use=6, delay=27339, delays=25970/1359/0/10,
> >> relay=x.x.x.x[x.x.x.x]:25,conn_use=16, delay=7608, delays=4510/3088/0/10
> >> relay=x.x.x.x[x.x.x.x]:25,conn_use=19, delay=7457, delays=2593/4854/0/10
> >> relay=x.x.x.x[x.x.x.x]:25,conn_use=20, delay=38734, delays=33864/4860/0/10
> >> relay=x.x.x.x[x.x.x.x]:25,conn_use=7, delay=11656, delays=5728/5918/0/10
> >> relay=x.x.x.x[x.x.x.x]:25,conn_use=19, delay=40339, delays=31320/9009/0/10
> >> relay=x.x.x.x[x.x.x.x]:25,conn_use=13, delay=10820, delays=441/10369/0/10
> >> relay=x.x.x.x[x.x.x.x]:25,conn_use=10, delay=39517, delays=29103/10404/0/10
> >> relay=x.x.x.x[x.x.x.x]:25,conn_use=28, delay=43790, delays=33444/10336/0/10
> >>
> >> the above log entries are at different times on the same day.
> >>
> >> Please share your analysis for further improving our setup.
> >
> > Fix the downstream delivery latency.? The relay is either having
> > trouble with DNS resolution or is actively misconfigured to impose
> > an artificial delay on client connections, perhaps some sort of
> > anti-spam measure that could make some sense on a host getting
> > mail directly from the Internet, but is a bad idea for internal
> > relays.
> >
> > --? ? ? ? Viktor.
>
> Thank you so much for the debugging.? We have a PIX sitting between  
> client and relay host and what I could see as part of default  
> configuration that, smtp_pix_workaround_delay_time is 10s.? Could PIX  
> be a factor in delivery latency?

Nope, PIX delays happen after message delivery. The 10s delay as logged
is concerned with the 220 greeting.

But of course, 10s delay after delivery also does not help. So get
that f..ing PIX disabled.

        Wietse
Reply | Threaded
Open this post in threaded view
|

Re: Slow delivery of mails reg.

Viktor Dukhovni
In reply to this post by Anant Athavale
On Wed, Nov 29, 2017 at 02:53:20PM +0530, [hidden email] wrote:

> > Well, the main issue here is the consistent 10s delay to complete
> > a delivery after 0s connection latency.  I would guess that's some
> > sort of timeout on the relay end resolving the IP address of the
> > connecting client.  With 10s to complete each delivery, your
> > throughput is not surprisingly quite low.
>
> Thank you so much for the debugging.  We have a PIX sitting between
> client and relay host and what I could see as part of default
> configuration that, smtp_pix_workaround_delay_time is 10s.  Could PIXbe a
> factor in delivery latency?

The PIX does not introduce a transaction delay.  Rather, Postfix
implements some PIX-specific work-arounds for that have been in
the queue long enough.  This is not related to your throughput
issue, but as Wietse says, do configure your PIX to not mess
with SMTP, its SMTP features are just harmful.

> I don't see 10s delay during normal operations, but I do see "enabling
> PIX workarounds: disable_esmtp delay_dotcrlf for" lines in the logs
> for the relay in question.
>
> Do you think, PIX could be the reason?

No.  I've already explained you have a downstream latency problem,
either DNS or misconfigured artificial delays.  An overloaded
system would have a more variable delay.  Exactly 10s for each
delivery must be a timer expiring, which could be a DNS timeout
or a deliberate pause.

--
        Viktor.