Significant relay delays

Alex Regan
Hi,

I have been using an older version of postfix on a relay server for
quite a few years now, without any real incident. It accepts mail from
one or two other servers and forwards it on to an internal Exchange
server on the same network. It handles about 250k messages per day.
It's configured with dual instances.

For the last few months there seems to be an increasing delay in
delivery times, and I can't explain why. I suspect something on the
Exchange side, because nothing has changed on the postfix server. The
administrators of the Exchange box haven't been able to offer any ideas
either. I'm also pretty sure it's not a network issue: after passing
billions of packets there isn't a single error. I'm also pretty sure
DNS is configured properly.

At times there is a constant 50 messages in the second instance's
queue, and occasionally as many as 500. The 500 messages may sit there
for half an hour and then suddenly all be delivered. However, a
constant 50 remain in the queue with status info like "conversation
timed out while sending end of data -- message may be sent more than
once" or "Error: timeout exceeded (in reply to end of DATA command)".

The messages may sit in the queue for as long as a few weeks, and I
assume they are eventually delivered.

In my mail log, I see info like the following:

Aug 20 01:08:12 bocmailrelay POSTFIX_F/smtp[1186]: C638B1A8008: to=<marie[hidden email]>, relay=mail.example.com[xxx.yyy.zzz.3], delay=625109, status=deferred (conversation with mail.example.com[xxx.yyy.zzz.3] timed out while sending end of data -- message may be sent more than once)
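
A minimal Python sketch for pulling the fields out of a status line like the one above. The regular expression is an assumption fitted to this single sample (short hexadecimal queue IDs, extra fields such as delays= or dsn= skipped), and the recipient address is a placeholder. It also converts delay=, which counts seconds from the message's arrival to this attempt, into days: 625109 seconds is roughly 7.2 days.

```python
import re

# Fitted to the single sample line above: short hexadecimal queue IDs and
# "to=<...>, relay=..., delay=..., ... status=... (reason)". An assumption,
# not a complete grammar of Postfix log lines.
SMTP_STATUS = re.compile(
    r"^(?P<month>\w{3})\s+(?P<day>\d+)\s+(?P<time>[\d:]+)\s+(?P<host>\S+)\s+"
    r"(?P<program>[^\[\s]+)\[(?P<pid>\d+)\]:\s+"
    r"(?P<queue_id>[0-9A-F]+):\s+to=<(?P<to>[^>]*)>,\s+"
    r"relay=(?P<relay>[^,]+),\s+delay=(?P<delay>[\d.]+),.*?"
    r"status=(?P<status>\w+)\s+\((?P<reason>.*)\)$"
)


def parse_smtp_status(line):
    """Return the fields of one smtp delivery-status line as a dict, or None."""
    m = SMTP_STATUS.match(line.strip())
    if m is None:
        return None
    fields = m.groupdict()
    # delay= is the total time in seconds from arrival to this attempt.
    fields["delay"] = float(fields["delay"])
    fields["delay_days"] = round(fields["delay"] / 86400, 2)
    return fields


# The sample line from above, with a placeholder recipient address.
sample = (
    "Aug 20 01:08:12 bocmailrelay POSTFIX_F/smtp[1186]: C638B1A8008: "
    "to=<marie@example.com>, relay=mail.example.com[xxx.yyy.zzz.3], "
    "delay=625109, status=deferred (conversation with "
    "mail.example.com[xxx.yyy.zzz.3] timed out while sending end of "
    "data -- message may be sent more than once)"
)
record = parse_smtp_status(sample)
print(record["queue_id"], record["status"], record["delay_days"])
```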

I'm having difficulty telling apart messages that are simply entering
the second instance's queue (typically with delay=0) and messages that
are queued because they couldn't be delivered immediately. Is there an
easier way to establish which messages are being queued because they
couldn't be delivered right away?
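
One way to separate the two cases is to group status lines by queue ID: a message that could not be delivered right away should have at least one status=deferred line logged under its queue ID, while a message delivered on the first try should not. A minimal sketch, assuming the log format shown above; the sample lines and queue IDs are made up.

```python
import re

# Queue ID and delivery status from lines such as
#   ... POSTFIX_F/smtp[1186]: C638B1A8008: to=<...>, ..., status=deferred (...)
STATUS_RE = re.compile(r"\]:\s+([0-9A-F]+):\s+to=<.*?\bstatus=(\w+)")


def deferral_counts(lines):
    """Map each queue ID to the number of status=deferred attempts logged.

    0 means every logged attempt for that message had some other status,
    for example status=sent on the first try.
    """
    counts = {}
    for line in lines:
        m = STATUS_RE.search(line)
        if m:
            queue_id, status = m.groups()
            increment = 1 if status == "deferred" else 0
            counts[queue_id] = counts.get(queue_id, 0) + increment
    return counts


# Made-up sample: AAA111 went out at once, BBB222 was deferred once first.
sample_lines = [
    "Aug 20 01:00:00 relay POSTFIX_F/smtp[101]: AAA111: to=<a@example.com>, "
    "relay=mx[10.0.0.3], delay=0.2, status=sent (250 OK)",
    "Aug 20 01:00:05 relay POSTFIX_F/smtp[102]: BBB222: to=<b@example.com>, "
    "relay=mx[10.0.0.3], delay=600, status=deferred (timed out)",
    "Aug 20 01:30:05 relay POSTFIX_F/smtp[103]: BBB222: to=<b@example.com>, "
    "relay=mx[10.0.0.3], delay=2400, status=sent (250 OK)",
]
counts = deferral_counts(sample_lines)
deferred = sorted(qid for qid, n in counts.items() if n > 0)
print(deferred)
```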

I thought I would try "debug_peer_list" and increase logging to get
information on delays for a specific domain, but I'm not sure that is
what this parameter is used for. Is there another way to increase
logging, either for a specific domain or for this particular problem,
to troubleshoot it better?
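
As documented in postconf(5), debug_peer_list matches remote client or server hostnames and network addresses rather than recipient domains, and raises the verbose logging level by debug_peer_level only for connections to or from those peers. For the outbound instance that would mean listing the Exchange server's name or IP address, which keeps the extra logging limited to that one peer. The sketch below only builds the postconf and postfix commands for one instance and runs nothing; the configuration directory /etc/postfix-out and the helper name are hypothetical.

```python
import shlex


def debug_peer_commands(config_dir, peers, level=2):
    """Build (but do not run) commands that turn on per-peer verbose
    logging in one Postfix instance and then reload that instance.

    config_dir: the instance's configuration directory (hypothetical path).
    peers: remote hostnames or IP addresses, e.g. the Exchange server.
    level: value for debug_peer_level (2 is the Postfix default).
    """
    peer_list = ", ".join(peers)
    return [
        ["postconf", "-c", config_dir, "-e", f"debug_peer_list={peer_list}"],
        ["postconf", "-c", config_dir, "-e", f"debug_peer_level={level}"],
        ["postfix", "-c", config_dir, "reload"],
    ]


for cmd in debug_peer_commands("/etc/postfix-out", ["mail.example.com"]):
    print(shlex.join(cmd))
```

Verbose SMTP logging is voluminous even for a single peer, so setting debug_peer_list back to empty the same way once the problem has been captured seems prudent.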

Thanks,
Alex Hayes

Re: Significant relay delays

Olivier Nicole
Hi,

This is just a wild guess...

> I'm also pretty sure it's not a network issue. After passing
> billions of packets there isn't a single error. I'm also pretty sure
> DNS is configured properly.

Have you checked the connection between the postfix and Exchange
machines? After some years a cable can go bad, and packets may no
longer pass reliably. When a machine is moved or someone works around
a rack cabinet, a cable can get stepped on and damaged or disconnected.
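
On a Linux host (an assumption; the thread doesn't name the OS), per-interface error and drop counters can be read from /proc/net/dev. A minimal sketch parsing that file's layout; the sample text is invented. These counters only describe the relay's own NIC, not the switch port or the far end of the link.

```python
def interface_errors(proc_net_dev_text):
    """Return {interface: (rx_errs, rx_drop, tx_errs, tx_drop)} from the
    text of Linux /proc/net/dev."""
    result = {}
    for line in proc_net_dev_text.splitlines()[2:]:  # skip the two header lines
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        fields = [int(x) for x in data.split()]
        # Receive:  bytes packets errs drop fifo frame compressed multicast
        # Transmit: bytes packets errs drop fifo colls carrier compressed
        result[name.strip()] = (fields[2], fields[3], fields[10], fields[11])
    return result


# Invented sample in the /proc/net/dev layout.
SAMPLE = """\
Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    lo:    1000      10    0    0    0     0          0         0     1000      10    0    0    0     0       0          0
  eth0: 5000000   40000    3    1    0     0          0         0  4000000   35000    0    2    0     0       0          0
"""
print(interface_errors(SAMPLE))
```

On a live system the argument would be the contents of /proc/net/dev itself; the counters accumulate since boot, so comparing two readings taken some time apart says more than a single snapshot.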

Bests,

Olivier

Re: Significant relay delays

Phill Macey
In reply to this post by Alex Regan
Sorry in advance for the top posting, or whatever Gmail does on mobile
phones; I have no control over that. I bumped into a very similar
problem today. Mail was queuing up on one of our servers with exactly
the same messages as yours. In our case a perl script on the postfix
server had gone crazy and started consuming all the memory and swap
space on the machine. Once that was fixed, the errors cleared up and
the mail queue emptied itself. HTH
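
A quick way to check for the kind of memory and swap exhaustion described above, assuming a Linux host with /proc/meminfo; the sample values are invented.

```python
def memory_summary(meminfo_text):
    """Return (available_kb, swap_used_kb) from the text of Linux /proc/meminfo."""
    values = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            values[key.strip()] = int(parts[0])  # most entries are in kB
    # MemAvailable only exists on newer kernels; fall back to MemFree.
    available = values.get("MemAvailable", values.get("MemFree", 0))
    swap_used = values.get("SwapTotal", 0) - values.get("SwapFree", 0)
    return available, swap_used


# Invented sample: little memory left and almost all swap in use.
SAMPLE = """\
MemTotal:        2048000 kB
MemFree:           51200 kB
MemAvailable:      81920 kB
SwapTotal:       1048576 kB
SwapFree:          24576 kB
"""
available_kb, swap_used_kb = memory_summary(SAMPLE)
print(f"available: {available_kb} kB, swap used: {swap_used_kb} kB")
```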

On 8/21/09, MySQL Student <[hidden email]> wrote:

> Hi,
>
> I have been using an older version of postfix on a relay server for
> quite a few years now, without any real incident. It accepts mail from
> one or two other servers and forwards it on to an internal Exchange
> server on the same network. It handles about 250k messages per day.
> It's configured with dual instances.
>
> It seems for the last few months there is an increasing delay in
> delivery times and I can't explain why. I suspect something on the
> Exchange side because nothing has changed on the postfix server. The
> administrators of the Exchange box aren't able to provide any ideas
> either. I'm also pretty sure it's not a network issue. After passing
> billions of packets there isn't a single error. I'm also pretty sure
> DNS is configured properly.
>
> I'm seeing occasions where there will be a constant 50 messages in the
> second instance, and as many as 500 at times. The 500 messages may sit
> there for a half-hour, and then all of the sudden they are delivered.
> However, there remains a constant 50 in the queue with status info
> like "conversation timed out while sending end of data -- message may
> be sent more than once" or "Error: timeout exceeded (in reply to end
> of DATA command)".
>
> The messages may sit in the queue for even a few weeks, and I assume
> are eventually delivered.
>
> In my mail log, I see info like the following:
>
> Aug 20 01:08:12 bocmailrelay POSTFIX_F/smtp[1186]: C638B1A8008: to=<marie
> [hidden email]>, relay=mail.example.com[xxx.yyy.zzz.3], delay=625109, st
> atus=deferred (conversation with mail.example.com[xxx.yyy.zzz.3] timed out
> while sending end of data -- message may be sent more than once)
>
> I'm having difficulty discerning messages entering the second queue
> (with delay=0, typically) and messages being
> queued because they couldn't immediately be delivered. Is there an
> easier way to establish which messages are
> being queued because they couldn't easily be delivered?
>
> I thought I would try "debug_peer_list" and increase logging to try
> and get information on delays from a specific domain, but I'm not sure
> that is what this variable is used for. Is there another way to
> increase logging either for a specific domain or for this problem to
> better troubleshoot it?
>
> Thanks,
> Alex Hayes
>

Phill

Re: Significant relay delays

Alex Regan
Hi,

> problem today. Mail was queuing up on one of our servers with exactly
> the same messages as what you had. In our case a perl script on the
> postfix server had gone crazy and started consuming all the memory and
> swap space on the machine. Once that was fixed, the errors cleared up
> and the mail queue emptied itself. HTH

I don't think memory or CPU exhaustion is causing this, as the server
only routes mail and is typically pretty idle.

I'd sure welcome some additional ideas to troubleshoot.

Thanks,
Alex

Re: Significant relay delays

Alex Regan
In reply to this post by Olivier Nicole
Hi,

>> I'm also pretty sure it's not a network issue. After passing
>> billions of packets there isn't a single error. I'm also pretty sure
>> DNS is configured properly.
>
> Have you checked the connection between postfix and the exchange
> machines? After some years, a cable can get bad, lousy, and the
> packets would not pass so reliably anymore. After moving a
> machine/wandering around a rack cabinet, one may have step on a cable
> and disconnect it or damage it.

I had them replace the network cable, to no avail.

How can I add some additional debugging, without overwhelming the
system, to troubleshoot this further? Is there a way to increase the
debugging info for messages in the queue, such as the last time an
attempt was made to deliver the message, or the timeline of what was
happening during the failed delivery attempt?
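
mailq lists each message's arrival time rather than its last delivery attempt, but the log records every attempt under the message's queue ID. A minimal sketch that rebuilds a per-message timeline from log lines, assuming the syslog layout of the sample line earlier in the thread; the sample lines and queue ID are made up.

```python
import re
from collections import defaultdict

# "Mon DD HH:MM:SS host program[pid]: QUEUEID: message" (assumed layout).
LINE_RE = re.compile(
    r"^(\w{3}\s+\d+\s+[\d:]+)\s+\S+\s+(\S+?)\[\d+\]:\s+([0-9A-F]+):\s+(.*)$"
)


def timelines(lines):
    """Group log lines by queue ID, keeping log order:
    {queue_id: [(timestamp, program, message), ...]}."""
    events = defaultdict(list)
    for line in lines:
        m = LINE_RE.match(line)
        if m:
            stamp, program, queue_id, message = m.groups()
            events[queue_id].append((stamp, program, message))
    return dict(events)


def last_attempt(events):
    """Timestamp of the last logged delivery attempt (a line with status=)."""
    stamps = [stamp for stamp, _, message in events if "status=" in message]
    return stamps[-1] if stamps else None


# Made-up sample lines for one message.
sample_lines = [
    "Aug 20 01:00:00 relay POSTFIX_F/qmgr[900]: BBB222: from=<s@example.org>, "
    "size=2048, nrcpt=1 (queue active)",
    "Aug 20 01:10:00 relay POSTFIX_F/smtp[901]: BBB222: to=<b@example.com>, "
    "relay=mx[10.0.0.3], delay=600, status=deferred (timed out)",
    "Aug 20 01:40:00 relay POSTFIX_F/smtp[902]: BBB222: to=<b@example.com>, "
    "relay=mx[10.0.0.3], delay=2400, status=sent (250 OK)",
    "Aug 20 01:40:01 relay POSTFIX_F/qmgr[900]: BBB222: removed",
]
history = timelines(sample_lines)
for stamp, program, message in history["BBB222"]:
    print(stamp, program, message)
print("last attempt:", last_attempt(history["BBB222"]))
```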

Thanks,
Alex

Re: Significant relay delays

Noel Jones-2
On 8/23/2009 7:01 PM, MySQL Student wrote:

> Hi,
>
>>> I'm also pretty sure it's not a network issue. After passing
>>> billions of packets there isn't a single error. I'm also pretty sure
>>> DNS is configured properly.
>>
>> Have you checked the connection between postfix and the exchange
>> machines? After some years, a cable can get bad, lousy, and the
>> packets would not pass so reliably anymore. After moving a
>> machine/wandering around a rack cabinet, one may have step on a cable
>> and disconnect it or damage it.
>
> I had them replace the network cable, to no avail.
>
> How can I add some additional debugging, without overwhelming the
> system, to troubleshoot this further? Is there a way to increase the
> debugging info for messages in the queue, such as the last time an
> attempt was made to deliver the message, or the timeline of what was
> happening during the failed delivery attempt?
>
> Thanks,
> Alex

All the information you need is likely already in the logs.

You can make the logs a little easier to read by giving different
services different names.  This is particularly handy if you have
multiple instances, but it can also help to label the smtpd that
receives mail back from the content_filter.
http://www.postfix.org/postconf.5.html#syslog_name
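
The sample log line earlier in the thread already carries the tag POSTFIX_F/smtp[1186], which looks like a per-instance syslog_name. Postfix daemons log under $syslog_name/service, so with distinct names (set in each instance's main.cf, or per service with -o syslog_name=... in master.cf) lines can be split by instance mechanically. A minimal sketch; the name postfix-in and the sample lines are only examples.

```python
import re

# A syslog tag such as "POSTFIX_F/smtp[1186]:", i.e. <syslog_name>/<service>[<pid>].
TAG_RE = re.compile(r"^(?P<name>[^\[\s]+)\[(?P<pid>\d+)\]:?$")


def split_tag(tag):
    """Split a tag into (syslog_name, service, pid)."""
    m = TAG_RE.match(tag)
    if m is None:
        raise ValueError(f"not a program[pid] tag: {tag!r}")
    syslog_name, _, service = m.group("name").rpartition("/")
    return syslog_name, service, int(m.group("pid"))


def lines_for(lines, syslog_name):
    """Yield the log lines written under the given syslog_name."""
    for line in lines:
        fields = line.split()
        if len(fields) < 5:
            continue
        try:
            name, _, _ = split_tag(fields[4])  # 5th field: the program[pid] tag
        except ValueError:
            continue
        if name == syslog_name:
            yield line


sample_lines = [
    "Aug 20 01:08:10 bocmailrelay postfix-in/smtpd[200]: connect from unknown[192.0.2.10]",
    "Aug 20 01:08:12 bocmailrelay POSTFIX_F/smtp[1186]: C638B1A8008: status=deferred (timeout)",
    "Aug 20 01:08:13 bocmailrelay kernel: unrelated message",
]
print(split_tag("POSTFIX_F/smtp[1186]:"))
print(list(lines_for(sample_lines, "POSTFIX_F")))
```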

Here are some pointers on what to look for in the log:
http://www.postfix.org/QSHAPE_README.html
These may also be helpful:
http://www.postfix.org/DEBUG_README.html
http://www.postfix.org/TUNING_README.html
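
qshape, described in QSHAPE_README, summarizes a queue by recipient domain and message age, in columns whose upper bounds double from 5 minutes up to 1280 minutes, plus a 1280+ column. The sketch below only mimics that bucketing for a list of (domain, age in minutes) pairs to show what the report is counting; collecting the ages is left out, and the inclusive boundary handling is an assumption, not qshape's actual code. Running qshape itself against the deferred queue gives the real report.

```python
from collections import Counter, defaultdict

# Column upper bounds in minutes, doubling as in qshape's default header
# (5 10 20 ... 1280), followed by an open-ended "1280+" column.
BOUNDS = [5, 10, 20, 40, 80, 160, 320, 640, 1280]


def bucket(age_minutes):
    """Label of the age column a message falls into (boundaries assumed inclusive)."""
    for bound in BOUNDS:
        if age_minutes <= bound:
            return str(bound)
    return f"{BOUNDS[-1]}+"


def queue_shape(messages):
    """messages: iterable of (recipient_domain, age_minutes) pairs.
    Returns {domain: Counter(column -> count)}, plus a "TOTAL" row."""
    shape = defaultdict(Counter)
    for domain, age in messages:
        column = bucket(age)
        shape[domain][column] += 1
        shape["TOTAL"][column] += 1
    return dict(shape)


# Made-up ages: one fresh message, two stuck for many hours, one for ~12 minutes.
shape = queue_shape([
    ("example.com", 3),
    ("example.com", 700),
    ("example.com", 2000),
    ("example.org", 12),
])
for domain, columns in shape.items():
    print(domain, dict(columns))
```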


   -- Noel Jones