Mail Queue Stall Problem

Mail Queue Stall Problem

Dennis Putnam-2
I occasionally encounter a strange problem with the mail queue seemingly
not retrying failed messages. The messages show in mailq as having timed
out and are a few days old. In the meantime other messages are going
through just fine. As soon as I run postqueue -f those messages go
through as well. I guess I can set up a cron job to issue the postqueue -f
command but there is obviously something wrong. Can someone tell me how
I might troubleshoot this problem? Thanks.


Re: Mail Queue Stall Problem

Viktor Dukhovni


> On Jan 18, 2018, at 2:44 PM, Dennis Putnam <[hidden email]> wrote:
>
> I occasionally encounter a strange problem with the mail queue seemingly
> not retrying failed messages.

What is your definition of "not retrying"?  Messages are retried
periodically, with exponential backoff up to the maximal_backoff_time.
When enough messages for a given destination all encounter connection
failure, other messages in the active queue to the same destination
may be deferred without being retried (to avoid queue congestion).
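
As a rough illustration of the backoff described above, the sketch below models the retry schedule of a single deferred message. It assumes that after each failed attempt the next retry is delayed by the message's current age, clamped between minimal_backoff_time and maximal_backoff_time (taken here as 300 and 4000 seconds), and it ignores the granularity of the periodic deferred-queue scan. The function name and the model are illustrative assumptions, not Postfix code.

```python
def retry_schedule(horizon_s, min_backoff_s=300, max_backoff_s=4000):
    """Return the message ages (in seconds) at which retries would occur.

    Assumed model: after a failed attempt at age `age`, the next attempt
    happens clamp(age, min_backoff_s, max_backoff_s) seconds later.
    """
    attempts = []
    age = 0  # the first delivery attempt happens on arrival
    while True:
        delay = max(min_backoff_s, min(age, max_backoff_s))
        age += delay
        if age > horizon_s:
            break
        attempts.append(age)
    return attempts


if __name__ == "__main__":
    two_days = 2 * 24 * 3600
    schedule = retry_schedule(two_days)
    print("first retries at (s):", schedule[:6])
    print("retries within two days:", len(schedule))
```

Under this model the gap between attempts doubles until it reaches the cap, after which a message is retried roughly once per maximal_backoff_time.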

> The messages show in mailq as having timed
> out and are a few days old.

They've probably maxed out the maximal_backoff_time and are tried
infrequently, but are not forgotten.

> In the mean time other messages are going through just fine.

That's normal.

> As soon as I run postqueue -f those messages go through as well.

That's also normal.

> I guess I can set up a cron to issue the postqueue -f command

Don't.

> but there is obviously something wrong.

Nothing in your message is evidence of a problem.

> Can someone tell me how I might trouble shoot this problem?

First post evidence of a problem (logs and configuration details).

--
        Viktor.

Re: Mail Queue Stall Problem

Dennis Putnam-2
Thanks for the reply. See embedded comments. Also note that all messages
go through the same destination server.

On 1/18/2018 2:55 PM, Viktor Dukhovni wrote:

>
>> On Jan 18, 2018, at 2:44 PM, Dennis Putnam <[hidden email]> wrote:
>>
>> I occasionally encounter a strange problem with the mail queue seemingly
>> not retrying failed messages.
> What is your definition of "not retrying"?  Messages are retried
> periodically, with exponential backoff up to the maximal_backoff_time.
> When enough messages for a given destination all encounter connection
> failure, other messages in the active queue to the same destination
> may be deferred without being retried (to avoid queue congestion).
Since they go immediately when I run postqueue -f I assumed they were
not really being retried.
>
>> The messages show in mailq as having timed
>> out and are a few days old.
> They've probably maxed out the maximal_backoff_time and are tried
> infrequently, but are not forgotten.
The default is 4000 seconds according to the docs, which is a little over
an hour, and I don't set it in main.cf. In the 2-3 days they have been
sitting in the queue there should have been numerous retries, during
which time other messages were going through. Why did those not time out
as well? It is too coincidental that the destination server times out
every time just those messages are retried while others do not.
>
>> In the mean time other messages are going through just fine.
> That's normal.
I wouldn't think so if the maximal_backoff_time has expired many times
during that period.
>
>> As soon as I run postqueue -f those messages go through as well.
> That's also normal.
Again, it is too coincidental to believe that they suddenly don't time
out when postqueue -f is run after all those days of retries.
>
>> I guess I can set up a cron to issue the postqueue -f command
> Don't.
I don't want to but I need to keep these messages moving.
>
>> but there is obviously something wrong.
> Nothing in your message is evidence of of a problem.
Other than that the messages do not get sent until I run postqueue -f.
>
>> Can someone tell me how I might trouble shoot this problem?
> First post evidence of a problem (logs and configuration details).
>
Nothing of note in the logs ever shows up. All the retry timeout config
stuff is default. I don't set any of that in main.cf.


Re: Mail Queue Stall Problem

Viktor Dukhovni


> On Jan 18, 2018, at 3:26 PM, Dennis Putnam <[hidden email]> wrote:
>
>>> The messages show in mailq as having timed
>>> out and are a few days old.
>> They've probably maxed out the maximal_backoff_time and are tried
>> infrequently, but are not forgotten.
> The default is 4000 seconds according to the docs. Since I don't
> specifically set it in main.cf that is a little over an hour. In the 2-3
> days they have been sitting in the queue there should have been numerous
> retries. During which time other message were going through. Why did
> they not time out as well? It is too coincidental that the destination
> server times out every time just those messages are retried while others
> do not.

Are any such messages still present?  What is the mtime of their queue files?
What is the current time at the time you noted the file mtimes?  Are the
files open in some delivery agent (use lsof)?
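
One way to gather the timestamps asked for above is sketched below. It is a minimal sketch, assuming the deferred queue lives under /var/spool/postfix/deferred (the real location depends on queue_directory) and that the script runs with enough privilege to read it. The helper name queue_file_report is hypothetical. It prints each queue file's mtime next to the current time; it does not replace the lsof check.

```python
import os
import time


def queue_file_report(queue_dir, now=None):
    """Yield (path, mtime, mtime - now) for every file under queue_dir.

    A positive third value means the file's mtime lies in the future.
    """
    now = time.time() if now is None else now
    for root, _dirs, files in os.walk(queue_dir):
        for name in sorted(files):
            path = os.path.join(root, name)
            try:
                mtime = os.stat(path).st_mtime
            except FileNotFoundError:
                continue  # file was delivered or moved during the scan
            yield path, mtime, mtime - now


if __name__ == "__main__":
    # Assumed default location; check `postconf queue_directory` first.
    deferred = "/var/spool/postfix/deferred"
    now = time.time()
    print("current time:", time.ctime(now))
    for path, mtime, delta in queue_file_report(deferred, now):
        print(f"{path}  mtime={time.ctime(mtime)}  delta={delta:+.0f}s")
```

Recording the current time in the same run as the mtimes keeps the two directly comparable.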

>>> In the mean time other messages are going through just fine.
>> That's normal.
> I wouldn't think so if the maximal_backoff_time has expired many times
> during that period.

Postfix scans the deferred queue periodically.  Your logs should show
evidence of the messages entering the active queue multiple times.
What does your master.cf entry for "qmgr" look like?

>>> Can someone tell me how I might trouble shoot this problem?
>> First post evidence of a problem (logs and configuration details).
>>
> Nothing of note in the logs ever shows up. All the retry timeout config
> stuff is default. I don't set any of that in main.cf.

Post *all* log entries for one of the problem queue files.  Post
the "qmgr" master.cf entry.  Post any timer overrides from main.cf.
Post any warnings or errors logged by "qmgr".
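
Collecting every log entry for one queue file can be sketched as below. This is a minimal sketch, assuming the usual syslog layout in which the queue ID follows the process tag and is itself followed by a colon. The function name, script name, and log path in the usage comment are hypothetical; the mail log location varies by system.

```python
import sys


def entries_for_queue_id(lines, queue_id):
    """Return the log lines that mention `queue_id` as a queue ID.

    Assumes the ID appears as " QUEUEID: " in each relevant line, as in
    "postfix/qmgr[pid]: QUEUEID: ...".
    """
    needle = f" {queue_id}: "
    return [line.rstrip("\n") for line in lines if needle in line]


if __name__ == "__main__":
    # Usage (hypothetical paths): python3 qid_log.py QUEUEID /var/log/maillog
    if len(sys.argv) == 3:
        with open(sys.argv[2], errors="replace") as log:
            for entry in entries_for_queue_id(log, sys.argv[1]):
                print(entry)
```

Matching on the ID plus its trailing colon avoids picking up longer IDs that merely share a prefix.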

--
        Viktor.

Re: Mail Queue Stall Problem

Dennis Putnam-2
Thanks again. I will do so as soon as the problem recurs.


