Watchdog timeout with big virtual list

Andrea Gelmini-3
Hi all,
   I've googled a lot but couldn't find a working solution, so I'm
bothering you...
   I've got a simple Postfix installation (I mean: no anti-spam/virus,
custom transport, and so on),
   whose only purpose is to deliver a mailing list to ~100,000 addresses.
   The Postfix version is 2.5.1 (2.5.1-2ubuntu1, the one in Ubuntu Hardy,
server flavor).
   In main.cf I modified these settings:
virtual_alias_domains = ...other my domain...
virtual_alias_maps = hash:/etc/postfix/virtual
virtual_alias_expansion_limit = 1000000
daemon_timeout = 12h

   And /etc/postfix/virtual looks like this:
gino@mydomain: [hidden email],
 [hidden email],
 ...
 [hidden email]

   Now, here is what happens:

I send a test email:
    May 13 13:09:28 [postfix/pickup] 6507044B2: uid=0 from=<root>

I see a "cleanup -z -t unix -u -c" process crunching, but after a while:

    May 13 13:25:45 [postfix/pickup] fatal: watchdog timeout
    May 13 13:25:46 [postfix/master] warning: process /usr/lib/postfix/pickup pid 31566 exit status 1
    May 13 13:25:46 [postfix/pickup] AEBB144AE: uid=0 from=<root>

And now I've got two "cleanup -z -t unix -u -c" processes working, and after a while:

    May 13 13:42:25 [postfix/pickup] fatal: watchdog timeout
    May 13 13:42:26 [postfix/master] warning: process /usr/lib/postfix/pickup pid 31633 exit status 1
    May 13 13:42:26 [postfix/master] warning: /usr/lib/postfix/pickup: bad command startup -- throttling
    May 13 13:43:26 [postfix/pickup] BBF0B44AF: uid=0 from=<root>

So now I see three "cleanup -z -t unix -u -c" processes.
I stopped Postfix and cleared the queue, because I was afraid of
flooding users' mailboxes.

Now, if I split /etc/postfix/virtual into batches of 50,000 addresses
at a time, everything works like a charm.
But it's the same email, and I'd rather not waste resources such as
bandwidth.
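
For illustration only, the 50,000-address split described above could be sketched as follows. Python is used purely as an example language, and the `batches` helper and the placeholder addresses are assumptions, not part of the original setup:

```python
from typing import Iterator, List


def batches(addresses: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `addresses`, each at most `size` long."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(addresses), size):
        yield addresses[start:start + size]


if __name__ == "__main__":
    # Placeholder recipients; a real list would be read from a file.
    recipients = [f"user{i}@example.com" for i in range(100_000)]
    for n, chunk in enumerate(batches(recipients, 50_000), start=1):
        print(f"batch {n}: {len(chunk)} addresses")
```

Each batch still needs its own alias entry and its own copy of the message, which is the duplication this post is trying to avoid.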

Thanks a lot for your time,
Andrea Gelmini

Re: Watchdog timeout with big virtual list

Wietse Venema
>     May 13 13:09:28 [postfix/pickup] 6507044B2: uid=0 from=<root>
>     May 13 13:25:46 [postfix/master] warning: process
> /usr/lib/postfix/pickup pid 31566 exit status 1

That is 978 seconds later.
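
The 978-second figure is simply the gap between the two quoted log lines. A quick way to check it, sketched in Python (the `elapsed_seconds` helper is an illustrative name, and an English/C locale is assumed for the month abbreviation):

```python
from datetime import datetime

# Syslog-style timestamp without a year, as in the quoted log lines.
FMT = "%b %d %H:%M:%S"


def elapsed_seconds(start: str, end: str) -> int:
    """Return whole seconds between two timestamps in the same year."""
    t0 = datetime.strptime(start, FMT)
    t1 = datetime.strptime(end, FMT)
    return int((t1 - t0).total_seconds())


if __name__ == "__main__":
    # 13:09:28 -> 13:25:46 is 16 minutes 18 seconds.
    print(elapsed_seconds("May 13 13:09:28", "May 13 13:25:46"))  # 978
```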

The default daemon_timeout setting is 5 hours.

Perhaps you have a daemon_timeout setting in master.cf.

If this is a Linux box, a common problem these days is that automatic
updates change kernels and system libraries from underneath Postfix
and then all kinds of shit starts happening that can be fixed by
rebooting the new kernel.

        Wietse

Re: Watchdog timeout with big virtual list

Andrea Gelmini-3
2008/5/13 Wietse Venema <[hidden email]>:
>  Perhaps you have a daemon_timeout setting in master.cf.

I tried this:
daemon_timeout = 12h
but nothing changed.

The log I pasted was produced with this setting in place, as shown above.

>  If this is a Linux box, a common problem these days is that automatic
>  updates change kernels and system libraries from underneath Postfix
>  and then all kind of shit starts happening that can be fixed by
>  rebooting the new kernel.

I can rule this out, because I made no updates after installation,
and there was no other activity at all during the tests.

Thanks a lot for your quick answer,
Andrea

Re: Watchdog timeout with big virtual list

Wietse Venema
Andrea Gelmini:
> 2008/5/13 Wietse Venema <[hidden email]>:
> >  Perhaps you have a daemon_timeout setting in master.cf.
>
> I tried with this:
> daemon_timeout = 12h
> but nothing changed.

I WROTE: DAEMON_TIMEOUT SETTING IN MASTER.CF.

> well, the log I pasted was with this setting on like above.
>
> >  If this is a Linux box, a common problem these days is that automatic
> >  updates change kernels and system libraries from underneath Postfix
> >  and then all kind of shit starts happening that can be fixed by
> >  rebooting the new kernel.
>
> Well, I can exclude this, because I made no update after installation,
> and no activities at all during tests.

I WROTE: AUTOMATIC UPDATES.

        Wietse

Watchdog timeout with big virtual list

Andrea Gelmini-3
2008/5/13 Wietse Venema <[hidden email]>:

>  I WROTE: DAEMON_TIMEOUT SETTING IN MASTER.CF.

 no.

 root@mail:/etc/postfix# grep -i DAEMON_TIMEOUT master.cf
 root@mail:/etc/postfix#


 >  I WROTE: AUTOMATIC UPDATES.
 no, nothing of this kind is enabled.



 thanks a lot for your time,
 Andrea

Re: Watchdog timeout with big virtual list

Victor Duchovni
On Tue, May 13, 2008 at 05:25:54PM +0200, Andrea Gelmini wrote:

> 2008/5/13 Wietse Venema <[hidden email]>:
>
> >  I WROTE: DAEMON_TIMEOUT SETTING IN MASTER.CF.
>

The watchdog timer for trigger servers is fixed at 1000 seconds.

src/master/trigger_server.c:trigger_server_main():726

    watchdog = watchdog_create(1000, (WATCHDOG_FN) 0, (char *) 0);

It is rather unwise to use virtual expansion for lists this large.
The right mechanism for this is a ":include:/file" local alias.

--
        Viktor.

Disclaimer: off-list followups get on-list replies or get ignored.
Please do not ignore the "Reply-To" header.

To unsubscribe from the postfix-users list, visit
http://www.postfix.org/lists.html or click the link below:
<mailto:[hidden email]?body=unsubscribe%20postfix-users>

If my response solves your problem, the best way to thank me is to not
send an "it worked, thanks" follow-up. If you must respond, please put
"It worked, thanks" in the "Subject" so I can delete these quickly.

Re: Watchdog timeout with big virtual list

Wietse Venema
Victor Duchovni:
> > 2008/5/13 Wietse Venema <[hidden email]>:
> >
> > >  I WROTE: DAEMON_TIMEOUT SETTING IN MASTER.CF.
>
> The watchdog timer for trigger servers is fixed at 1000.
>
> src/master/trigger_server.c:trigger_server_main():726
>
>     watchdog = watchdog_create(1000, (WATCHDOG_FN) 0, (char *) 0);

Then something needs to be fixed. The pickup server is entirely
dependent on how long the cleanup server takes to enqueue an email
message, so having a shorter timeout is unproductive.

The short-term fix could be extending the libmaster API with another
attribute (MAIL_SERVER_WATCHDOG) that specifies an application-specific
timeout value, but the deeper problem is single-threaded local
submission.

> It is rather unwise to use virtual expansion for lists this large.
> The right mechanism for this is a ":include:/file" local alias.

Would this still be the case if the hard-coded limit were fixed?
There is no automatic mechanism to replace the envelope sender,
but what else is missing?

        Wietse

Re: Watchdog timeout with big virtual list

Victor Duchovni
On Tue, May 13, 2008 at 01:33:37PM -0400, Wietse Venema wrote:

> > The watchdog timer for trigger servers is fixed at 1000.
> >
> > src/master/trigger_server.c:trigger_server_main():726
> >
> >     watchdog = watchdog_create(1000, (WATCHDOG_FN) 0, (char *) 0);
>
> Then something needs to be fixed. The pickup server is entirely
> dependent on how long the cleanup server takes to enqueue an email
> message, so having a shorter timeout is unproductive.  

Likely so. But having the pickup server frozen for 1000s is not that
cool either.

> The short-term fix could be extending the libmaster API with another
> attribute (MAIL_SERVER_WATCHDOG) that specifies application-specified
> timeout value, but the deeper problem is single-threaded local
> submission.

Which brings us to the above observation. Pickup is simply not designed
for messages that burn 16+ minutes of pre-queue rewriting. Making pickup
multi-threaded (really multiple pickup processes) requires a locking
protocol (ala "sendmail -q").

> > It is rather unwise to use virtual expansion for lists this large.
> > The right mechanism for this is a ":include:/file" local alias.
>
> Would this still be the case if the hard-coded limit were fixed?
> There is no automatic mechanism to replace the envelope sender,
> but what else is missing?

With pickup single-threaded, local submission stalls for a long time
while the jumbo message is processed. Bounce handling really wants an
"owner-alias" for large lists. Virtual expansion generates and splits
very large result sets.

Local alias expansion of ":include:" processes one address at a time,
and may be substantially more efficient in the limit (no need to tokenize
multi-megabyte address lists).
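
As a rough sketch of the migration this implies, the following Python (an assumed choice of language; none of these helper names come from Postfix) flattens a multi-line virtual entry like the one at the top of the thread into one-address-per-line text and prints a matching aliases(5) entry. The list name, the addresses and the include-file path are hypothetical:

```python
import re
from typing import Dict, List


def parse_virtual_entries(text: str) -> Dict[str, List[str]]:
    """Parse lookup-table text into {key: [addresses]}.

    Lines starting with whitespace continue the previous logical line;
    blank lines and lines whose first non-blank character is '#' are skipped.
    """
    logical: List[str] = []
    for raw in text.splitlines():
        if not raw.strip() or raw.lstrip().startswith("#"):
            continue
        if raw[0].isspace() and logical:
            logical[-1] += " " + raw.strip()
        else:
            logical.append(raw.strip())

    entries: Dict[str, List[str]] = {}
    for line in logical:
        parts = line.split(None, 1)
        rest = parts[1] if len(parts) > 1 else ""
        entries[parts[0]] = [a for a in re.split(r"[,\s]+", rest) if a]
    return entries


def render_include_file(addresses: List[str]) -> str:
    """Return the addresses one per line, ending with a newline."""
    return "\n".join(addresses) + "\n"


def alias_line(list_name: str, include_path: str) -> str:
    """Format a local alias that points at an :include: file."""
    return f"{list_name}: :include:{include_path}"


if __name__ == "__main__":
    sample = "gino@example.com a@example.net,\n b@example.net,\n c@example.net\n"
    recipients = parse_virtual_entries(sample)["gino@example.com"]
    print(render_include_file(recipients), end="")
    # Hypothetical path; any file readable by the local delivery agent would do.
    print(alias_line("gino", "/etc/postfix/lists/gino"))
```

The printed alias line would go into the local aliases file (followed by a newaliases run), with the flattened text saved at the path it names. An owner- alias for bounces, as mentioned above, would be added separately.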

--
        Viktor.
