Postfix using all CPU after nightly mail submission

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
17 messages Options
Reply | Threaded
Open this post in threaded view
|

Postfix using all CPU after nightly mail submission

Zach Sheppard
Hello,

We use a Postfix installation on Debian 9 in our network to serve as a send-only SMTP server for a very large application in our school district. This application does not have its own built-in SMTP server and requires a relay to send notification e-mails to students, teachers, and parents topping out at around 10,000 e-mails per day. Our hosted e-mail solution (G Suite) has a sending cap that we hit with this application so we setup this Postfix installation to serve as an alternative.

However, whenever Postfix gets a large e-mail load it processes the e-mails for around 30-45 minutes and then consistently uses around 70-80% of the CPU effectively locking up the entire system so much so that I can't even login and debug without either disabling the network card in VMWare or rebooting the system totally. The VM has 2 CPU cores and 4GB RAM allocated to it which I thought was plenty for this application.

I see no errors or alarms in the mail log nor syslog. The only odd message I see is a series of ^@ symbols before the system freezes:

Jan 15 00:42:41 mailrelay opendkim[1643]: 385B38094E: DKIM-Signature field added (s=mailrelay, d=oconee.k12.sc.us)
Jan 15 00:42:41 mailrelay postfix/qmgr[5601]: 385B38094E: from=<[hidden email]>, size=2282, nrcpt=1 (queue active)
Jan 15 00:42:41 mailrelay postfix/smtpd[6128]: 8BFCC80968: client=unknown[DDD.DD.DD.DD], sasl_method=LOGIN, sasl_username=[hidden email]
Jan 15 00:42:41 mailrelay postfix/cleanup[6375]: 8BFCC80968: message-id=<[hidden email]>
Jan 15 00:42:41 mailrelay opendkim[1643]: 8BFCC80968: DKIM-Signature field added (s=mailrelay, d=oconee.k12.sc.us)
Jan 15 00:42:41 mailrelay postfix/qmgr[5601]: 8BFCC80968: from=<[hidden email]>, size=2370, nrcpt=1 (queue active)
Jan 15 00:42:41 mailrelay postfix/smtpd[6128]: B66B38096E: client=unknown[DDD.DD.DD.DD], sasl_method=LOGIN, sasl_username=[hidden email]
Jan 15 00:42:41 mailrelay postfix/cleanup[6375]: B66B38096E: message-id=<[hidden email]>
Jan 15 00:42:41 mailrelay opendkim[1643]: B66B38096E: DKIM-Signature field added (s=mailrelay, d=oconee.k12.sc.us)
Jan 15 00:42:41 mailrelay postfix/qmgr[5601]: B66B38096E: from=<[hidden email]>, size=3705, nrcpt=1 (queue active)
Jan 15 00:42:41 mailrelay postfix/smtpd[6128]: E39EA8096F: client=unknown[DDD.DD.DD.DD], sasl_method=LOGIN, sasl_username=[hidden email]
Jan 15 00:42:41 mailrelay postfix/cleanup[6375]: E39EA8096F: message-id=<[hidden email]>
Jan 15 00:42:41 mailrelay opendkim[1643]: E39EA8096F: DKIM-Signature field added (s=mailrelay, d=oconee.k12.sc.us)
Jan 15 00:42:42 mailrelay postfix/qmgr[5601]: E39EA8096F: from=<[hidden email]>, size=2281, nrcpt=1 (queue active)
Jan 15 00:42:42 mailrelay postfix/smtpd[6128]: 25EFC80970: client=unknown[DDD.DD.DD.DD], sasl_method=LOGIN, sasl_username=[hidden email]
Jan 15 00:42:42 mailrelay postfix/cleanup[6375]: 25EFC80970: message-id=<[hidden email]>
Jan 15 00:42:42 mailrelay opendkim[1643]: 25EFC80970: DKIM-Signature field added (s=mailrelay, d=oconee.k12.sc.us)
Jan 15 00:42:42 mailrelay postfix/qmgr[5601]: 25EFC80970: from=<[hidden email]>, size=2486, nrcpt=1 (queue active)
Jan 15 00:42:42 mailrelay postfix/smtpd[6128]: 5329380972: client=unknown[DDD.DD.DD.DD], sasl_method=LOGIN, sasl_username=[hidden email]
Jan 15 00:42:42 mailrelay postfix/cleanup[6375]: 5329380972: message-id=<[hidden email]>
Jan 15 00:42:42 mailrelay opendkim[1643]: 5329380972: DKIM-Signature field added (s=mailrelay, d=oconee.k12.sc.us)
Jan 15 00:42:42 mailrelay postfix/qmgr[5601]: 5329380972: from=<[hidden email]>, size=2326, nrcpt=1 (queue active)
Jan 15 00:42:42 mailrelay postfix/smtpd[6128]: 8EF0980973: client=unknown[DDD.DD.DD.DD], sasl_method=LOGIN, sasl_username=[hidden email]
Jan 15 00:42:42 mailrelay postfix/cleanup[6375]: 8EF0980973: message-id=<[hidden email]>
Jan 15 00:42:42 mailrelay opendkim[1643]: 8EF0980973: DKIM-Signature field added (s=mailrelay, d=oconee.k12.sc.us)
Jan 15 00:42:42 mailrelay postfix/qmgr[5601]: 8EF0980973: from=<[hidden email]>, size=2408, nrcpt=1 (queue active)^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
Jan 15 09:31:40 mailrelay opendkim[668]: OpenDKIM Filter v2.11.0 starting (args: -x /etc/opendkim.conf)
Jan 15 09:31:43 mailrelay postfix/postfix-script[849]: starting the Postfix mail system
Jan 15 09:31:43 mailrelay postfix/master[851]: daemon started -- version 3.1.6, configuration /etc/postfix
Jan 15 09:31:43 mailrelay postfix/qmgr[853]: 2753880968: from=<[hidden email]>, size=5027, nrcpt=1 (queue active)
Jan 15 09:31:43 mailrelay postfix/qmgr[853]: 989CD80966: from=<[hidden email]>, size=2176, nrcpt=2 (queue active)
Jan 15 09:31:43 mailrelay postfix/qmgr[853]: DDEC880970: from=<[hidden email]>, size=2229, nrcpt=1 (queue active)

After a reboot usually Postfix finishes its mails and continues like nothing ever happened.

I am at a loss how to debug or tweak any further. Do I need to implement some type of load-balancing? Is there a way I can debug more during the time of the freeze? I have also tried re-installing Debian with a fresh ISO (I compared MD5 hashes) and fresh install of Postfix/OpenDKIM. (copied config files)

Thanks for your help

This message may contain confidential information and is intended only for the individuals named. If you are not the named addressee you should not disseminate, distribute or copy this e-mail. Please notify the sender immediately by e-mail if you have received this e-mail by mistake and delete this e-mail from your system. If you are not the intended recipient you are notified that disclosing, copying, distributing or taking any action in reliance on the contents of this information is strictly prohibited.
Reply | Threaded
Open this post in threaded view
|

Re: Postfix using all CPU after nightly mail submission

Bastian Blank-3
On Fri, Jan 19, 2018 at 08:52:39AM -0500, Zach Sheppard wrote:
> I see no errors or alarms in the mail log nor syslog. The only odd message
> I see is a series of ^@ symbols before the system freezes:

Please read http://www.postfix.org/DEBUG_README.html#mail about how to
report problems.

> Jan 15 00:42:41 mailrelay opendkim[1643]: 385B38094E: DKIM-Signature field
> added (s=mailrelay, d=oconee.k12.sc.us)
> Jan 15 00:42:41 mailrelay postfix/qmgr[5601]: 385B38094E: from=<
> [hidden email]>, size=2282, nrcpt=1 (queue active)
> Jan 15 00:42:41 mailrelay postfix/smtpd[6128]: 8BFCC80968:
> client=unknown[DDD.DD.DD.DD], sasl_method=LOGIN, sasl_username=AAAAAAAAAAA@
> oconee.k12.sc.us

You have a broken system, nothing reports "DDD.DD.DD.DD" as ip address.

> This message may contain confidential information and is intended only for
> the individuals named. If you are not the named addressee you should not
> disseminate, distribute or copy this e-mail. Please notify the sender
> immediately by e-mail if you have received this e-mail by mistake and
> delete this e-mail from your system. If you are not the intended recipient
> you are notified that disclosing, copying, distributing or taking any
> action in reliance on the contents of this information is strictly
> prohibited.

Please remove this statement.  This mailing-list is already public
archived.

Bastian

--
Those who hate and fight must stop themselves -- otherwise it is not stopped.
                -- Spock, "Day of the Dove", stardate unknown
Reply | Threaded
Open this post in threaded view
|

Re: Postfix using all CPU after nightly mail submission

Bret Busby
On Fri, 19 Jan 2018, Bastian Blank wrote:

> Date: Fri, 19 Jan 2018 22:06:15
> From: Bastian Blank <bastian+postfix-users=[hidden email]>
> To: [hidden email]
> Subject: Re: Postfix using all CPU after nightly mail submission
>
> On Fri, Jan 19, 2018 at 08:52:39AM -0500, Zach Sheppard wrote:
>> I see no errors or alarms in the mail log nor syslog. The only odd message
>> I see is a series of ^@ symbols before the system freezes:
>
> Please read http://www.postfix.org/DEBUG_README.html#mail about how to
> report problems.
>
>> Jan 15 00:42:41 mailrelay opendkim[1643]: 385B38094E: DKIM-Signature field
>> added (s=mailrelay, d=oconee.k12.sc.us)
>> Jan 15 00:42:41 mailrelay postfix/qmgr[5601]: 385B38094E: from=<
>> [hidden email]>, size=2282, nrcpt=1 (queue active)
>> Jan 15 00:42:41 mailrelay postfix/smtpd[6128]: 8BFCC80968:
>> client=unknown[DDD.DD.DD.DD], sasl_method=LOGIN, sasl_username=AAAAAAAAAAA@
>> oconee.k12.sc.us
>
> You have a broken system, nothing reports "DDD.DD.DD.DD" as ip address.
>
>> This message may contain confidential information and is intended only for
>> the individuals named. If you are not the named addressee you should not
>> disseminate, distribute or copy this e-mail. Please notify the sender
>> immediately by e-mail if you have received this e-mail by mistake and
>> delete this e-mail from your system. If you are not the intended recipient
>> you are notified that disclosing, copying, distributing or taking any
>> action in reliance on the contents of this information is strictly
>> prohibited.
>
> Please remove this statement.  This mailing-list is already public
> archived.
>
> Bastian
>
> --
> Those who hate and fight must stop themselves -- otherwise it is not stopped.
> -- Spock, "Day of the Dove", stardate unknown
>

I do not know whether the From address characters (non-alphanumeric,
other than the '@' character) are the cause, but the above message is
one of those messages that endlessly loops between filters for malicious
email messages, and is only accidentally found by switching off filters.

--
Bret Busby
Armadale
West Australia
..............

"So once you do know what the question actually is,
  you'll know what the answer means."
- Deep Thought,
   Chapter 28 of Book 1 of
   "The Hitchhiker's Guide to the Galaxy:
   A Trilogy In Four Parts",
   written by Douglas Adams,
   published by Pan Books, 1992
....................................................
Reply | Threaded
Open this post in threaded view
|

Re: Postfix using all CPU after nightly mail submission

Zach Sheppard
In reply to this post by Bastian Blank-3
Hi Bastian,

I did read that debug readme and saw the following:

>> If you can't use a test email address, please anonymize email addresses and host names consistently. Replace each letter by "A", each digit by "D" so that the helpers can still recognize syntactical errors.

...and is why I replaced the e-mail address and host IP addresses. The from address for this particular message was and is the same for all messages as this server only sends mail from one internal application. However, the to addresses are parents, staff, and other people I didn't want to identify.

As for the statement that is tagged to all of our outgoing mail. I apologize for that being included.

I wasn't sure whether the other debug readme recommendations were needed but I have the postfinger output below:

./postfinger
postfinger - postfix configuration on Fri Jan 19 09:50:34 EST 2018
version: 1.30

Warning: postfinger output may show private configuration information,
such as ip addresses and/or domain names which you do not want to show
to the public.  If this is the case it is your responsibility to modify
the output to hide this private information.  [Remove this warning with
the --nowarn option.]

--System Parameters--
mail_version = 3.1.6
hostname = <our_mail_hostname_here>
uname = Linux 4.9.0-5-amd64 #1 SMP Debian 4.9.65-3+deb9u2 (2018-01-04) x86_64 GNU/Linux

--Packaging information--
looks like this postfix comes from deb package: postfix-3.1.6-0+deb9u1

--main.cf non-default parameters--
alias_maps = hash:/etc/postfix/aliases
append_dot_mydomain = no
biff = no
bounce_notice_recipient = root@localhost
bounce_queue_lifetime = 0
compatibility_level = 2
default_destination_concurrency_limit = 50
error_notice_recipient = root@localhost
inet_protocols = ipv4
in_flow_delay = 2s
internal_mail_filter_classes = bounce, notify
masquerade_domains = $mydomain
milter_default_action = reject
milter_protocol = 2
mydestination = $myhostname, localhost
mydomain = oconee.k12.sc.us
myhostname = <our_mail_hostname_here>.fqdn
mynetworks = localhost, 127.0.0.1, [::1]
myorigin = $mydomain
non_smtpd_milters = inet:localhost:8892
notify_classes = bounce, resource, software
proxy_interfaces = <our_public_ip_here>
readme_directory = no
smtp_connect_timeout = 10s
smtpd_banner = $myhostname ESMTP $mail_name (Debian/GNU)
smtpd_client_restrictions = permit_mynetworks, permit_sasl_authenticated, reject
smtpd_milters = inet:localhost:8892
smtpd_reject_unlisted_recipient = no
smtpd_sasl_auth_enable = yes
smtpd_sasl_authenticated_header = yes
smtpd_sasl_local_domain = oconee.k12.sc.us
smtpd_tls_cert_file = /etc/postfix/ssl/oconee.pem
smtpd_tls_key_file = /etc/postfix/ssl/oconee.key
smtpd_tls_security_level = may
smtpd_tls_session_cache_database = btree:${data_directory}/smtpd_scache
smtpd_use_tls = yes
smtp_generic_maps = hash:/etc/postfix/generic_maps
smtp_helo_timeout = 30s
smtp_tls_session_cache_database = btree:${data_directory}/smtp_scache

--master.cf--
smtp      inet  n       -       y       -       -       smtpd
pickup    unix  n       -       y       60      1       pickup
cleanup   unix  n       -       y       -       0       cleanup
qmgr      unix  n       -       n       300     1       qmgr
tlsmgr    unix  -       -       y       1000?   1       tlsmgr
rewrite   unix  -       -       y       -       -       trivial-rewrite
bounce    unix  -       -       y       -       0       bounce
defer     unix  -       -       y       -       0       bounce
trace     unix  -       -       y       -       0       bounce
verify    unix  -       -       y       -       1       verify
flush     unix  n       -       y       1000?   0       flush
proxymap  unix  -       -       n       -       -       proxymap
proxywrite unix -       -       n       -       1       proxymap
smtp      unix  -       -       y       -       -       smtp
relay     unix  -       -       y       -       -       smtp
showq     unix  n       -       y       -       -       showq
error     unix  -       -       y       -       -       error
retry     unix  -       -       y       -       -       error
discard   unix  -       -       y       -       -       discard
local     unix  -       n       n       -       -       local
virtual   unix  -       n       n       -       -       virtual
lmtp      unix  -       -       y       -       -       lmtp
anvil     unix  -       -       y       -       1       anvil
scache    unix  -       -       y       -       1       scache
maildrop  unix  -       n       n       -       -       pipe
  flags=DRhu user=vmail argv=/usr/bin/maildrop -d ${recipient}
uucp      unix  -       n       n       -       -       pipe
  flags=Fqhu user=uucp argv=uux -r -n -z -a$sender - $nexthop!rmail ($recipient)
ifmail    unix  -       n       n       -       -       pipe
  flags=F user=ftn argv=/usr/lib/ifmail/ifmail -r $nexthop ($recipient)
bsmtp     unix  -       n       n       -       -       pipe
  flags=Fq. user=bsmtp argv=/usr/lib/bsmtp/bsmtp -t$nexthop -f$sender $recipient
scalemail-backend unix  -       n       n       -       2       pipe
  flags=R user=scalemail argv=/usr/lib/scalemail/bin/scalemail-store ${nexthop} ${user} ${extension}
mailman   unix  -       n       n       -       -       pipe
  flags=FR user=list argv=/usr/lib/mailman/bin/postfix-to-mailman.py
  ${nexthop} ${user}

-- end of postfinger output --

On Fri, Jan 19, 2018 at 9:06 AM, Bastian Blank <[hidden email]> wrote:
On Fri, Jan 19, 2018 at 08:52:39AM -0500, Zach Sheppard wrote:
> I see no errors or alarms in the mail log nor syslog. The only odd message
> I see is a series of ^@ symbols before the system freezes:

Please read http://www.postfix.org/DEBUG_README.html#mail about how to
report problems.

> Jan 15 00:42:41 mailrelay opendkim[1643]: 385B38094E: DKIM-Signature field
> added (s=mailrelay, d=oconee.k12.sc.us)
> Jan 15 00:42:41 mailrelay postfix/qmgr[5601]: 385B38094E: from=<
> [hidden email]>, size=2282, nrcpt=1 (queue active)
> Jan 15 00:42:41 mailrelay postfix/smtpd[6128]: 8BFCC80968:
> client=unknown[DDD.DD.DD.DD], sasl_method=LOGIN, sasl_username=AAAAAAAAAAA@
> oconee.k12.sc.us

You have a broken system, nothing reports "DDD.DD.DD.DD" as ip address.

> This message may contain confidential information and is intended only for
> the individuals named. If you are not the named addressee you should not
> disseminate, distribute or copy this e-mail. Please notify the sender
> immediately by e-mail if you have received this e-mail by mistake and
> delete this e-mail from your system. If you are not the intended recipient
> you are notified that disclosing, copying, distributing or taking any
> action in reliance on the contents of this information is strictly
> prohibited.

Please remove this statement.  This mailing-list is already public
archived.

Bastian

--
Those who hate and fight must stop themselves -- otherwise it is not stopped.
                -- Spock, "Day of the Dove", stardate unknown

Reply | Threaded
Open this post in threaded view
|

Re: Postfix using all CPU after nightly mail submission

Viktor Dukhovni
In reply to this post by Zach Sheppard


> On Jan 19, 2018, at 8:52 AM, Zach Sheppard <[hidden email]> wrote:
>
> Jan 15 00:42:42 mailrelay postfix/qmgr[5601]: 8EF0980973: from=<[hidden email]>, size=2408, nrcpt=1 (queue active)^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
> Jan 15 09:31:40 mailrelay opendkim[668]: OpenDKIM Filter v2.11.0 starting (args: -x /etc/opendkim.conf)

What you see as "^@" is how ASCII NUL (the zero byte) is displayed
by "more", "less", "vi", ...   It appears that the log file has either
lots of NULs written to it, or perhaps has a "hole" as a result of
truncation of the log file while it was still being written by the
syslog daemon.  Perhaps incorrect log rotation...

As for high CPU, that's a bit hard to explain without further
information, which is difficult to obtain with a truncated log.
Postfix does not normally bring systems to their knees, its
resource limits are specifically designed to avoid that.

--
        Viktor.

Reply | Threaded
Open this post in threaded view
|

Re: Postfix using all CPU after nightly mail submission

Viktor Dukhovni
In reply to this post by Zach Sheppard


> On Jan 19, 2018, at 9:55 AM, Zach Sheppard <[hidden email]> wrote:
>
> bounce_queue_lifetime = 0

This is a bad idea.  Don't do that.  Allow transient delivery failures
to be retried.

> default_destination_concurrency_limit = 50

This is the default, remove the setting.

> internal_mail_filter_classes = bounce, notify

Leave this setting at its default value by removing it from main.cf.

> notify_classes = bounce, resource, software

Consider setting this empty, when sending bulk mail, your source of
notices is the mail log, not per-message notices.

> smtp_connect_timeout = 10s

This is may be a bit short, but should be OK unless some destination is
under sustained heavy load.

> smtpd_tls_session_cache_database = btree:${data_directory}/smtpd_scache

You can/should set this empty, and leave session caching to the client via
session tickets.

> smtp_helo_timeout = 30s

This too can be a bit short for overloaded peers.

> smtp_tls_session_cache_database = btree:${data_directory}/smtp_scache

This is your client session cache, correctly configured, DO NOT also
clear this when clearing the smtpd setting.

> --master.cf--
> smtp      inet  n       -       y       -       -       smtpd
> pickup    unix  n       -       y       60      1       pickup
> cleanup   unix  n       -       y       -       0       cleanup
> qmgr      unix  n       -       n       300     1       qmgr
> tlsmgr    unix  -       -       y       1000?   1       tlsmgr
> rewrite   unix  -       -       y       -       -       trivial-rewrite
> bounce    unix  -       -       y       -       0       bounce
> defer     unix  -       -       y       -       0       bounce
> trace     unix  -       -       y       -       0       bounce
> verify    unix  -       -       y       -       1       verify
> flush     unix  n       -       y       1000?   0       flush
> proxymap  unix  -       -       n       -       -       proxymap
> proxywrite unix -       -       n       -       1       proxymap
> smtp      unix  -       -       y       -       -       smtp
> relay     unix  -       -       y       -       -       smtp
> showq     unix  n       -       y       -       -       showq
> error     unix  -       -       y       -       -       error
> retry     unix  -       -       y       -       -       error
> discard   unix  -       -       y       -       -       discard
> local     unix  -       n       n       -       -       local
> virtual   unix  -       n       n       -       -       virtual
> lmtp      unix  -       -       y       -       -       lmtp
> anvil     unix  -       -       y       -       1       anvil
> scache    unix  -       -       y       -       1       scache
> [...]


The master.cf file looks fairly normal, but uses chroot for many
services.  This can make for fragile management of the system when
the chroot jail is subtly misconfigured, but is not necessarily a
problem (when done right).

--
        Viktor.

Reply | Threaded
Open this post in threaded view
|

Re: Postfix using all CPU after nightly mail submission

Dominic Raferd
On 19 January 2018 at 15:21, Viktor Dukhovni <[hidden email]> wrote:
>
>
>
> > default_destination_concurrency_limit = 50
>
> This is the default, remove the setting.
>
> ...


Here's a way to check for explicit settings in main.cf that are
actually defaults and so could be removed (works under bash):

comm -1 -2 <(postconf -n|sort) <(postconf -d|sort)
Reply | Threaded
Open this post in threaded view
|

Re: Postfix using all CPU after nightly mail submission

Viktor Dukhovni


> On Jan 19, 2018, at 10:46 AM, Dominic Raferd <[hidden email]> wrote:
>
> Here's a way to check for explicit settings in main.cf that are
> actually defaults and so could be removed (works under bash):
>
> comm -1 -2 <(postconf -n|sort) <(postconf -d|sort)

The pipes to "sort" should not be needed.  The output of "postconf" is pre-sorted.

--
        Viktor.

Reply | Threaded
Open this post in threaded view
|

Re: Postfix using all CPU after nightly mail submission

Dominic Raferd
On 19 January 2018 at 15:55, Viktor Dukhovni <[hidden email]> wrote:

>
>
>> On Jan 19, 2018, at 10:46 AM, Dominic Raferd <[hidden email]> wrote:
>>
>> Here's a way to check for explicit settings in main.cf that are
>> actually defaults and so could be removed (works under bash):
>>
>> comm -1 -2 <(postconf -n|sort) <(postconf -d|sort)
>
> The pipes to "sort" should not be needed.  The output of "postconf" is pre-sorted.

yes I thought that - but without piping through sort I see:
$ comm -1 -2 <(postconf -n) <(postconf -d)
comm: file 2 is not in sorted order
comm: file 1 is not in sorted order
Reply | Threaded
Open this post in threaded view
|

Re: Postfix using all CPU after nightly mail submission

Viktor Dukhovni


> On Jan 19, 2018, at 10:58 AM, Dominic Raferd <[hidden email]> wrote:
>
>> The pipes to "sort" should not be needed.  The output of "postconf" is pre-sorted.
>
> yes I thought that - but without piping through sort I see:
> $ comm -1 -2 <(postconf -n) <(postconf -d)
> comm: file 2 is not in sorted order
> comm: file 1 is not in sorted order

Perhaps your locale sorts "_" the output differently than the C locale.
Does the issue go away with:

        export LC_COLLATE=C

--
        Viktor.

Reply | Threaded
Open this post in threaded view
|

Re: Postfix using all CPU after nightly mail submission

Dominic Raferd
On 19 January 2018 at 16:02, Viktor Dukhovni <[hidden email]> wrote:

>
>
>> On Jan 19, 2018, at 10:58 AM, Dominic Raferd <[hidden email]> wrote:
>>
>>> The pipes to "sort" should not be needed.  The output of "postconf" is pre-sorted.
>>
>> yes I thought that - but without piping through sort I see:
>> $ comm -1 -2 <(postconf -n) <(postconf -d)
>> comm: file 2 is not in sorted order
>> comm: file 1 is not in sorted order
>
> Perhaps your locale sorts "_" the output differently than the C locale.
> Does the issue go away with:
>
>         export LC_COLLATE=C

Ah yes it does. On my system LC_COLLATE is normally undefined. This
avoids the unwanted messages:

LC_COLLATE=C comm -1 -2 <(postconf -n) <(postconf -d)
Reply | Threaded
Open this post in threaded view
|

Re: Postfix using all CPU after nightly mail submission

Wietse Venema
In reply to this post by Zach Sheppard
Zach Sheppard:
> However, whenever Postfix gets a large e-mail load it processes the e-mails
> for around 30-45 minutes and then consistently uses around 70-80% of the
> CPU effectively locking up the entire system so much so that I can't even
> login and debug without either disabling the network card in VMWare or
> rebooting the system totally. The VM has 2 CPU cores and 4GB RAM allocated
> to it which I thought was plenty for this application.

Did you determine out WHAT PROGRAM is consuming CPU? As Postfix is
totally I/O limited, I suspect that there may be some other system
component that is messing up.

For example, it is well known that one badly configured syslog
service can use up lots more CPU than all of Postfix combined.

        Wietse
Reply | Threaded
Open this post in threaded view
|

Re: Postfix using all CPU after nightly mail submission

Zach Sheppard
Wietse:

I have not made any changes to rsyslog.conf. All it does it redirect all mail log messages to one log in /var/log/mail which I rotate with a cron script nightly. However, I do agree that it really could be the only other process that could be hanging the server.

I'm not able to determine what program is consuming the CPU because I can't login to the console when this occurs. The only way I can recover the machine is by forcibly powering off.

#  /etc/rsyslog.conf    Configuration file for rsyslog.
#
#                       For more information see
#                       /usr/share/doc/rsyslog-doc/html/rsyslog_conf.html


#################
#### MODULES ####
#################

module(load="imuxsock") # provides support for local system logging
module(load="imklog")   # provides kernel logging support
#module(load="immark")  # provides --MARK-- message capability

# provides UDP syslog reception
#module(load="imudp")
#input(type="imudp" port="514")

# provides TCP syslog reception
#module(load="imtcp")
#input(type="imtcp" port="514")


###########################
#### GLOBAL DIRECTIVES ####
###########################

#
# Use traditional timestamp format.
# To enable high precision timestamps, comment out the following line.
#
$ActionFileDefaultTemplate RSYSLOG_TraditionalFileFormat

#
# Set the default permissions for all log files.
#
$FileOwner root
$FileGroup adm
$FileCreateMode 0640
$DirCreateMode 0755
$Umask 0022

#
# Where to place spool and state files
#
$WorkDirectory /var/spool/rsyslog

#
# Include all config files in /etc/rsyslog.d/
#
$IncludeConfig /etc/rsyslog.d/*.conf


###############
#### RULES ####
###############

#
# First some standard log files.  Log by facility.
#
auth,authpriv.*                 /var/log/auth.log
*.*;auth,authpriv.none          -/var/log/syslog
#cron.*                         /var/log/cron.log
daemon.*                        -/var/log/daemon.log
kern.*                          -/var/log/kern.log
lpr.*                           -/var/log/lpr.log
mail.*                          -/var/log/mail/mail.log
user.*                          -/var/log/user.log

#
# Logging for the mail system.  Split it up so that
# it is easy to write scripts to parse these files.
#
#mail.info                      -/var/log/mail.info
#mail.warn                      -/var/log/mail.warn
#mail.err                       /var/log/mail.err

#
# Some "catch-all" log files.
#
*.=debug;\
        auth,authpriv.none;\
        news.none;mail.none     -/var/log/debug
*.=info;*.=notice;*.=warn;\
        auth,authpriv.none;\
        cron,daemon.none;\
        mail,news.none          -/var/log/messages

#
# Emergencies are sent to everybody logged in.
#
*.emerg                         :omusrmsg:*

On Fri, Jan 19, 2018 at 1:20 PM, Wietse Venema <[hidden email]> wrote:
Zach Sheppard:
> However, whenever Postfix gets a large e-mail load it processes the e-mails
> for around 30-45 minutes and then consistently uses around 70-80% of the
> CPU effectively locking up the entire system so much so that I can't even
> login and debug without either disabling the network card in VMWare or
> rebooting the system totally. The VM has 2 CPU cores and 4GB RAM allocated
> to it which I thought was plenty for this application.

Did you determine out WHAT PROGRAM is consuming CPU? As Postfix is
totally I/O limited, I suspect that there may be some other system
component that is messing up.

For example, it is well known that one badly configured syslog
service can use up lots more CPU than all of Postfix combined.

        Wietse


This message may contain confidential information and is intended only for the individuals named. If you are not the named addressee you should not disseminate, distribute or copy this e-mail. Please notify the sender immediately by e-mail if you have received this e-mail by mistake and delete this e-mail from your system. If you are not the intended recipient you are notified that disclosing, copying, distributing or taking any action in reliance on the contents of this information is strictly prohibited.
Reply | Threaded
Open this post in threaded view
|

Re: Postfix using all CPU after nightly mail submission

Viktor Dukhovni
On Fri, Jan 19, 2018 at 02:04:05PM -0500, Zach Sheppard wrote:

> I have not made any changes to rsyslog.conf. All it does it redirect all
> mail log messages to one log in /var/log/mail which I rotate with a cron
> script nightly. However, I do agree that it really could be the only other
> process that could be hanging the server.

Your milter could be another culprit, and you've unwisely configured
filtering of internally generated mail, including notices about
problems.  So the first thing is to undo the filtering of internally
generated mail.  See also my other comments about your config.

As for syslog, for MTAs I prefer syslog-ng, to rsyslog, and make
sure that /dev/log is configured as a "dgram" not "stream" socket.
Also make sure that log writes are not synchronous.

--
        Viktor.
Reply | Threaded
Open this post in threaded view
|

Re: Postfix using all CPU after nightly mail submission

Wietse Venema
In reply to this post by Zach Sheppard
Zach Sheppard:

> Wietse:
>
> I have not made any changes to rsyslog.conf. All it does it redirect all
> mail log messages to one log in /var/log/mail which I rotate with a cron
> script nightly. However, I do agree that it really could be the only other
> process that could be hanging the server.
>
> I'm not able to determine what program is consuming the CPU because I can't
> login to the console when this occurs. The only way I can recover the
> machine is by forcibly powering off.

I suspect that heavy I/O from Postfix and syslog is too much for
your VM.

To diagnose the problem, run screen(1) on a stable machine, and
then open a login session into the VM while it still responsive.
Then come back to that screen session when things go bad. You're
likely to find that when the VM is very slow, all time is spent in
the VM's kernel, and the host's VMM.

Note that VMs, while fine for CPU-bound jobs, can introduce serious
CPU overhead for things that do massive amounts of I/O like Postfix
plus syslog.

If you can't get a better VM, you can reduce the impact from a
'large' mailing by reducing the number of concurrent Postfix SMTP
server and client processes.

# postconf default_process_limit=10
# postfix reload

        Wietse
Reply | Threaded
Open this post in threaded view
|

Re: Postfix using all CPU after nightly mail submission

Ralf Hildebrandt-2
In reply to this post by Viktor Dukhovni
> > Jan 15 00:42:42 mailrelay postfix/qmgr[5601]: 8EF0980973: from=<[hidden email]>, size=2408, nrcpt=1 (queue active)^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
> > Jan 15 09:31:40 mailrelay opendkim[668]: OpenDKIM Filter v2.11.0 starting (args: -x /etc/opendkim.conf)
>
> What you see as "^@" is how ASCII NUL (the zero byte) is displayed
> by "more", "less", "vi", ...   It appears that the log file has either
> lots of NULs written to it, or perhaps has a "hole" as a result of
> truncation of the log file while it was still being written by the
> syslog daemon.  Perhaps incorrect log rotation...
>
> As for high CPU, that's a bit hard to explain without further
> information, which is difficult to obtain with a truncated log.
> Postfix does not normally bring systems to their knees, its
> resource limits are specifically designed to avoid that.

I've seen similar issues. Machines executing one job - which either
goes hogging the CPU or using all memory, then either the OOM killer
goes mad or the whole machine panics, leaving the filesystem in an
inconsistent state. Often leaving zeros in the log.

--
[*] sys4 AG

https://sys4.de, +49 (89) 30 90 46 64
Schleißheimer Straße 26/MG, 80333 München
                                           
Sitz der Gesellschaft: München, Amtsgericht München: HRB 199263
Vorstand: Patrick Ben Koetter, Marc Schiffbauer
Aufsichtsratsvorsitzender: Florian Kirstein
Reply | Threaded
Open this post in threaded view
|

Re: Postfix using all CPU after nightly mail submission

Zach Sheppard
In reply to this post by Viktor Dukhovni
Hi Viktor:

I configured that internally generated mail to be filtered through our DKIM milter. This was to allow deliveries of error messages from the server to my inbox. However, I see above how this is not ideal since that can just be delivered to the root user inbox. I have turned off that setting.

After making several tweaks from suggestions in this thread I saw much better performance last night.

On Fri, Jan 19, 2018 at 2:25 PM, Viktor Dukhovni <[hidden email]> wrote:
On Fri, Jan 19, 2018 at 02:04:05PM -0500, Zach Sheppard wrote:

> I have not made any changes to rsyslog.conf. All it does it redirect all
> mail log messages to one log in /var/log/mail which I rotate with a cron
> script nightly. However, I do agree that it really could be the only other
> process that could be hanging the server.

Your milter could be another culprit, and you've unwisely configured
filtering of internally generated mail, including notices about
problems.  So the first thing is to undo the filtering of internally
generated mail.  See also my other comments about your config.

As for syslog, for MTAs I prefer syslog-ng, to rsyslog, and make
sure that /dev/log is configured as a "dgram" not "stream" socket.
Also make sure that log writes are not synchronous.

--
        Viktor.


This message may contain confidential information and is intended only for the individuals named. If you are not the named addressee you should not disseminate, distribute or copy this e-mail. Please notify the sender immediately by e-mail if you have received this e-mail by mistake and delete this e-mail from your system. If you are not the intended recipient you are notified that disclosing, copying, distributing or taking any action in reliance on the contents of this information is strictly prohibited.