Slowness after upgrading from postfix 2.x to 3.1.8

classic Classic list List threaded Threaded
15 messages Options
Reply | Threaded
Open this post in threaded view
|

Slowness after upgrading from postfix 2.x to 3.1.8

Christopher R. Gabriel
Hi,

after upgrading to Debian 9 (thus Postfix 3.1.8) I'm experiecing an odd
behaviour, which causes slowness on all the infrastructure.

I have a generator server which injects (via smtp) into postfix, the
actual sender, and when burst of delivery happens, the receiving
postfix stuck before answering to the generator, which causes the
generator queues to fill up.

I've logged the smtp session under load from the postfix 3.1.8 server,
and in the following log excerpt you can note a "pause" of ~26 seconds:

Nov 30 09:11:31 postfix01 postfix-main/smtpd[31800]: rec_put: type N
len 9 data     </td>
Nov 30 09:11:31 postfix01 postfix-main/smtpd[31800]: rec_put: type N
len 7 data   </tr>
Nov 30 09:11:31 postfix01 postfix-main/smtpd[31800]: rec_put: type N
len 36 data </table></
Nov 30 09:11:31 postfix01 postfix-main/smtpd[31800]: rec_put: type N
len 279 data <tr><td al
Nov 30 09:11:31 postfix01 postfix-main/smtpd[31800]: rec_put: type N
len 503 data You have r
Nov 30 09:11:31 postfix01 postfix-main/smtpd[31800]: rec_put: type N
len 18 data </td></tr>
Nov 30 09:11:31 postfix01 postfix-main/smtpd[31800]: rec_put: type N
len 67 data <span styl
Nov 30 09:11:31 postfix01 postfix-main/smtpd[31800]: rec_put: type N
len 7 data </html>
Nov 30 09:11:31 postfix01 postfix-main/smtpd[31800]: rec_put: type N
len 0 data
Nov 30 09:11:31 postfix01 postfix-main/smtpd[31800]: rec_put: type N
len 55 data --____=-WW
Nov 30 09:11:31 postfix01 postfix-main/smtpd[31800]: rec_put: type N
len 0 data
Nov 30 09:11:31 postfix01 postfix-main/smtpd[31800]: rec_put: type X
len 0 data
Nov 30 09:11:31 postfix01 postfix-main/smtpd[31800]: rec_put: type E
len 0 data
Nov 30 09:11:31 postfix01 postfix-main/smtpd[31800]:
vstream_fflush_some: fd 18 flush 2433
Nov 30 09:11:58 postfix01 postfix-main/smtpd[31800]:
vstream_buf_get_ready: fd 18 got 18
Nov 30 09:11:58 postfix01 postfix-main/smtpd[31800]: public/cleanup
socket: wanted attribute: status
Nov 30 09:11:58 postfix01 postfix-main/smtpd[31800]: input attribute
name: status
Nov 30 09:11:58 postfix01 postfix-main/smtpd[31800]: input attribute
value: 0
Nov 30 09:11:58 postfix01 postfix-main/smtpd[31800]: public/cleanup
socket: wanted attribute: reason
Nov 30 09:11:58 postfix01 postfix-main/smtpd[31800]: input attribute
name: reason
Nov 30 09:11:58 postfix01 postfix-main/smtpd[31800]: input attribute
value: (end)
Nov 30 09:11:58 postfix01 postfix-main/smtpd[31800]: public/cleanup
socket: wanted attribute: (list terminator)
Nov 30 09:11:58 postfix01 postfix-main/smtpd[31800]: input attribute
name: (end)
Nov 30 09:11:58 postfix01 postfix-main/smtpd[31800]: >
gen01.localdomain[192.168.25.135]: 250 2.0.0 Ok: queued as
ECA962EA0B336
Nov 30 09:11:58 postfix01 postfix-main/smtpd[31800]:
vstream_fflush_some: fd 10 flush 39
Nov 30 09:11:58 postfix01 postfix-main/smtpd[31800]: abort all milters
Nov 30 09:11:58 postfix01 postfix-main/smtpd[31800]: milter8_abort:
abort milter inet:127.0.0.1:12301
Nov 30 09:11:58 postfix01 postfix-main/smtpd[31800]: watchdog_pat:
0x561d4886f850
Nov 30 09:11:58 postfix01 postfix-main/smtpd[31800]:
vstream_buf_get_ready: fd 10 got 6
Nov 30 09:11:58 postfix01 postfix-main/smtpd[31800]: <
gen01.localdomain[192.168.25.135]: QUIT
Nov 30 09:11:58 postfix01 postfix-main/smtpd[31800]: >
gen01.localdomain[192.168.25.135]: 221 2.0.0 Bye
Nov 30 09:11:58 postfix01 postfix-main/smtpd[31800]: match_hostname:
smtpd_client_event_limit_exceptions: gen01.localdomain ~? 127.0.0.0/8

I can provide the full session log.

postfix01 data/spool are on tmpfs.

Do you have any suggestions? This is unfortunately stopping the upgrade
to postfix 3.x for the whole infrastructure.

Thank for any help you can provide!

Regards,

Christopher


alias_database = hash:/etc/aliases
alias_maps = hash:/etc/aliases
append_dot_mydomain = no
biff = no
bounce_queue_lifetime = 6h
compatibility_level = 2
config_directory = /etc/postfix
connection_cache_ttl_limit = 16s
data_directory = /var/spool/postfix/data
default_destination_concurrency_limit = 200
default_destination_rate_delay = 0s
default_destination_recipient_limit = 50
default_recipient_limit = 40
default_transport = defer
disable_mime_output_conversion = yes
header_checks =  regexp:/etc/postfix/header_checks
in_flow_delay = 1
inet_interfaces = 192.168.25.131, 127.0.0.1
inet_protocols = ipv4
mailbox_command = procmail -a "$EXTENSION"
mailbox_size_limit = 0
maximal_backoff_time = 4000s
maximal_queue_lifetime = 3h
milter_default_action = accept
milter_protocol = 6
minimal_backoff_time = 300s
mydestination =
myhostname = postfix01.localdomain
mynetworks = $config_directory/mynetworks
myorigin = /etc/mailname
non_smtpd_milters = $smtpd_milters
qmgr_message_active_limit = 80000
qmgr_message_recipient_limit = 80000
queue_directory = /var/spool/postfix
readme_directory = no
recipient_delimiter = +
relayhost =
smtp_connect_timeout = 3
smtp_connection_cache_on_demand = no
smtp_connection_cache_time_limit = 16s
smtp_connection_reuse_time_limit = 16s
smtp_discard_ehlo_keyword_address_maps = hash:/etc/postfix/busted-
servers
smtp_helo_timeout = 30
smtp_tls_CAfile = /etc/ssl/certs/ca-certificates.crt
smtp_tls_cert_file = /etc/ssl/certs/ssl-cert-snakeoil.pem
smtp_tls_enforce_peername = no
smtp_tls_key_file = /etc/ssl/private/ssl-cert-snakeoil.key
smtp_tls_loglevel = 0
smtp_tls_mandatory_ciphers = low
smtp_tls_security_level = may
smtp_tls_session_cache_timeout = 0
smtpd_banner = $myhostname ESMTP $mail_name (Debian/GNU)
smtpd_client_connection_count_limit = 8000
smtpd_milters = inet:127.0.0.1:12301
smtpd_tls_cert_file = /etc/ssl/certs/ssl-cert-snakeoil.pem
smtpd_tls_key_file = /etc/ssl/private/ssl-cert-snakeoil.key
smtpd_tls_session_cache_database = btree:${data_directory}/smtpd_scache
smtpd_use_tls = no
tls_random_source = dev:/dev/urandom
transport_maps = regexp:/etc/postfix/transport tcp:[127.0.0.1]:23478



--
Christopher R. Gabriel - Senior System Architect

Keyman Technologies Srl - Business Center
Via Conca del Naviglio 18, 20123 Milan - Italy
Phone +39 0287343803
Mobile +39 391 3459671


This message is for the designated recipient only and may contain
privileged or confidential information.
If you have received it in error, please notify the sender immediately
and delete the original.
Any other use of the email by you is prohibited.

Reply | Threaded
Open this post in threaded view
|

Re: Slowness after upgrading from postfix 2.x to 3.1.8

Viktor Dukhovni
> On Jan 4, 2019, at 9:23 AM, Christopher R. Gabriel <[hidden email]> wrote:
>
> Nov 30 09:11:31 postfix01 postfix-main/smtpd[31800]: rec_put: type E
> len 0 data
> Nov 30 09:11:31 postfix01 postfix-main/smtpd[31800]:
> vstream_fflush_some: fd 18 flush 2433
> Nov 30 09:11:58 postfix01 postfix-main/smtpd[31800]:
> vstream_buf_get_ready: fd 18 got 18
> Nov 30 09:11:58 postfix01 postfix-main/smtpd[31800]: public/cleanup
> socket: wanted attribute: status

Your "cleanup" service is responding slowly.  Perhaps your filesystem
is slow, since cleanup has to commit the message to disk.  Or some
tables you're using in cleanup are slow.  The verbose logging does
not help, it puts pressure on the filesystem...

--
        Viktor.

Reply | Threaded
Open this post in threaded view
|

Re: Slowness after upgrading from postfix 2.x to 3.1.8

Christopher R. Gabriel
On Fri, 2019-01-04 at 09:49 -0500, Viktor Dukhovni wrote:

> > On Jan 4, 2019, at 9:23 AM, Christopher R. Gabriel <
> > [hidden email]> wrote:
> >
> > Nov 30 09:11:31 postfix01 postfix-main/smtpd[31800]: rec_put: type
> > E
> > len 0 data
> > Nov 30 09:11:31 postfix01 postfix-main/smtpd[31800]:
> > vstream_fflush_some: fd 18 flush 2433
> > Nov 30 09:11:58 postfix01 postfix-main/smtpd[31800]:
> > vstream_buf_get_ready: fd 18 got 18
> > Nov 30 09:11:58 postfix01 postfix-main/smtpd[31800]: public/cleanup
> > socket: wanted attribute: status
>
> Your "cleanup" service is responding slowly.  Perhaps your filesystem
> is slow, since cleanup has to commit the message to disk.

All spool/data dirs are on ramdisk tmpfs, that's why I'm wondering.

>  Or some tables you're using in cleanup are slow.

I only have a header_checks table with 1 single rule to log a specific
header, and a transport map redis-based. Exactly the same configuration
I have on postfix 2.x.

>  The verbose logging does not help, it puts pressure on the
> filesystem...

I've enable the verbose logging only for debug pourposes, I find the
same behaviour even with logging disabled.

Reply | Threaded
Open this post in threaded view
|

Re: Slowness after upgrading from postfix 2.x to 3.1.8

Viktor Dukhovni


> On Jan 4, 2019, at 10:04 AM, Christopher R. Gabriel <[hidden email]> wrote:
>
>> Or some tables you're using in cleanup are slow.
>
> I only have a header_checks table with 1 single rule to log a specific
> header, and a transport map redis-based. Exactly the same configuration
> I have on postfix 2.x.

Cleanup does perform transport lookups.  Is redis performing well?

>> The verbose logging does not help, it puts pressure on the
>> filesystem...
>
> I've enable the verbose logging only for debug pourposes, I find the
> same behaviour even with logging disabled.

Well, since cleanup is slow, move the verbose logging to cleanup if
you can't determine why cleanup is blocked more directly.

Cleanup performs transport lookups, canonical and virtual rewriting,
header/body checks, writes and syncs the message to disk, and
sends a 1-byte notice to the queue manager socket.  With inflow_delay
it also waits that long to receive a token from the queue manager.

One of these things is slow.

--
        Viktor.
Reply | Threaded
Open this post in threaded view
|

Re: Slowness after upgrading from postfix 2.x to 3.1.8

Christopher R. Gabriel
On Fri, 2019-01-04 at 10:26 -0500, Viktor Dukhovni wrote:

> > On Jan 4, 2019, at 10:04 AM, Christopher R. Gabriel <
> > [hidden email]> wrote:
> >
> > > Or some tables you're using in cleanup are slow.
> >
> > I only have a header_checks table with 1 single rule to log a
> > specific
> > header, and a transport map redis-based. Exactly the same
> > configuration
> > I have on postfix 2.x.
>
> Cleanup does perform transport lookups.  Is redis performing well?

Yes, blazing fast. It's shown in the full log of the smtp session, al
the whole transaction (until the reported slowness event) is immediate.

> > > The verbose logging does not help, it puts pressure on the
> > > filesystem...
> >
> > I've enable the verbose logging only for debug pourposes, I find
> > the
> > same behaviour even with logging disabled.
>
> Well, since cleanup is slow, move the verbose logging to cleanup if
> you can't determine why cleanup is blocked more directly.

I'll try this too.

> Cleanup performs transport lookups, canonical and virtual rewriting,
> header/body checks, writes and syncs the message to disk, and
> sends a 1-byte notice to the queue manager socket.  With inflow_delay
> it also waits that long to receive a token from the queue manager.

in_flow_delay is default, like in the other postfix 2.x instances, all
data (both new and old server) is on ramdisk.. I'll dig into the
verbose log for cleanup, hoping to find the problem

Reply | Threaded
Open this post in threaded view
|

Re: Slowness after upgrading from postfix 2.x to 3.1.8

Matus UHLAR - fantomas
In reply to this post by Christopher R. Gabriel
On 04.01.19 15:23, Christopher R. Gabriel wrote:
>I have a generator server which injects (via smtp) into postfix, the
>actual sender, and when burst of delivery happens, the receiving
>postfix stuck before answering to the generator, which causes the
>generator queues to fill up.

>Nov 30 09:11:58 postfix01 postfix-main/smtpd[31800]: abort all milters
>Nov 30 09:11:58 postfix01 postfix-main/smtpd[31800]: milter8_abort:
>abort milter inet:127.0.0.1:12301


>postfix01 data/spool are on tmpfs.

are you OK with losing mail when something breaks?

>smtpd_milters = inet:127.0.0.1:12301

is this milter running properly?

--
Matus UHLAR - fantomas, [hidden email] ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
Spam is for losers who can't get business any other way.
Reply | Threaded
Open this post in threaded view
|

Re: Slowness after upgrading from postfix 2.x to 3.1.8

Benny Pedersen-2
In reply to this post by Christopher R. Gabriel
Christopher R. Gabriel skrev den 2019-01-04 15:23:

> postfix01 data/spool are on tmpfs.

its unsafe to use tmpfs for spool dirs in postfix, tmpfs is okay only
for content-filters, not in generic postfix
Reply | Threaded
Open this post in threaded view
|

Re: Slowness after upgrading from postfix 2.x to 3.1.8

Christopher R. Gabriel
In reply to this post by Matus UHLAR - fantomas
On Fri, 2019-01-04 at 19:56 +0100, Matus UHLAR - fantomas wrote:

> On 04.01.19 15:23, Christopher R. Gabriel wrote:
> > I have a generator server which injects (via smtp) into postfix,
> > the
> > actual sender, and when burst of delivery happens, the receiving
> > postfix stuck before answering to the generator, which causes the
> > generator queues to fill up.
> > Nov 30 09:11:58 postfix01 postfix-main/smtpd[31800]: abort all
> > milters
> > Nov 30 09:11:58 postfix01 postfix-main/smtpd[31800]: milter8_abort:
> > abort milter inet:127.0.0.1:12301
> > postfix01 data/spool are on tmpfs.
>
> are you OK with losing mail when something breaks?

Yes

> > smtpd_milters = inet:127.0.0.1:12301
>
> is this milter running properly?

That's opendkim. No error or strange behaviour reported.


Regards,

Christopher


Reply | Threaded
Open this post in threaded view
|

Re: Slowness after upgrading from postfix 2.x to 3.1.8

Christopher R. Gabriel
On Mon, 2019-01-14 at 12:42 +0100, Christopher R. Gabriel wrote:

> On Fri, 2019-01-04 at 19:56 +0100, Matus UHLAR - fantomas wrote:
> > On 04.01.19 15:23, Christopher R. Gabriel wrote:
> > > I have a generator server which injects (via smtp) into postfix,
> > > the
> > > actual sender, and when burst of delivery happens, the receiving
> > > postfix stuck before answering to the generator, which causes the
> > > generator queues to fill up.
> > > Nov 30 09:11:58 postfix01 postfix-main/smtpd[31800]: abort all
> > > milters
> > > Nov 30 09:11:58 postfix01 postfix-main/smtpd[31800]:
> > > milter8_abort:
> > > abort milter inet:127.0.0.1:12301
> > > postfix01 data/spool are on tmpfs.
> >
> > are you OK with losing mail when something breaks?
>
> Yes
>
> > > smtpd_milters = inet:127.0.0.1:12301
> >
> > is this milter running properly?
>
> That's opendkim. No error or strange behaviour reported.

After more investigation, the problem is opendkim. No errors logged by
it, but when the milter is enabled, during peaks the delay goes from
0.05 up to 25/30.

The project seems a bit abandonware (no answers to bugs in years,
repository almost stuck), and also recently orphaned by debian
maintainer.

Does anybody have some hint to check for this, or maybe a more
maintained alternative to it?

Thank you!

Christopher


Reply | Threaded
Open this post in threaded view
|

Re: Slowness after upgrading from postfix 2.x to 3.1.8

angelo
Hi Christopher,
I'm on the opendkim list also and it does get little attention.

Is the "delay" recorded in a typical Postfix log entry ?
Stolen from Postfix 2.3.19:
Postfix logs additional delay information as "delays=a/b/c/d"
  where a=time before queue manager, including message transmission;
  b=time in queue manager; c=connection setup time including DNS,
  HELO and TLS; d=message transmission time.


These seem to be the only settings to bump up logging with opendkim:

##  Log activity to the system log.
Syslog  yes

##  Log additional entries indicating successful signing or verification of
messages.
SyslogSuccess   yes

##  If logging is enabled, include detailed logging about why or why not a
message was
##  signed or verified. This causes an increase in the amount of log data
generated
##  for each message, so set this to No (or comment it out) if it gets too
noisy.
LogWhy  yes



If your version supports it you may want to add this to your opendkim config
file ?
Or check "man opendkim.conf" for more options ?

 KeepTemporaryFiles (boolean)
              Instructs the filter to create temporary files containing the
header and body canonicalizations of messages that are signed  or  verified.  
The  location of these files can be set using the TemporaryDirectory
parameter.  Intended only for debugging verification problems.




--
Sent from: http://postfix.1071664.n5.nabble.com/Postfix-Users-f2.html
Reply | Threaded
Open this post in threaded view
|

Re: Slowness after upgrading from postfix 2.x to 3.1.8

Christopher R. Gabriel
On Fri, 2019-02-15 at 09:35 -0700, angelo wrote:
> Hi Christopher,
> I'm on the opendkim list also and it does get little attention.

Really? :)

> Is the "delay" recorded in a typical Postfix log entry ?
> Stolen from Postfix 2.3.19:
> Postfix logs additional delay information as "delays=a/b/c/d"
>   where a=time before queue manager, including message transmission;
>   b=time in queue manager; c=connection setup time including DNS,
>   HELO and TLS; d=message transmission time.

yes, that's from where I get such data from, like

delay=31, delays=30/0/0.33/1.1,

And it's consistent with the problem reported (if you read the thread
backwards)

> These seem to be the only settings to bump up logging with opendkim:

[..]

> If your version supports it you may want to add this to your opendkim
> config
> file ?
> Or check "man opendkim.conf" for more options ?

I've already made it extremely verbose, including setting MilterDebug,
but it just report the success of the operation, nothing else.

>  KeepTemporaryFiles (boolean)
>               Instructs the filter to create temporary files
> containing the
> header and body canonicalizations of messages that are
> signed  or  verified.  
> The  location of these files can be set using the TemporaryDirectory
> parameter.  Intended only for debugging verification problems.

Signatures are applied correctly, it just take too much time to do it.

Regards,

Christopher

Reply | Threaded
Open this post in threaded view
|

Re: Slowness after upgrading from postfix 2.x to 3.1.8

Wietse Venema
Christopher R. Gabriel:

> On Fri, 2019-02-15 at 09:35 -0700, angelo wrote:
> > Hi Christopher,
> > I'm on the opendkim list also and it does get little attention.
>
> Really? :)
>
> > Is the "delay" recorded in a typical Postfix log entry ?
> > Stolen from Postfix 2.3.19:
> > Postfix logs additional delay information as "delays=a/b/c/d"
> >   where a=time before queue manager, including message transmission;
> >   b=time in queue manager; c=connection setup time including DNS,
> >   HELO and TLS; d=message transmission time.
>
> yes, that's from where I get such data from, like
>
> delay=31, delays=30/0/0.33/1.1,

Can you strace the opendkim process? Is it hanging in DNS lookups?

        Wietse
Reply | Threaded
Open this post in threaded view
|

Re: Slowness after upgrading from postfix 2.x to 3.1.8

Scott Kitterman-4
In reply to this post by Christopher R. Gabriel
On Friday, February 15, 2019 05:01:45 PM Christopher R. Gabriel wrote:
...
> The project seems a bit abandonware (no answers to bugs in years,
> repository almost stuck), and also recently orphaned by debian
> maintainer.
...

FYI, that was me.  I orphaned it because I'm not using it anymore.  As far as
I know, it still works fine.  I did get tired of the slow pace of development
(I was interested in implementing Ed25519 DKIM signatures), so I wrote dkimpy-
milter and use it now instead.  It is not nearly as featurefull as OpenDKIM,
but it does the things I need.

Scott K
Reply | Threaded
Open this post in threaded view
|

Re: Slowness after upgrading from postfix 2.x to 3.1.8

Christopher R. Gabriel
In reply to this post by Wietse Venema


On Fri, 15 Feb 2019, 18:28 Wietse Venema <[hidden email] wrote:
Christopher R. Gabriel:
> On Fri, 2019-02-15 at 09:35 -0700, angelo wrote:
> > Hi Christopher,
> > I'm on the opendkim list also and it does get little attention.
>
> Really? :)
>
> > Is the "delay" recorded in a typical Postfix log entry ?
> > Stolen from Postfix 2.3.19:
> > Postfix logs additional delay information as "delays=a/b/c/d"
> >   where a=time before queue manager, including message transmission;
> >   b=time in queue manager; c=connection setup time including DNS,
> >   HELO and TLS; d=message transmission time.
>
> yes, that's from where I get such data from, like
>
> delay=31, delays=30/0/0.33/1.1,

Can you strace the opendkim process? Is it hanging in DNS lookups?

        Wietse

It's in signing mode only, and we I try to strace it (with follow option for thread) it crashes.

Christopher
Reply | Threaded
Open this post in threaded view
|

Re: Slowness after upgrading from postfix 2.x to 3.1.8

Christopher R. Gabriel
On Fri, 2019-02-15 at 20:42 +0100, Christopher R. Gabriel wrote:

> > > > Is the "delay" recorded in a typical Postfix log entry ?
> > > > Stolen from Postfix 2.3.19:
> > > > Postfix logs additional delay information as "delays=a/b/c/d"
> > > >   where a=time before queue manager, including message
> > transmission;
> > > >   b=time in queue manager; c=connection setup time including
> > DNS,
> > > >   HELO and TLS; d=message transmission time.
> > >
> > > yes, that's from where I get such data from, like
> > >
> > > delay=31, delays=30/0/0.33/1.1,
> >
> > Can you strace the opendkim process? Is it hanging in DNS lookups?
> >
> >         Wietse
>
> It's in signing mode only, and we I try to strace it (with follow
> option for thread) it crashes.

After being able to spend sometime doing perf and strace on the
opendkim process, I was able to find out that the underlying NSS stack
configuration (nscd + nslcd) was causing the bottleneck. After removing
the cache and the LDAP backend (which is a default, but fortunately not
mandatory for us), delays got aligned with servers using the old stack
(postfix 2.x etc).

So they're really to blame, by now I was able to remove the cause of
the symptom (semi cit.)

Is anyone running the same nss stack without problems?


Christopher