Slowness after upgrading from postfix 2.x to 3.1.8

classic Classic list List threaded Threaded
8 messages Options
Reply | Threaded
Open this post in threaded view
|

Slowness after upgrading from postfix 2.x to 3.1.8

Christopher R. Gabriel
Hi,

after upgrading to Debian 9 (thus Postfix 3.1.8) I'm experiecing an odd
behaviour, which causes slowness on all the infrastructure.

I have a generator server which injects (via smtp) into postfix, the
actual sender, and when burst of delivery happens, the receiving
postfix stuck before answering to the generator, which causes the
generator queues to fill up.

I've logged the smtp session under load from the postfix 3.1.8 server,
and in the following log excerpt you can note a "pause" of ~26 seconds:

Nov 30 09:11:31 postfix01 postfix-main/smtpd[31800]: rec_put: type N
len 9 data     </td>
Nov 30 09:11:31 postfix01 postfix-main/smtpd[31800]: rec_put: type N
len 7 data   </tr>
Nov 30 09:11:31 postfix01 postfix-main/smtpd[31800]: rec_put: type N
len 36 data </table></
Nov 30 09:11:31 postfix01 postfix-main/smtpd[31800]: rec_put: type N
len 279 data <tr><td al
Nov 30 09:11:31 postfix01 postfix-main/smtpd[31800]: rec_put: type N
len 503 data You have r
Nov 30 09:11:31 postfix01 postfix-main/smtpd[31800]: rec_put: type N
len 18 data </td></tr>
Nov 30 09:11:31 postfix01 postfix-main/smtpd[31800]: rec_put: type N
len 67 data <span styl
Nov 30 09:11:31 postfix01 postfix-main/smtpd[31800]: rec_put: type N
len 7 data </html>
Nov 30 09:11:31 postfix01 postfix-main/smtpd[31800]: rec_put: type N
len 0 data
Nov 30 09:11:31 postfix01 postfix-main/smtpd[31800]: rec_put: type N
len 55 data --____=-WW
Nov 30 09:11:31 postfix01 postfix-main/smtpd[31800]: rec_put: type N
len 0 data
Nov 30 09:11:31 postfix01 postfix-main/smtpd[31800]: rec_put: type X
len 0 data
Nov 30 09:11:31 postfix01 postfix-main/smtpd[31800]: rec_put: type E
len 0 data
Nov 30 09:11:31 postfix01 postfix-main/smtpd[31800]:
vstream_fflush_some: fd 18 flush 2433
Nov 30 09:11:58 postfix01 postfix-main/smtpd[31800]:
vstream_buf_get_ready: fd 18 got 18
Nov 30 09:11:58 postfix01 postfix-main/smtpd[31800]: public/cleanup
socket: wanted attribute: status
Nov 30 09:11:58 postfix01 postfix-main/smtpd[31800]: input attribute
name: status
Nov 30 09:11:58 postfix01 postfix-main/smtpd[31800]: input attribute
value: 0
Nov 30 09:11:58 postfix01 postfix-main/smtpd[31800]: public/cleanup
socket: wanted attribute: reason
Nov 30 09:11:58 postfix01 postfix-main/smtpd[31800]: input attribute
name: reason
Nov 30 09:11:58 postfix01 postfix-main/smtpd[31800]: input attribute
value: (end)
Nov 30 09:11:58 postfix01 postfix-main/smtpd[31800]: public/cleanup
socket: wanted attribute: (list terminator)
Nov 30 09:11:58 postfix01 postfix-main/smtpd[31800]: input attribute
name: (end)
Nov 30 09:11:58 postfix01 postfix-main/smtpd[31800]: >
gen01.localdomain[192.168.25.135]: 250 2.0.0 Ok: queued as
ECA962EA0B336
Nov 30 09:11:58 postfix01 postfix-main/smtpd[31800]:
vstream_fflush_some: fd 10 flush 39
Nov 30 09:11:58 postfix01 postfix-main/smtpd[31800]: abort all milters
Nov 30 09:11:58 postfix01 postfix-main/smtpd[31800]: milter8_abort:
abort milter inet:127.0.0.1:12301
Nov 30 09:11:58 postfix01 postfix-main/smtpd[31800]: watchdog_pat:
0x561d4886f850
Nov 30 09:11:58 postfix01 postfix-main/smtpd[31800]:
vstream_buf_get_ready: fd 10 got 6
Nov 30 09:11:58 postfix01 postfix-main/smtpd[31800]: <
gen01.localdomain[192.168.25.135]: QUIT
Nov 30 09:11:58 postfix01 postfix-main/smtpd[31800]: >
gen01.localdomain[192.168.25.135]: 221 2.0.0 Bye
Nov 30 09:11:58 postfix01 postfix-main/smtpd[31800]: match_hostname:
smtpd_client_event_limit_exceptions: gen01.localdomain ~? 127.0.0.0/8

I can provide the full session log.

postfix01 data/spool are on tmpfs.

Do you have any suggestions? This is unfortunately stopping the upgrade
to postfix 3.x for the whole infrastructure.

Thank for any help you can provide!

Regards,

Christopher


alias_database = hash:/etc/aliases
alias_maps = hash:/etc/aliases
append_dot_mydomain = no
biff = no
bounce_queue_lifetime = 6h
compatibility_level = 2
config_directory = /etc/postfix
connection_cache_ttl_limit = 16s
data_directory = /var/spool/postfix/data
default_destination_concurrency_limit = 200
default_destination_rate_delay = 0s
default_destination_recipient_limit = 50
default_recipient_limit = 40
default_transport = defer
disable_mime_output_conversion = yes
header_checks =  regexp:/etc/postfix/header_checks
in_flow_delay = 1
inet_interfaces = 192.168.25.131, 127.0.0.1
inet_protocols = ipv4
mailbox_command = procmail -a "$EXTENSION"
mailbox_size_limit = 0
maximal_backoff_time = 4000s
maximal_queue_lifetime = 3h
milter_default_action = accept
milter_protocol = 6
minimal_backoff_time = 300s
mydestination =
myhostname = postfix01.localdomain
mynetworks = $config_directory/mynetworks
myorigin = /etc/mailname
non_smtpd_milters = $smtpd_milters
qmgr_message_active_limit = 80000
qmgr_message_recipient_limit = 80000
queue_directory = /var/spool/postfix
readme_directory = no
recipient_delimiter = +
relayhost =
smtp_connect_timeout = 3
smtp_connection_cache_on_demand = no
smtp_connection_cache_time_limit = 16s
smtp_connection_reuse_time_limit = 16s
smtp_discard_ehlo_keyword_address_maps = hash:/etc/postfix/busted-
servers
smtp_helo_timeout = 30
smtp_tls_CAfile = /etc/ssl/certs/ca-certificates.crt
smtp_tls_cert_file = /etc/ssl/certs/ssl-cert-snakeoil.pem
smtp_tls_enforce_peername = no
smtp_tls_key_file = /etc/ssl/private/ssl-cert-snakeoil.key
smtp_tls_loglevel = 0
smtp_tls_mandatory_ciphers = low
smtp_tls_security_level = may
smtp_tls_session_cache_timeout = 0
smtpd_banner = $myhostname ESMTP $mail_name (Debian/GNU)
smtpd_client_connection_count_limit = 8000
smtpd_milters = inet:127.0.0.1:12301
smtpd_tls_cert_file = /etc/ssl/certs/ssl-cert-snakeoil.pem
smtpd_tls_key_file = /etc/ssl/private/ssl-cert-snakeoil.key
smtpd_tls_session_cache_database = btree:${data_directory}/smtpd_scache
smtpd_use_tls = no
tls_random_source = dev:/dev/urandom
transport_maps = regexp:/etc/postfix/transport tcp:[127.0.0.1]:23478



--
Christopher R. Gabriel - Senior System Architect

Keyman Technologies Srl - Business Center
Via Conca del Naviglio 18, 20123 Milan - Italy
Phone +39 0287343803
Mobile +39 391 3459671


This message is for the designated recipient only and may contain
privileged or confidential information.
If you have received it in error, please notify the sender immediately
and delete the original.
Any other use of the email by you is prohibited.

Reply | Threaded
Open this post in threaded view
|

Re: Slowness after upgrading from postfix 2.x to 3.1.8

Viktor Dukhovni
> On Jan 4, 2019, at 9:23 AM, Christopher R. Gabriel <[hidden email]> wrote:
>
> Nov 30 09:11:31 postfix01 postfix-main/smtpd[31800]: rec_put: type E
> len 0 data
> Nov 30 09:11:31 postfix01 postfix-main/smtpd[31800]:
> vstream_fflush_some: fd 18 flush 2433
> Nov 30 09:11:58 postfix01 postfix-main/smtpd[31800]:
> vstream_buf_get_ready: fd 18 got 18
> Nov 30 09:11:58 postfix01 postfix-main/smtpd[31800]: public/cleanup
> socket: wanted attribute: status

Your "cleanup" service is responding slowly.  Perhaps your filesystem
is slow, since cleanup has to commit the message to disk.  Or some
tables you're using in cleanup are slow.  The verbose logging does
not help, it puts pressure on the filesystem...

--
        Viktor.

Reply | Threaded
Open this post in threaded view
|

Re: Slowness after upgrading from postfix 2.x to 3.1.8

Christopher R. Gabriel
On Fri, 2019-01-04 at 09:49 -0500, Viktor Dukhovni wrote:

> > On Jan 4, 2019, at 9:23 AM, Christopher R. Gabriel <
> > [hidden email]> wrote:
> >
> > Nov 30 09:11:31 postfix01 postfix-main/smtpd[31800]: rec_put: type
> > E
> > len 0 data
> > Nov 30 09:11:31 postfix01 postfix-main/smtpd[31800]:
> > vstream_fflush_some: fd 18 flush 2433
> > Nov 30 09:11:58 postfix01 postfix-main/smtpd[31800]:
> > vstream_buf_get_ready: fd 18 got 18
> > Nov 30 09:11:58 postfix01 postfix-main/smtpd[31800]: public/cleanup
> > socket: wanted attribute: status
>
> Your "cleanup" service is responding slowly.  Perhaps your filesystem
> is slow, since cleanup has to commit the message to disk.

All spool/data dirs are on ramdisk tmpfs, that's why I'm wondering.

>  Or some tables you're using in cleanup are slow.

I only have a header_checks table with 1 single rule to log a specific
header, and a transport map redis-based. Exactly the same configuration
I have on postfix 2.x.

>  The verbose logging does not help, it puts pressure on the
> filesystem...

I've enable the verbose logging only for debug pourposes, I find the
same behaviour even with logging disabled.

Reply | Threaded
Open this post in threaded view
|

Re: Slowness after upgrading from postfix 2.x to 3.1.8

Viktor Dukhovni


> On Jan 4, 2019, at 10:04 AM, Christopher R. Gabriel <[hidden email]> wrote:
>
>> Or some tables you're using in cleanup are slow.
>
> I only have a header_checks table with 1 single rule to log a specific
> header, and a transport map redis-based. Exactly the same configuration
> I have on postfix 2.x.

Cleanup does perform transport lookups.  Is redis performing well?

>> The verbose logging does not help, it puts pressure on the
>> filesystem...
>
> I've enable the verbose logging only for debug pourposes, I find the
> same behaviour even with logging disabled.

Well, since cleanup is slow, move the verbose logging to cleanup if
you can't determine why cleanup is blocked more directly.

Cleanup performs transport lookups, canonical and virtual rewriting,
header/body checks, writes and syncs the message to disk, and
sends a 1-byte notice to the queue manager socket.  With inflow_delay
it also waits that long to receive a token from the queue manager.

One of these things is slow.

--
        Viktor.
Reply | Threaded
Open this post in threaded view
|

Re: Slowness after upgrading from postfix 2.x to 3.1.8

Christopher R. Gabriel
On Fri, 2019-01-04 at 10:26 -0500, Viktor Dukhovni wrote:

> > On Jan 4, 2019, at 10:04 AM, Christopher R. Gabriel <
> > [hidden email]> wrote:
> >
> > > Or some tables you're using in cleanup are slow.
> >
> > I only have a header_checks table with 1 single rule to log a
> > specific
> > header, and a transport map redis-based. Exactly the same
> > configuration
> > I have on postfix 2.x.
>
> Cleanup does perform transport lookups.  Is redis performing well?

Yes, blazing fast. It's shown in the full log of the smtp session, al
the whole transaction (until the reported slowness event) is immediate.

> > > The verbose logging does not help, it puts pressure on the
> > > filesystem...
> >
> > I've enable the verbose logging only for debug pourposes, I find
> > the
> > same behaviour even with logging disabled.
>
> Well, since cleanup is slow, move the verbose logging to cleanup if
> you can't determine why cleanup is blocked more directly.

I'll try this too.

> Cleanup performs transport lookups, canonical and virtual rewriting,
> header/body checks, writes and syncs the message to disk, and
> sends a 1-byte notice to the queue manager socket.  With inflow_delay
> it also waits that long to receive a token from the queue manager.

in_flow_delay is default, like in the other postfix 2.x instances, all
data (both new and old server) is on ramdisk.. I'll dig into the
verbose log for cleanup, hoping to find the problem

Reply | Threaded
Open this post in threaded view
|

Re: Slowness after upgrading from postfix 2.x to 3.1.8

Matus UHLAR - fantomas
In reply to this post by Christopher R. Gabriel
On 04.01.19 15:23, Christopher R. Gabriel wrote:
>I have a generator server which injects (via smtp) into postfix, the
>actual sender, and when burst of delivery happens, the receiving
>postfix stuck before answering to the generator, which causes the
>generator queues to fill up.

>Nov 30 09:11:58 postfix01 postfix-main/smtpd[31800]: abort all milters
>Nov 30 09:11:58 postfix01 postfix-main/smtpd[31800]: milter8_abort:
>abort milter inet:127.0.0.1:12301


>postfix01 data/spool are on tmpfs.

are you OK with losing mail when something breaks?

>smtpd_milters = inet:127.0.0.1:12301

is this milter running properly?

--
Matus UHLAR - fantomas, [hidden email] ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
Spam is for losers who can't get business any other way.
Reply | Threaded
Open this post in threaded view
|

Re: Slowness after upgrading from postfix 2.x to 3.1.8

Benny Pedersen-2
In reply to this post by Christopher R. Gabriel
Christopher R. Gabriel skrev den 2019-01-04 15:23:

> postfix01 data/spool are on tmpfs.

its unsafe to use tmpfs for spool dirs in postfix, tmpfs is okay only
for content-filters, not in generic postfix
Reply | Threaded
Open this post in threaded view
|

Re: Slowness after upgrading from postfix 2.x to 3.1.8

Christopher R. Gabriel
In reply to this post by Matus UHLAR - fantomas
On Fri, 2019-01-04 at 19:56 +0100, Matus UHLAR - fantomas wrote:

> On 04.01.19 15:23, Christopher R. Gabriel wrote:
> > I have a generator server which injects (via smtp) into postfix,
> > the
> > actual sender, and when burst of delivery happens, the receiving
> > postfix stuck before answering to the generator, which causes the
> > generator queues to fill up.
> > Nov 30 09:11:58 postfix01 postfix-main/smtpd[31800]: abort all
> > milters
> > Nov 30 09:11:58 postfix01 postfix-main/smtpd[31800]: milter8_abort:
> > abort milter inet:127.0.0.1:12301
> > postfix01 data/spool are on tmpfs.
>
> are you OK with losing mail when something breaks?

Yes

> > smtpd_milters = inet:127.0.0.1:12301
>
> is this milter running properly?

That's opendkim. No error or strange behaviour reported.


Regards,

Christopher