Connection Caching Per-Destination

Connection Caching Per-Destination

Greg Sims
We are seeing: "has exceeded the maximum number of connections" in our
logs for domains associated with outlook.com.  We have a transport
named "outlook:" in transport.regexp as follows:

# outlook.com domains
#
/@outlook(\.[a-z]{2,3}){1,2}$/  outlook:
/@hotmail(\.[a-z]{2,3}){1,2}$/  outlook:
/@live(\.[a-z]{2,3}){1,2}$/        outlook:
/@msn\.com$/                        outlook:
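
As a quick sanity check outside Postfix, the patterns above can be exercised with Python's re module (a rough approximation only; Postfix regexp tables match case-insensitively by default, so IGNORECASE is set here, and the sample addresses are hypothetical):

```python
import re

# Patterns mirroring transport.regexp above.  Postfix regexp tables are
# case-insensitive by default, hence re.IGNORECASE.
PATTERNS = [
    re.compile(r"@outlook(\.[a-z]{2,3}){1,2}$", re.IGNORECASE),
    re.compile(r"@hotmail(\.[a-z]{2,3}){1,2}$", re.IGNORECASE),
    re.compile(r"@live(\.[a-z]{2,3}){1,2}$", re.IGNORECASE),
    re.compile(r"@msn\.com$", re.IGNORECASE),
]

def uses_outlook_transport(address):
    """True if the address would be routed to the "outlook:" transport."""
    return any(p.search(address) for p in PATTERNS)
```

For example, "user@hotmail.co.uk" and "user@live.com.mx" match (one or two country-code labels), while "user@gmail.com" falls through to the default transport.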

main.cf is configured as follows:

outlook_destination_concurrency_limit = 6
outlook_destination_concurrency_failed_cohort_limit = 100
outlook_destination_concurrency_positive_feedback = 1/3
outlook_destination_concurrency_negative_feedback = 1/8

This transport is configured as follows in master.cf:

outlook  unix  -       -       n       -       6       smtp
  -o syslog_name=outlook

We can control the number of "has exceeded the maximum number of
connections" messages we see by limiting the number of processes in
master.cf.

We would like to use Per-Destination Connection Caching to increase
our throughput for "outlook:".  Our mail server does not specify
"relayhost =" in main.cf.  Is it possible to associate per-destination
caching with the "outlook:" transport?  If not, what is the best
alternative?

If the answer is something like the following in main.cf:

smtp_connection_cache_destinations = hotmail.com, hotmail.es,
hotmail.co.uk, outlook.com, outlook.es, live.com, msn.com

should we try to include all the domains associated with "outlook:" so
even small volume domains are not counted as connections by
outlook.com servers?  If this is the case, should/can we point
"smtp_connection_cache_destinations =" to a regexp file?

We are not seeing "conn_use=" in our logs.  Is it true that Postfix
will not log "conn_use=" for current releases?  We are running
postfix.x86_64 2:3.3.1-12.el8.

Thanks, Greg
www.RayStedman.org

Re: Connection Caching Per-Destination

Wietse Venema
Greg Sims:
> We are seeing: "has exceeded the maximum number of connections" in our
> logs for domains associated with outlook.com.  We have a transport
> named "outlook:" in transport.regexp as follows:
...
> This transport is configured as follows in master.cf:
>
> outlook  unix  -       -       n       -       6       smtp
>   -o syslog_name=outlook

Are you using this for all outlook email, or only for [hidden email]?

Note: this is a trick question.

> We would like to use Per-Destination Connection Caching to increase
> our throughput for "outlook:".  Our mail server does not specify
> "relayhost =" in main.cf.  Is it possible to associate per-destination
> caching with the "outlook:" transport?  If not, what is the best
> alternative?

Did you mean:

master.cf:
    outlook  unix  -       -       n       -       6       smtp
       -o syslog_name=outlook
       -o smtp_connection_cache_on_demand=yes
       -o smtp_tls_connection_reuse=yes

It's easier to do the reuse configuration in main.cf.

main.cf:
    smtp_connection_cache_on_demand=yes
    smtp_tls_connection_reuse=yes

Note that Postfix has "smtp_connection_cache_on_demand = yes" by
default, but for TLS you have to turn it on because that is still
relatively new. So far there haven't been problems with reusing
TLS-encrypted connections.

> If the answer is something like the following in main.cf:
>
> smtp_connection_cache_destinations = hotmail.com, hotmail.es,
> hotmail.co.uk, outlook.com, outlook.es, live.com, msn.com

You don't want to do that unless you must exclude some destinations.

> should we try to include All the domains associated with "outlook:" so
> even small volume domains are not counted as connections by
> outlook.com servers?  If this is the case, should/can we point
> "smtp_connection_cache_destinations =" to a regexp file?
>
> We are not seeing "conn_use=" in our logs.  Is it true that Postfix
> will not log "conn_use=" for current releases?  We are running
> postfix.x86_64 2:3.3.1-12.el8.

See above. You need to turn on smtp_tls_connection_reuse.

        Wietse

Re: Connection Caching Per-Destination

Viktor Dukhovni
In reply to this post by Greg Sims
On Thu, Jul 30, 2020 at 10:58:20AM -0700, Greg Sims wrote:

> We are seeing: "has exceeded the maximum number of connections" in our
> logs for domains associated with outlook.com.  We have a transport
> named "outlook:" in transport.regexp as follows:
>
> # outlook.com domains
> #
> /@outlook(\.[a-z]{2,3}){1,2}$/  outlook:
> /@hotmail(\.[a-z]{2,3}){1,2}$/  outlook:
> /@live(\.[a-z]{2,3}){1,2}$/        outlook:
> /@msn\.com$/                        outlook:

Fine.

> main.cf is configured as follows:
>
> outlook_destination_concurrency_limit = 6
> outlook_destination_concurrency_failed_cohort_limit = 100
> outlook_destination_concurrency_positive_feedback = 1/3
> outlook_destination_concurrency_negative_feedback = 1/8

OK.

> This transport is configured as follows in master.cf:
>
> outlook  unix  -       -       n       -       6       smtp
>   -o syslog_name=outlook

Fine.

> We can control the number of "has exceeded the maximum number of
> connections" messages we see by limiting the number of processes in
> master.cf.

Sure.

> We would like to use Per-Destination Connection Caching to increase
> our throughput for "outlook:".

No, you *do not* want to do that.  That can increase connection
concurrency beyond your process limit, in the form of idle connections
that have a different nexthop than the one to which you're currently
delivering email.

Instead, you want to *disable* even demand connection caching.

> Our mail server does not specify
> "relayhost =" in main.cf.  Is it possible to associate per-destination
> caching with the "outlook:" transport?  If not, what is the best
> alternative?

The question is moot.  You DO NOT want to do this.

> smtp_connection_cache_destinations = hotmail.com, hotmail.es,
> hotmail.co.uk, outlook.com, outlook.es, live.com, msn.com

See above.

> should we try to include All the domains associated with "outlook:" so
> even small volume domains are not counted as connections by
> outlook.com servers?  If this is the case, should/can we point
> "smtp_connection_cache_destinations =" to a regexp file?

You DO NOT want to cache connections.  But you may want to route more
domains to this transport that you know are handled by Microsoft's
mail hosts.

The right solution is to get whitelisted by Microsoft, because you're
sending content their users want.

--
    Viktor.

Re: Connection Caching Per-Destination

Greg Sims
> > We would like to use Per-Destination Connection Caching to increase
> > our throughput for "outlook:".
>
> No, you *do not* want to do that.  That can increase connection
> concurrency beyond your process limit, in the form of idle connections
> that have a different nexthop than the one to which you're currently
> delivering email.
>
> Instead, you want to *disable* even demand connection caching.

I updated master.cf based on your recommendation:

outlook  unix  -       -       n       -       6       smtp
  -o syslog_name=outlook
  -o smtp_connection_cache_on_demand=no

We have our ip addresses signed up for both SNDS and JMRP. Are there
additional white list strategies for Microsoft?

Turning off connection caching is not intuitive from reading the
CONNECTION_CACHE_README which says:

SMTP Connection caching can also help with receivers that impose rate
limits on new connections.

and suggests:

smtp_connection_cache_destinations = hotmail.com, ...

Perhaps I wanted the connection cache to be the solution for "has
exceeded the maximum number of connections" when I read the README.

Thanks, Greg
www.RayStedman.org


Re: Connection Caching Per-Destination

Viktor Dukhovni
On Thu, Jul 30, 2020 at 09:49:07PM -0700, Greg Sims wrote:

> > Instead, you want to *disable* even demand connection caching.
>
> I updated master.cf based on your recommendation:
>
> outlook  unix  -       -       n       -       6       smtp
>   -o syslog_name=outlook
>   -o smtp_connection_cache_on_demand=no
>
> We have our ip addresses signed up for both SNDS and JMRP. Are there
> additional white list strategies for Microsoft?

I am not a bulk-mail sender, so I don't know.

> Turning off connection caching is not intuitive from reading the
> CONNECTION_CACHE_README which says:
>
> SMTP Connection caching can also help with receivers that impose rate
> limits on new connections.

Well, *rate* limits != concurrency limits.  Are they complaining about
too many connections at the same time, or too many connections per unit
time?  I understood the issue to be too many *concurrent* connections.
If it is connections per unit time, then indeed connection caching could
help, but only at the cost of increasing concurrency.  Pick your poison.
Better yet, get your limits raised (I don't know how, but there ought
to be a way).

> Perhaps I wanted the connection cache to be the solution for "has
> exceeded the maximum number of connections" when I read the README.

Well is it maximum (concurrent) *number* or maximum rate?

--
    Viktor.

Re: Connection Caching Per-Destination

Wietse Venema
In reply to this post by Viktor Dukhovni
Viktor Dukhovni:
> > We would like to use Per-Destination Connection Caching to increase
> > our throughput for "outlook:".
>
> No, you *do not* want to do that.  That can increase connection
> concurrency beyond your process limit, in the form of idle connections
> that have a different nexthop than the one to which you're currently
> delivering email.

We could fix these excess connections by grouping cached connections
by transport, and by evicting a cached connection for some transport
if a requested connection for that transport is not found.

It's not optimal from a reuse point of view, but it ensures that
one transport cannot exceed the number of connections determined
by its process limit.

        Wietse

Re: Connection Caching Per-Destination

Viktor Dukhovni
> On Jul 31, 2020, at 12:33 PM, Wietse Venema <[hidden email]> wrote:
>
>> No, you *do not* want to do that.  That can increase connection
>> concurrency beyond your process limit, in the form of idle connections
>> that have a different nexthop than the one to which you're currently
>> delivering email.
>
> We could fix these excess connections by grouping cached connections
> by transport, and by evicting a cached connection for some transport
> if a requested connection for that transport is not found.
>
> It's not optimal from a reuse point of view, but it ensures that
> one transport cannot exceed the number of connections determined
> by its process limit.

This simple eviction policy may be too aggressive: if two destinations
are each creating new cached connections, each may evict the cached
connection(s) for the other.

There should likely be a configurable per-transport limit on the
connection cache occupancy, so that eviction happens only when the
limit is reached.  The default limit might be a small multiple of the
transport's destination concurrency, and users could raise or lower
it as needed on a per-transport basis.

--
        Viktor.


Re: Connection Caching Per-Destination

Greg Sims
The situation with outlook got much worse in our overnight runs.  We
transferred 7K subscriber emails to relays ending in outlook.com and
saw the following feedback in our logs:

MaxConnections: 83, Connection: 1386, RateLimited: 6392

where the following regexp is used in our log post-processor:

MaxConnections -- "^.*: to=<.*>.* said: 451 4.7.652 The mail server .*
has exceeded the maximum number of connections.*$"

Connection -- "^.*: lost connection with.* while sending RCPT TO.*$"
(and the like)

RateLimited -- "^.*The mail server .* has been temporarily rate
limited due to IP reputation.*$"
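
Those three counters can be reproduced with a small classifier; a minimal sketch (the regexes are slightly simplified, unanchored versions of the ones above; the function names and sample lines are illustrative, not our actual post-processor):

```python
import re
from collections import Counter

# Simplified versions of the post-processor patterns above.
CLASSES = {
    "MaxConnections": re.compile(
        r"said: 451 4\.7\.652 The mail server .* has exceeded the maximum number of connections"),
    "Connection": re.compile(
        r"lost connection with.* while sending RCPT TO"),
    "RateLimited": re.compile(
        r"The mail server .* has been temporarily rate limited due to IP reputation"),
}

def classify(line):
    """Return the first matching class name, or None for uninteresting lines."""
    for name, rx in CLASSES.items():
        if rx.search(line):
            return name
    return None

def tally(lines):
    """Count matching lines per class, like the overnight summary."""
    return Counter(c for c in map(classify, lines) if c)
```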

We made three changes to our configuration yesterday:

(1) smtpd_tls_security_level = none and smtp_tls_security_level = none
in main.cf, as we do not need TLS and do not have it configured.  We
are now seeing "conn_use=" in our logs for the first time.

(2) outlook  unix  -       -       n       -       6       smtp
         -o syslog_name=outlook
         -o smtp_connection_cache_on_demand=no

(3) we increased our email arrival rate from 500 to 1000 over the past
two days.  This is likely a primary factor.

I looked for domains that *are not* using the outlook: transport but
are using the outlook.com relay servers.  There are 383 such domains
-- the vast majority are one email address per domain.  These domains
are competing for the limited number of outlook.com connections and
they are not being controlled by the outlook: transport process limit
in master.cf.  Adding 383 domains to outlook: in transport.regexp
seems a bit extreme and would be impossible to maintain.  How can we
control the number of connections made on behalf of this set of
domains to the outlook.com relay servers?

My solution, without additional input, is to reduce our email arrival
rate from 1000 to 500 emails per minute.  I will also reduce the
outlook: processes to 2 in master.cf and
"outlook_destination_concurrency_limit = 2" in main.cf in hopes this
will minimize the feedback log messages we are seeing from outlook
relay servers.  This "solution" is Very constraining.   The google.com
relay servers can transfer 10,000 emails per minute without a single
feedback message in the logs.  This "solution" is limiting the
delivery rate of ALL domains at the expense of the outlook.com
connection limitations.  I hope there is a better solution!

Thanks, Greg
www.RayStedman.org

Re: Connection Caching Per-Destination

Wietse Venema
In reply to this post by Viktor Dukhovni
Viktor Dukhovni:

> > On Jul 31, 2020, at 12:33 PM, Wietse Venema <[hidden email]> wrote:
> >
> >> No, you *do not* want to do that. That can increase connection
> >> concurrency beyond your process limit, in the form of idle connections
> >> that have a different nexthop than the one to which you're currently
> >> delivering email.
> >
> > We could fix these excess connections by grouping cached connections
> > by transport, and by evicting a cached connection for some transport
> > if a requested connection for that transport is not found.
> >
> > It's not optimal from a reuse point of view, but it ensures that
> > one transport cannot exceed the number of connections determined
> > by its process limit.
>
> This simple eviction policy may be too aggressive, if two destinations
> are each creating new cached connections, each may evict the cached
> connection(s) for the other.

This depends on the number of destinations and the process limit.

If the process limit is smaller than the number of destinations,
like 1 versus N, then N competing destinations will evict each
other's cached connection with high likelihood. And they should,
to respect the connection concurrency limit.

If the process limit is similar to the number of destinations, then
some connections will be reused. If the process limit is much larger,
then more connection reuse will happen.

Note that Postfix reuses connections not only based on destination
name but also based on destination IP address (not sure if that
is still the case with verified TLS). Lookup by IP address improves
reuse when multiple outlook destinations share a pool of MX hosts.
But cache lookup by IP address results in multiple cache queries
per delivery request, and that should not result in multiple cache
evictions per delivery request. Thus, my initial suggestion would
need some refinement.

> There should likely be a configurable per-transport limit on the
> connection cache occupancy, so that eviction only happens when
> the limit is reached. The default limit might be a small multiple
> of the transport destination concurrency, and users can raise or
> lower that as needed on per-transport basis.

Without destructive connection cache lookups, the connection
concurrency can be as large as cache_occupancy_limit + process_limit.
That makes the approach unreliable.

With destructive connection cache lookups, the connection concurrency
is guaranteed no larger than the process limit (a connection is either
reused or it is destroyed before an SMTP client creates a new
connection), and the cache does not need to know about process
limits.

        Wietse

Re: Connection Caching Per-Destination

Viktor Dukhovni
In reply to this post by Greg Sims
On Fri, Jul 31, 2020 at 09:37:12AM -0700, Greg Sims wrote:

> RateLimited -- "^.*The mail server .* has been temporarily rate
> limited due to IP reputation.*$"

There's your problem.  You need a better IP reputation.

> (1) smtpd_tls_security_level = none & smtp_tls_security_level  = none
> in main.cf as we do not need TLS and do not have it configured.  We
> are now seeing "conn_use=" in our logs for the first time.
>
> (2) outlook  unix  -       -       n       -       6       smtp
>          -o syslog_name=outlook
>          -o smtp_connection_cache_on_demand=no

With demand caching disabled, you should not see "conn_use=2+".

> I looked for domains that *are not* using the outlook: transport but
> are using the outlook.com relay servers.  There are 383 such domains
> -- the vast majority are one email address per domain.  These domains
> are competing for the limited number of outlook.com connections and
> they are not being controlled by the outlook: transport process limit
> in master.cf.  Adding 383 domains to outlook: in transport.regexp
> seems a bit extreme and would be impossible to maintain.  How can we
> control the number of connections made on behalf of this set of
> domains to the outlook.com relay servers?

Postfix can't do that natively.  If all your messages are
single-recipient, then you may be able to use:

  main.cf:
    indexed = ${default_database_type}:${config_directory}/
    smtpd_recipient_restrictions =
        ...
        check_recipient_mx_access ${indexed}mx-access

  mx-access
    outlook.com     FILTER outlook
    .outlook.com    FILTER outlook

to route email for all recipient domains that use an outlook.com MX host
to the "outlook" transport.

Your real problem is however your IP reputation.  If you're sending
unsolicited email, or you have relay customers sending unsolicited mail,
then your difficulties delivering it are a desirable feature of
Microsoft's email service.  If you're sending email outlook.com
customers want, then you should be able to work with Microsoft to
resolve the obstacles (they should be willing to whitelist your
address).

--
    Viktor.

Re: Connection Caching Per-Destination

Wietse Venema
In reply to this post by Greg Sims
Greg Sims:

> The situation with outlook got much worse in our overnight runs.  We
> transferred 7K subscriber emails to relays ending in outlook.com and
> saw the following feedback in our logs:
>
> MaxConnections: 83, Connection: 1386, RateLimited: 6392
>
> where the following regexp is used in our log post-processor:
>
> MaxConnection -- "^.*: to=<.*>.* said: 451 4.7.652 The mail server .*
> has exceeded the maximum number of connections.*$"
>
> Connection -- "^.*: lost connection with.* while sending RCPT TO.*$"
> (and the like)
>
> RateLimited -- "^.*The mail server .* has been temporarily rate
> limited due to IP reputation.*$"

Have you ever figured out if the initial problem is *concurrency*
or *connection rate* based? (They may rate limit because of an
earlier concurrency violation).

> We made three changes to our configuration yesterday:
>
> (1) smtpd_tls_security_level = none & smtp_tls_security_level  = none
> in main.cf as we do not need TLS and do not have it configured.  We
> are now seeing "conn_use=" in our logs for the first time.
>
> (2) outlook  unix  -       -       n       -       6       smtp
>          -o syslog_name=outlook
>          -o smtp_connection_cache_on_demand=no
>
> (3) we increased our email arrival rate from 500 to 1000 over the past
> two days.  this is likely a primary factor.
>
> I looked for domains that *are not* using the outlook: transport but
> are using the outlook.com relay servers.  There are 383 such domains
> -- the vast majority are one email address per domain.  These domains
> are competing for the limited number of outlook.com connections and
> they are not being controlled by the outlook: transport process limit
> in master.cf.  Adding 383 domains to outlook: in transport.regexp
> seems a bit extreme and would be impossible to maintain.  How can we
> control the number of connections made on behalf of this set of
> domains to the outlook.com relay servers?

With automated logfile analysis, such domains could be added to a
transport map. Once a map is populated there will be a trickle of
updates.
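
One hedged sketch of such automated analysis, assuming single-recipient log lines in the usual smtp(8) "to=<...>, relay=..." shape (the helper names and sample log format are illustrative):

```python
import re

# Pull the recipient domain and relay hostname out of a delivery log line,
# e.g. "... to=<user@example.org>, relay=x.mail.protection.outlook.com[...]:25, ..."
DELIVERY = re.compile(r"to=<[^@>]+@([^>]+)>.*relay=([^\[\s,]+)")

def outlook_domains(log_lines):
    """Recipient domains whose mail was delivered via an outlook.com relay."""
    found = set()
    for line in log_lines:
        m = DELIVERY.search(line)
        if m and m.group(2).lower().endswith(".outlook.com"):
            found.add(m.group(1).lower())
    return found

def transport_map_lines(domains, transport="outlook:"):
    """Render entries suitable as input to an indexed transport map."""
    return [f"{domain}\t{transport}" for domain in sorted(domains)]
```

The rendered lines would be fed to postmap(1); re-running the analysis periodically would produce the trickle of updates mentioned above.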

There is a crude way to automatically group messages by destination
MX hosts, but that works only in the special case where all messages
have exactly one recipient, or all recipients are in the same domain.
/etc/postfix/main.cf:
    check_recipient_mx_access pcre:/etc/postfix/mx_access

/etc/postfix/mx_access:
    /\.outlook\.com$/ FILTER outlook:
    # other patterns...

That will send a message to outlook if any MX looks like outlook.

> My solution, without additional input, is to reduce our email arrival
> rate from 1000 to 500 emails per minute.  I will also reduce the
> outlook: processes to 2 in master.cf and
> "outlook_destination_concurrency_limit = 2" in main.cf in hopes this
> will minimize the feedback log messages we are seeing from outlook
> relay servers.  This "solution" is Very constraining.   The google.com
> relay servers can transfer 10,000 emails per minute without a single
> feedback message in the logs.  This "solution" is limiting the
> delivery rate of ALL domains at the expense of the outlook.com
> connection limitations.  I hope there is a better solution!

The better solution is to be whitelisted at outlook.com.

Otherwise, to send 10,000 emails per minute to other sites, you
need a transport map to group the outlook-based domains.

        Wietse

Re: Connection Caching Per-Destination

Greg Sims
In reply to this post by Viktor Dukhovni
> Your real problem is however your IP reputation.  If you're sending
> unsolicited email, or you have relay customers sending unsolicited mail,
> then your difficulties delivering it are a desirable feature of
> Microsoft's email service.  If you're sending email outlook.com
> customers want, then you should be able to work with Microsoft to
> resolve the obstacles (they should be willing to whitelist your
> address).

We send a Bible Daily Devotion email to 30K double opted in
subscribers each morning. We grew to this size over 10 years. We use
SPF, DKIM and DMARC to protect this email. We remove subscribers based
on bounced email and participate in the ISP Feedback Loops.  Before
creating the new server (used exclusively for sending devotion email),
we could only deliver 300 emails per minute.  It takes almost two
hours to deliver our email at this rate and our subscriptions continue
to grow.  We send two emails to each subscriber on the 1st of every
month (tomorrow) which takes almost four hours at 300 per minute.

Microsoft SNDS shows our ip addresses as "Green" -- no issues.  I will
try to figure out how to communicate our needs to outlook with the
hope of being placed on a whitelist.

Thanks, Greg
www.RayStedman.org

Re: Connection Caching Per-Destination

@lbutlr
In reply to this post by Wietse Venema
On 30 Jul 2020, at 12:53, Wietse Venema <[hidden email]> wrote:
> main.cf:
>    smtp_connection_cache_on_demand=yes
>    smtp_tls_connection_reuse=yes

Do these settings show up in any way in the logs? (That is, does the log look any different if a TLS connection is reused or a connection uses cache-on-demand?)

--
There are strange things done in the midnight sun/By the men who moil
for gold; The Arctic trails have their secret tales/That would
make your blood run cold; The Northern Lights have seen queer
sights,/But the queerest they ever did see Was the night on the
marge of Lake Lebarge/ When I cremated Sam McGee

Re: Connection Caching Per-Destination

Viktor Dukhovni
On Fri, Jul 31, 2020 at 11:47:57AM -0600, @lbutlr wrote:
> On 30 Jul 2020, at 12:53, Wietse Venema <[hidden email]> wrote:
> > main.cf:
> >    smtp_connection_cache_on_demand=yes
> >    smtp_tls_connection_reuse=yes
>
> Do these settings show up in any way in the logs? (That is, does the log look any different if a TLS connection is reused or a connection uses cache-on-demand?)

Searches on relevant terms quickly lead to:

    http://www.postfix.org/CONNECTION_CACHE_README.html

specifically:

    http://www.postfix.org/CONNECTION_CACHE_README.html#safety

--
    Viktor.

Re: Connection Caching Per-Destination

Wietse Venema
In reply to this post by @lbutlr
@lbutlr:
> On 30 Jul 2020, at 12:53, Wietse Venema <[hidden email]> wrote:
> > main.cf:
> >    smtp_connection_cache_on_demand=yes

Logged as conn_use=xxx. By default, reuse happens only for plaintext
connections.

> >    smtp_tls_connection_reuse=yes

Logged as TLS handshake results plus conn_use=xxx.

        Wietse

Re: Connection Caching Per-Destination

Viktor Dukhovni
On Fri, Jul 31, 2020 at 02:16:54PM -0400, Wietse Venema wrote:

> Logged as conn_use=xxx. By default, reuse happens only for plaintext
> connections.
>
> > >    smtp_tls_connection_reuse=yes
>
> Logged as TLS handshake results plus conn_use=xxx.

One thing we could likely improve in TLS connection reuse logging is
logging of an appropriate client session identifier in tlsproxy(8) TLS
log entries:

    Jul 21 01:16:57 amnesiac postfix/tlsproxy[64244]:
        Verified TLS connection established
        to amnesiac.example[192.0.2.1]:25:
        TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
        key-exchange X25519 server-signature RSA-PSS (2048 bits)
        server-digest SHA256

It is presently difficult to correlate logging in tlsproxy(8) with
a particular smtp(8) client's delivery attempts.

We should perhaps have a field in the TLS_SESS_STATE (TLScontext
variable) that represents a client id for proxy connections, allowing
tools like "collate" to group relevant logging by tlsproxy(8) with the
other logging relevant to a given delivery.

One possibility would be:

    Jul 21 01:16:57 amnesiac postfix/tlsproxy[64244]:
-->     QUEUE-ID: smtp[PID]:
        Verified TLS connection established
        to amnesiac.example[192.0.2.1]:25:
        TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
        key-exchange X25519 server-signature RSA-PSS (2048 bits)
        server-digest SHA256

So that we identify both the message and the associated delivery agent
process.  We might then also include the queue-id (but not repeat the
process id) for other TLS library log messages (warnings and perhaps
debug messages at log levels != 1).

--
    Viktor.

Re: Connection Caching Per-Destination

Wietse Venema
Viktor Dukhovni:

> On Fri, Jul 31, 2020 at 02:16:54PM -0400, Wietse Venema wrote:
>
> > Logged as conn_use=xxx. By default, reuse happens only for plaintext
> > connections.
> >
> > > >    smtp_tls_connection_reuse=yes
> >
> > Logged as TLS handshake results plus conn_use=xxx.
>
> One thing we could likely improve in TLS connection reuse logging is
> logging of an appropriate client session identifier in tlsproxy(8) TLS
> log entries:
>
>     Jul 21 01:16:57 amnesiac postfix/tlsproxy[64244]:
>         Verified TLS connection established
>         to amnesiac.example[192.0.2.1]:25:
>         TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
>         key-exchange X25519 server-signature RSA-PSS (2048 bits)
>         server-digest SHA256
>
> It is presently difficult to correlate logging in tlsproxy(8) with
> a particular smtp(8) client's delivery attempts.
>
> We should perhaps have a field in the TLS_SESS_STATE (TLScontext
> variable) that represents a client id for proxy connections, allowing
> tools like "collate" to group relevant logging by tlsproxy(8) with the
> other logging relevant to a given delivery.
>
> One possibility would be:
>
>     Jul 21 01:16:57 amnesiac postfix/tlsproxy[64244]:
> -->     QUEUE-ID: smtp[PID]:
>         Verified TLS connection established
>         to amnesiac.example[192.0.2.1]:25:
>         TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
>         key-exchange X25519 server-signature RSA-PSS (2048 bits)
>         server-digest SHA256
>
> So that we identify both the message and the associated delivery agent
> process.  We might then also include the queue-id (but not repeat the
> process id) for other TLS library log messages (warnings and perhaps
> debug messages at log levels != 1).

Logging the (long) queue ID would be safe and sufficient to
disambiguate. Logging the client PID would not be sufficient to
disambiguate.

As for debug logging from multiserver programs such as tlsproxy,
the Postfix TLS library already logs the TCP-level connection info.

On the other hand, the low-level Postfix libraries do not identify
application context in debug logging, and logging such information
would involve architectural changes.

        Wietse

Re: Connection Caching Per-Destination

Greg Sims
> > I looked for domains that *are not* using the outlook: transport but
> > are using the outlook.com relay servers.  There are 383 such domains
> > -- the vast majority are one email address per domain.  These domains
> > are competing for the limited number of outlook.com connections and
> > they are not being controlled by the outlook: transport process limit
> > in master.cf.  Adding 383 domains to outlook: in transport.regexp
> > seems a bit extreme and would be impossible to maintain.  How can we
> > control the number of connections made on behalf of this set of
> > domains to the outlook.com relay servers?
>
> With automated logfile analysis, such domains could be added to a
> transport map. Once a map is populated there will be a trickle of
> updates.

I created transport.outlook.regexp which contains the 383 domains
mentioned above.  I also added the following in main.cf:

transport_maps = regexp:/etc/postfix/transport.regexp,
    regexp:/etc/postfix/transport.outlook.regexp
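
For reference, entries in such a file follow the same shape as the
existing transport.regexp rules, one pattern per relayed domain (the
domain names below are placeholders, not any of the actual 383):

```
# transport.outlook.regexp: extra domains whose MX is outlook.com
/@example-one\.com$/    outlook:
/@example-two\.org$/    outlook:
```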

Only one domain was *not* using the outlook: transport last night (now
added to transport.outlook.regexp).  The overnight runs were much
better as a result of this and limiting our email arrival rate to 500
per minute.  We received the following messages with the outlook:
transport limited to 4 processes in master.cf:

MaxConnections: 54, Connection: 30

Not perfect but much better and no rate limiting.

What is the relationship between the number of processes for outlook:
transport in master.cf and the number of simultaneous connections that
can be made to the outlook.com servers?  I changed master.cf to 3
processes for outlook: in hopes of reducing MaxConnections feedback --
I can not go much smaller.

Thanks, Greg
www.RayStedman.org

Re: Connection Caching Per-Destination

Wietse Venema
Greg Sims:

> > > I looked for domains that *are not* using the outlook: transport but
> > > are using the outlook.com relay servers.  There are 383 such domains
> > > -- the vast majority are one email address per domain.  These domains
> > > are competing for the limited number of outlook.com connections and
> > > they are not being controlled by the outlook: transport process limit
> > > in master.cf.  Adding 383 domains to outlook: in transport.regexp
> > > seems a bit extreme and would be impossible to maintain.  How can we
> > > control the number of connections made on behalf of this set of
> > > domains to the outlook.com relay servers?
> >
> > With automated logfile analysis, such domains could be added to a
> > transport map. Once a map is populated there will be a trickle of
> > updates.
>
> I created transport.outlook.regexp which contains the 383 domains
> mentioned above.  I also added the following in main.cf:
>
> transport_maps = regexp:/etc/postfix/transport.regexp,
> regexp:/etc/postfix/transport.outlook.regexp

I suppose that the trick to automatically group deliveries with
check_recipient_mx_access was not feasible. That's fine, I will
toss it into a new advanced Postfix configuration page, together
with some other advanced solutions.

> What is the relationship between the number of processes for outlook:
> transport in master.cf and the number of simultaneous connections that
> can be made to the outlook.com servers?

The number of simultaneous connections equals a) the number of used
connections (one per SMTP client) plus b) the number of unused
connections waiting in the connection cache. The number a) is the
number of processes in master.cf, while the number b) varies with
the number of high-traffic destinations.

With current Postfix versions there is no way to control b), except
for making it zero by turning off connection reuse. That is unfortunate
because connection reuse has helped to improve delivery performance
for plaintext and for TLS-encrypted connections.
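
For reference, "making it zero" maps onto per-transport overrides in
master.cf; a sketch based on the outlook: entry earlier in the thread
(process limit 6 as originally posted):

```
outlook  unix  -       -       n       -       6       smtp
  -o syslog_name=outlook
  -o smtp_connection_cache_on_demand=no
  -o smtp_connection_cache_destinations=
```

With these overrides the transport never parks idle connections in the
connection cache, so simultaneous connections stay at or below the
master.cf process limit.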

I proposed a simple way to keep the sum of used+unused connections
<= the process limit: when a connection can't immediately be reused,
purge some connection from the connection cache and create a new
connection. That will require new code (a new RPC from the SMTP
client to the connection cache service), and can be available in
Postfix 3.6 and later.

> I changed master.cf to 3 processes for outlook: in hopes of reducing
> MaxConnections feedback -- I can not go much smaller.

This has been asked before: when Outlook puts you in the penalty
box and starts ratelimiting your new connections, was that because
a) you exceeded a limit for the number of SIMULTANEOUS CONNECTIONS,
or b) you exceeded a limit for the number of NEW CONNECTIONS over
a time interval.

I am asking because these two scenarios have different solutions,
and three is awfully low.

        Wietse

Re: Connection Caching Per-Destination

Greg Sims
> > I changed master.cf to 3 processes for outlook: in hopes of reducing
> > MaxConnections feedback -- I can not go much smaller.
>
> This has been asked before: when Outlook puts you in the penalty
> box and starts ratelimiting your new connections, was that because
> a) you exceeded a limit for the number of SIMULTANEOUS CONNECTIONS,
> or b) you exceeded a limit for the number of NEW CONNECTIONS over
> a time interval.
>
> I am asking because these two scenarios have different solutions,
> and three is awfully low.

First some terms with respect to outlook.com messages in the log:

RateLimited = "said: 451 4.7.650 The mail server .* has been
temporarily rate limited"

MaxConnections = "said: 451 4.7.652 The mail server .* has exceeded
the maximum number of connections"

Connection = "lost connection with.* while receiving the initial
server greeting" and the like.
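
These categories can be tallied straight from the log; a minimal
sketch (the log path and sample lines are illustrative, and the grep
patterns are the status codes quoted above):

```shell
# Classify outlook.com deferrals in a mail log by enhanced status code.
count_code() {
    # count_code CODE FILE -> number of log lines carrying that code
    grep -c "451 $1" "$2"
}

# Small sample log to demonstrate (a real run would use the live log)
cat > sample.log <<'EOF'
Aug  1 01:43:50 mx outlook/smtp[100]: ... said: 451 4.7.652 The mail server has exceeded the maximum number of connections
Aug  1 01:44:02 mx outlook/smtp[101]: ... said: 451 4.7.652 The mail server has exceeded the maximum number of connections
Aug  1 01:45:10 mx outlook/smtp[102]: ... said: 451 4.7.650 The mail server has been temporarily rate limited
EOF

echo "MaxConnections: $(count_code 4.7.652 sample.log)"   # prints "MaxConnections: 2"
echo "RateLimited:    $(count_code 4.7.650 sample.log)"   # prints "RateLimited:    1"
```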

I have only seen RateLimited once -- the overnight email burst that is
documented in this thread.  In the last 24 hours we relayed 14K emails
to outlook.com servers and we saw:

MaxConnections: 54, Connection: 30, RateLimited: 0

I believe outlook.com servers return MaxConnections and Connection
errors once the number of connections reaches an upper threshold.  When
we are running below that threshold, email is delivered in an orderly
fashion.  I looked at a section of the logs in detail.  This burst was
11K emails, of which 3.4K went to outlook.  The average delay per email
(matched with "to=<.*>.*delay=(.*)") was 194 seconds for this burst.  I
record qshape every 30 seconds and see minimal incoming/active queueing
for outlook.com -- or any queueing for that matter -- at a 500
email/minute arrival rate.

I do see a consistent pattern in the logs that looks like the
following (first number is the email sequence number for outlook):

    1 - Aug 01 01:43:50, outlook, delay=  1.23, delays=0.01/0.01/0.80/0.41
  ...
1,010 - Aug 01 01:48:08, outlook, delay=124.43, delays=0.01/124.00/0.04/0.38
  ...
3,453 - Aug 01 01:59:59, outlook, delay=492.86, delays=0.01/492.00/0.31/0.54

The "124.00" is in delay slot b = "time from last active queue entry
to connection setup".  "delay b" was 0.01 seconds at the start of the
email burst and grew to 492.00 seconds by its end.  The delivery rate
to outlook.com servers was 3,453 emails / 969 seconds = 3.56
emails/second over this burst.  Is "delay b" Postfix-internal queueing,
or is it being caused by the outlook.com servers in some fashion?
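
An average like the 194 seconds above can be computed with a small
pipeline; a sketch, with three sample lines mirroring the entries
quoted above (a real run would read the actual mail log):

```shell
# Average the total delay= values over a set of delivery log lines.
log='to=<a@outlook.com> delay=1.23, delays=0.01/0.01/0.8/0.41
to=<b@outlook.com> delay=124.43, delays=0.01/124/0.04/0.38
to=<c@outlook.com> delay=492.86, delays=0.01/492/0.31/0.54'

# "delay=" does not match inside "delays=", so one value per line
avg=$(printf '%s\n' "$log" |
      grep -o 'delay=[0-9.]*' |
      awk -F= '{ sum += $2; n++ } END { if (n) printf "%.2f", sum/n }')
echo "average delay: ${avg}s"   # prints "average delay: 206.17s"
```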

When outlook.com reaches the upper threshold of MaxConnections, it
starts to issue RateLimited as well.  This seems appropriate as we
were out of bounds with MaxConnections given the 383 domains we had
vying for outlook.com connections unchecked and an arrival rate of
1,000 emails per minute.

Are we in the penalty box from outlook.com?  The Microsoft SNDS data
is detecting the correct ip address/email volume and shows us as Green
-- no issues.

I know this is likely simplistic thinking -- but how about this in master.cf:

outlook  unix  -       -       n       -       -       smtp
  -o syslog_name=outlook
  -o smtp_connection_cache_on_demand=yes
  -o smtp_max_connections=8

Now the outlook smtp processes are vying for the specified number of
connections and making use of the connection_cache where possible.  If
the limiting resource for outlook.com is connections, it seems this
design might optimize throughput.  This coming from someone who has
not read the first line of Postfix code!!

Thanks, Greg
www.RayStedman.org