possible bottlenecks

possible bottlenecks

Zsombor B
Hi,


I know this is a complicated question but what/where do you see  
possible bottlenecks in postfix?
Is it CPU? RAM? Disk IO?

I'm building an infra to send out ~3-5 million emails a day.
There are no known peak periods of the day, but it is certain that
the load will be uneven (no emails for a while, then suddenly 10-100K
mails in a very short period of time).

The plan is to start with 4 VMs and about 10% of the planned daily
mail amount, but it will reach the planned maximum very soon.

Do you have any experience based recommendations on CPU, RAM or other  
tuning parameters?

Thanks,
Zsombor

Re: possible bottlenecks

Viktor Dukhovni
On Wed, Oct 14, 2020 at 06:47:19AM +0200, Zsombor B wrote:

> I know this is a complicated question but what/where do you see  
> possible bottlenecks in postfix?
> Is it CPU? RAM? Disk IO?

Whatever you have least of, relative to the workload you're expecting to
process :-)  Generally speaking, your limiting factor is the rate at
which downstream systems are willing to accept your mail.  But if
back-pressure is not an issue, at some point you may be able to saturate
your network or, with spinning rust, run out of I/O ops per second.  CPU
is a factor if you're doing costly content inspection (anti-spam,
anti-virus, ...) at rates that saturate the available CPU resources.
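A toy illustration of "whatever you have least of": a minimal sketch, assuming each resource's capacity can be expressed as a sustainable rate in messages per second, so that the achievable throughput is simply the minimum of those rates. All capacity figures below are made-up placeholders, not measurements.

```python
# Illustrative capacities in messages per second; placeholders, not measurements.
capacities: dict[str, float] = {
    "downstream acceptance": 40.0,     # what remote receivers are willing to take
    "network": 2000.0,
    "disk IOPS": 500.0,
    "CPU (content inspection)": 300.0,
}


def bottleneck(caps: dict[str, float]) -> tuple[str, float]:
    """Return the resource with the lowest sustainable rate, and that rate."""
    name = min(caps, key=caps.__getitem__)
    return name, caps[name]


name, rate = bottleneck(capacities)
print(f"limited by {name} at {rate:.0f} msgs/sec")
# prints: limited by downstream acceptance at 40 msgs/sec
```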

More than a decade ago, I saw the queue manager as the ultimate rate
limiter at ~3k msgs/sec, with the CPU unable to process incoming
messages any faster.  CPUs have gotten faster since.

> I'm building an infra to send out ~3-5 million emails a day.

    5000000 / 86400 is ~60 msgs/sec

A single Postfix server can easily do several hundred messages per second,
absent backpressure, especially with SSD disks or battery-backed caches in
the disk controllers.
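The arithmetic above, plus the burst case from the original question, as a small sketch. The 300 msgs/sec send rate is an assumed stand-in for "several hundred per second" and ignores any downstream throttling.

```python
SECONDS_PER_DAY = 86_400


def average_rate(msgs_per_day: int) -> float:
    """Average messages per second needed to send msgs_per_day evenly."""
    return msgs_per_day / SECONDS_PER_DAY


def burst_drain_seconds(burst_msgs: int, send_rate: float) -> float:
    """Seconds needed to push out a burst at a steady send rate (assumed)."""
    return burst_msgs / send_rate


print(f"{average_rate(5_000_000):.1f} msgs/sec")       # 57.9 msgs/sec
print(f"{burst_drain_seconds(100_000, 300):.0f} s")    # 333 s for a 100K burst
```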

Your problem is almost certainly downstream, especially if the
recipients are largely hosted by the giant freemail providers.

> Do you have any experience based recommendations on CPU, RAM or other  
> tuning parameters?

None of the above matters; all that matters is your ability to get
whitelisted for that traffic volume, and/or use multiple distinct sender
IPs (perhaps in multiple /20'ish netblocks or ASNs) to stay under the
radar from any given IP.  But that's non-trivial, so figure out how it
is that your email will be seen as desired by its recipients and
accepted by their mail providers without severe rate limits or outright
blocking.

--
    Viktor.

Re: possible bottlenecks

Wietse Venema
In reply to this post by Zsombor B
Zsombor B:
> Hi,
>
>
> I know this is a complicated question but what/where do you see  
> possible bottlenecks in postfix?
> Is it CPU? RAM? Disk IO?
>
> I'm building an infra to send out ~3-5 million emails a day.

That might have been challenging 25 years ago.  As Viktor notes,
the real challenge is to get receivers to accept your traffic at
these rates (besides not making mistakes on the sending side like
using a single-threaded submission channel).

For case studies in traffic management, see postfix-users mail from
the last 2-3 quarters about anything that involves outlook.com and
understand the recommendations that Viktor (and I) made there.

Recent examples:

- Postfix may exceed receiver concurrency limits when SMTP connection
  reuse is enabled, and one SMTP transport is used for deliveries
  to different providers. The workaround is to disable connection
  reuse for that SMTP transport, or to use a dedicated SMTP transport
  for a specific provider. The solution requires changes to Postfix's
  connection cache management protocol.

- Postfix may exceed receiver concurrency limits when different
  domains are hosted with the same provider. One solution is to use
  check_recipient_mx_access plus "filter" to group these deliveries.
  Other solutions require architectural changes to Postfix (looking
  up next-hop MX information before scheduling deliveries).
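A sketch of why the second case overshoots, assuming a scheduler that applies a fixed per-domain concurrency limit (20 here, an arbitrary value) while several domains share one provider's MX hosts. The domains, MX hostnames, and the "last two labels" provider heuristic are hypothetical; within Postfix the grouping would be done with check_recipient_mx_access and a FILTER action as described above, not with code like this.

```python
from collections import defaultdict

# Hypothetical domain -> MX hostname data; a real system would query DNS.
MX = {
    "example-a.com": "mx1.bigprovider.example",
    "example-b.com": "mx2.bigprovider.example",
    "example-c.org": "mx.bigprovider.example",
    "small.example": "mail.small.example",
}


def provider_of(mx_host: str) -> str:
    """Crude provider key: the last two labels of the MX hostname (assumption)."""
    return ".".join(mx_host.split(".")[-2:])


def peak_concurrency(per_domain_limit: int) -> dict[str, int]:
    """Connections each provider could see if every domain gets its own limit."""
    totals: dict[str, int] = defaultdict(int)
    for mx in MX.values():
        totals[provider_of(mx)] += per_domain_limit
    return dict(totals)


print(peak_concurrency(20))
# prints: {'bigprovider.example': 60, 'small.example': 20}
```

Three domains at the same provider, each allowed 20 connections, add up to 60 connections against that provider, which is the overshoot described above.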

        Wietse

Re: possible bottlenecks

@lbutlr
In reply to this post by Zsombor B
On 13 Oct 2020, at 22:47, Zsombor B <[hidden email]> wrote:
> I know this is a complicated question but what/where do you see possible bottlenecks in postfix?
> Is it CPU? RAM? Disk IO?

In theory? Sure, any of those could be a bottleneck. In actuality, the bottlenecks are processing spam (if you receive mail) and not appearing to be a spammer.

> I'm building an infra to send out ~3-5 million emails a day.

If you pop onto the Internet all of a sudden sending 5 million emails a day, you had better be sure that your DKIM, SPF, DMARC and DNS are perfect and that your IP address has never been associated with a spammer. Because if there is one thing that will cripple your mail server, it is having mail sit in the queue because it's been throttled. The big email hosts do this a lot (especially Outlook.com and yahoo.com). And if you get on their (automated) bad side, you are well and thoroughly screwed. If your messages LOOK spammy enough that users will mark them as spam, then you will, again, be completely hosed whether the email is spam or not.

Other than that, I think a Raspberry Pi 4 with a USB SSD might be able to manage 5 million emails a day.

--
'What is this thing, anyway?' said the Dean, inspecting the implement
        in his hands. 'It's called a shovel', said the Senior Wrangler.
        'I've seen the gardeners use them. You stick the sharp end in the
        ground. Then it gets a bit technical.' --Reaper Man

Re: possible bottlenecks

Demi M. Obenour
On 10/16/20 8:57 AM, @lbutlr wrote:

> On 13 Oct 2020, at 22:47, Zsombor B <[hidden email]> wrote:
>> I know this is a complicated question but what/where do you see possible bottlenecks in postfix?
>> Is it CPU? RAM? Disk IO?
>
> In theory? Sure, any of those could be a bottleneck. In actuality, the bottlenecks are processing spam (if you receive mail) and not appearing to be a spammer.
>
>> I'm building an infra to send out ~3-5 million emails a day.
>
> If you pop onto the Internet all of a sudden sending 5 million emails a day, you had better be sure that your DKIM, SPF, DMARC and DNS are perfect and that your IP address has never been associated with a spammer. Because if there is one thing that will cripple your mail server, it is having mail sit in the queue because it's been throttled. The big email hosts do this a lot (especially Outlook.com and yahoo.com). And if you get on their (automated) bad side, you are well and thoroughly screwed. If your messages LOOK spammy enough that users will mark them as spam, then you will, again, be completely hosed whether the email is spam or not.
>
> Other than that, I think a Raspberry Pi 4 with a USB SSD might be able to manage 5 million emails a day.

I don’t recommend stock OpenSMTPD for security reasons, although I
have some patches that make it much better in this regard.  However,
all of those relate to local deliveries.  If you can afford to disable
local deliveries, OpenSMTPD is actually a good choice for this work.
It can handle multi-million-message queues without any problems.

That said, you will run into numerous other problems.
https://www.mail-archive.com/misc@.../msg05153.html is a
good introduction to them.  Gilles Chehade (the author of that post,
and formerly one of the two main developers of OpenSMTPD) is an expert
on the subject, and I trust his recommendation.

Sincerely,

Demi

Re: possible bottlenecks

Viktor Dukhovni
> On Oct 16, 2020, at 3:14 PM, Demi M. Obenour <[hidden email]> wrote:
>
> I don’t recommend stock OpenSMTPD for security reasons, although I
> have some patches that make it much better in this regard.  However,
> all of those relate to local deliveries.  If you can afford to disable
> local deliveries, OpenSMTPD is actually a good choice for this work.
> It can handle multi-million-message queues without any problems.

Well, for good performance one should not have much of a queue at all,
the mail should go out as quickly as it comes in.  If you're queueing
a lot of email, then your output is not keeping up with the input.

Unless there's a particularly good reason why you believe that OpenSMTPD
would do better than Postfix in bulk mail delivery performance, it is not
helpful to recommend it here.

--
        Viktor.

Re: possible bottlenecks

Demi M. Obenour
On 10/16/20 2:10 PM, Viktor Dukhovni wrote:

>> On Oct 16, 2020, at 3:14 PM, Demi M. Obenour <[hidden email]> wrote:
>>
>> I don’t recommend stock OpenSMTPD for security reasons, although I
>> have some patches that make it much better in this regard.  However,
>> all of those relate to local deliveries.  If you can afford to disable
>> local deliveries, OpenSMTPD is actually a good choice for this work.
>> It can handle multi-million-message queues without any problems.
>
> Well, for good performance one should not have much of a queue at all,
> the mail should go out as quickly as it comes in.  If you're queueing
> a lot of email, then your output is not keeping up with the input.

Not necessarily.  It is quite possible for the peak rate of incoming
traffic to be greater than the average rate at which it can be
delivered, even though the average rate of incoming mail is lower
than the average delivery rate.  That will result in queues forming
and eventually draining.  If the bursty traffic is distributed among
multiple recipients with their own, separate rate limits, it is quite
possible to have a queue of finite size at quasi-steady-state.
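Demi's point can be checked with a toy discrete-time model, assuming a single queue drained at a fixed number of messages per time step (real delivery rates vary per destination and over time). The burst size and rate are illustrative.

```python
def simulate(arrivals: list[int], delivery_rate: int) -> list[int]:
    """Queue length after each time step, delivering at most delivery_rate per step."""
    queue = 0
    history = []
    for arrived in arrivals:
        queue += arrived                    # new mail enters the queue
        queue -= min(queue, delivery_rate)  # deliver what the rate allows
        history.append(queue)
    return history


# A burst of 500 messages, then silence: the average input (50 per step)
# is half the delivery rate (100 per step), yet a queue still forms.
arrivals = [500] + [0] * 9
print(simulate(arrivals, 100))
# prints: [400, 300, 200, 100, 0, 0, 0, 0, 0, 0]
```

The queue forms during the burst and drains within five steps, even though the input never exceeds the delivery rate on average.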

> Unless there's a particularly good reason why you believe that OpenSMTPD
> would do better than Postfix in bulk mail delivery performance, it is not
> helpful to recommend it here.

I misunderstood your previous message, sorry.  I interpreted it as a
statement that Postfix struggles with very large mail queues, and I
know OpenSMTPD does not.

Apologies,

Demi M. Obenour

Re: possible bottlenecks

Viktor Dukhovni
On Fri, Oct 16, 2020 at 02:37:04PM -0400, Demi M. Obenour wrote:

> > Unless there's a particularly good reason why you believe that OpenSMTPD
> > would do better than Postfix in bulk mail delivery performance, it is not
> > helpful to recommend it here.
>
> I misunderstood your previous message, sorry.  I interpreted it as a
> statement that Postfix struggles with very large mail queues, and I
> know OpenSMTPD does not.

There is no MTA which is able to retry an arbitrarily large queue in the
face of remote tempfailures without the retry times stretching out to
unacceptably large values.  This is a mathematical fact.  Given that
the output rate is bounded, the time taken to process a large backlog
grows with the size of the backlog.

The practical limit to the deferred queue size is therefore ~2 days of
throughput, and depends heavily on the per-delivery latency.  If
delivery failures are slow (tarpitting or otherwise slow destinations)
the impact is greater.
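To put numbers on "the time taken to process a large backlog grows with the size of the backlog", a sketch assuming every retry attempt ties up one delivery agent for a fixed time and a fixed number of agents run in parallel. The backlog size, latencies, and concurrency are illustrative, not Postfix defaults.

```python
def retry_pass_hours(backlog: int, latency_s: float, concurrency: int) -> float:
    """Hours for one retry pass over a backlog, if each attempt occupies a
    delivery agent for latency_s seconds and `concurrency` agents run in parallel."""
    attempts_per_sec = concurrency / latency_s
    return backlog / attempts_per_sec / 3600


print(f"{retry_pass_hours(1_000_000, 5, 100):.1f} h")   # 13.9 h, fast tempfails
print(f"{retry_pass_hours(1_000_000, 30, 100):.1f} h")  # 83.3 h, slow (tarpitting) sites
```

Making failures six times slower makes each pass over the same backlog six times longer, which is why slow destinations hurt most.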

There is no magic that can make OpenSMTPD immune to the laws of
arithmetic.

Otherwise, with appropriate choices of hash_queue_depth and
hash_queue_names, Postfix handles backlogs in the low millions
of messages; anything much larger simply will not get
processed in a reasonable time anyway.
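Rough arithmetic on why queue hashing matters for large backlogs: spreading queue files over more subdirectories keeps each directory small. This sketch assumes files are spread evenly and that each hashing level has 16 subdirectories (one per hexadecimal digit); check postconf(5) for how hash_queue_depth actually maps queue IDs to directories.

```python
def files_per_directory(queue_size: int, depth: int, fanout: int = 16) -> float:
    """Average files per leaf directory when queue files are spread evenly
    over fanout**depth leaf directories (fanout=16 assumes hex queue IDs)."""
    return queue_size / fanout ** depth


for depth in (0, 1, 2):
    print(depth, files_per_directory(2_000_000, depth))
# prints:
# 0 2000000.0
# 1 125000.0
# 2 7812.5
```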

--
    Viktor.

Re: possible bottlenecks

Demi M. Obenour
On 10/16/20 9:24 PM, Viktor Dukhovni wrote:

> The practical limit to the deferred queue size is therefore ~2 days of
> throughput, and depends heavily on the per-delivery latency.  If
> delivery failures are slow (tarpitting or otherwise slow destinations)
> the impact is greater.

Can the latency problems be worked around by increasing concurrency?
My understanding is that Postfix might have problems at very high
concurrency due to using one process per connection, whereas some
other servers are event-driven and can handle thousands of connections
without using too much memory.

> There is no magic that can make OpenSMTPD immune to the laws of
> arithmetic.

Indeed there is not.  Any statement to that effect on my part was
erroneous and based on a misunderstanding.

Sincerely,

Demi

Re: possible bottlenecks

Viktor Dukhovni
> On Oct 17, 2020, at 3:09 AM, Demi M. Obenour <[hidden email]> wrote:
>
>> The practical limit to the deferred queue size is therefore ~2 days of
>> throughput, and depends heavily on the per-delivery latency.  If
>> delivery failures are slow (tarpitting or otherwise slow destinations)
>> the impact is greater.
>
> Can the latency problems be worked around by increasing concurrency?

Yes, so unsurprisingly, Postfix amortises latency via carefully managed
concurrency.
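The relationship behind this is Little's law: deliveries in flight = delivery rate × time per delivery. A minimal sketch, reusing the ~60 msgs/sec figure from earlier in the thread with assumed per-delivery latencies (2 s for a normal delivery, 30 s for a tarpitting destination).

```python
import math


def required_concurrency(msgs_per_sec: float, latency_s: float) -> int:
    """Little's law: deliveries in flight = delivery rate x time per delivery."""
    return math.ceil(msgs_per_sec * latency_s)


print(required_concurrency(60, 2))   # 120 concurrent deliveries
print(required_concurrency(60, 30))  # 1800 concurrent deliveries
```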

> My understanding is that Postfix might have problems at very high
> concurrency due to using one process per connection, whereas some
> other servers are event-driven and can handle thousands of connections
> without using too much memory.

Postfix reuses processes for multiple deliveries, so process creation
is effectively amortised.  SMTP delivery being a rather expensive operation
(DNS lookups, connection setup, TLS handshakes, ...), the fractional (because
shared across multiple deliveries) cost of process creation is dwarfed by
the actual SMTP transaction costs.
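A back-of-the-envelope version of the amortisation argument. The process start-up cost, the number of deliveries per process, and the SMTP transaction time are all assumptions; uses_per_process plays the role of something like Postfix's max_use setting, which limits how many requests a daemon process serves before it exits.

```python
def amortised_overhead(spawn_ms: float, uses_per_process: int,
                       transaction_ms: float) -> float:
    """Process-creation cost per delivery, as a fraction of the SMTP transaction cost."""
    per_delivery_spawn_ms = spawn_ms / uses_per_process
    return per_delivery_spawn_ms / transaction_ms


# Assumed: 2 ms to start a process, 100 deliveries per process, 500 ms per delivery.
frac = amortised_overhead(spawn_ms=2.0, uses_per_process=100, transaction_ms=500.0)
print(f"{frac:.4%}")  # 0.0040%
```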

On modern hardware (anything built in the last ~2 decades), you can run
thousands of concurrent SMTP delivery agents without any difficulty:
their executables are loaded only once, and per-connection memory utilisation
is modest.  You run out of remote sites' willingness to receive your email
long before you run out of local capacity to send it.
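A rough memory estimate in the same spirit, assuming shared pages (executable and libraries) are counted once and each delivery agent adds a fixed amount of private memory. The sizes are placeholders; real figures depend on the platform, the TLS library, and the workload.

```python
def memory_mb(agents: int, shared_mb: float, private_mb: float) -> float:
    """Shared pages are counted once; private pages are counted per agent."""
    return shared_mb + agents * private_mb


# Assumed: 20 MB of shared text, 1 MB private memory per agent, 2000 agents.
print(memory_mb(agents=2000, shared_mb=20.0, private_mb=1.0))  # 2020.0 (MB)
```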

The event driven design mostly just makes those other servers more complex,
and more prone to security bugs.  Postfix 3.4 and later grudgingly do some
event-driven work because TLS connection reuse with OpenSSL is not possible
out-of-process.  So the tlsproxy(8) process context switches between multiple
TLS connections, but the rest of the SMTP delivery agent is one connection
per process and performs just fine.  The architecture is however more robust
and secure.
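For readers unfamiliar with the distinction, a minimal sketch of the event-driven pattern in general (not tlsproxy's code): one process uses a readiness selector to service several connections in turn, rather than dedicating a process to each. Socket pairs stand in for network connections, and the "processing" is just upper-casing the data.

```python
import selectors
import socket


def multiplex_echo(n_conns: int, message: bytes) -> list[bytes]:
    """One process, one thread, several connections: the selector reports which
    socket is ready, and we service it without blocking on the others."""
    sel = selectors.DefaultSelector()
    clients = []
    for i in range(n_conns):
        client, server = socket.socketpair()
        server.setblocking(False)
        sel.register(server, selectors.EVENT_READ, data=i)  # per-connection state
        clients.append(client)
        client.sendall(message + str(i).encode())

    pending = n_conns
    while pending:
        for key, _events in sel.select(timeout=1.0):
            conn = key.fileobj
            data = conn.recv(4096)
            conn.sendall(data.upper())  # "process" the request and reply
            sel.unregister(conn)
            conn.close()
            pending -= 1

    replies = [c.recv(4096) for c in clients]
    for c in clients:
        c.close()
    sel.close()
    return replies


print(multiplex_echo(3, b"hello "))
# prints: [b'HELLO 0', b'HELLO 1', b'HELLO 2']
```

Even this toy has to keep per-connection state in the selector and must never block on any single socket; that bookkeeping grows quickly once real protocol state machines are involved, which is the complexity cost described above.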

Postfix is not an HTTP server handling tens to hundreds of thousands of requests
per second, and does not benefit from the optimisations needed for those kinds
of workloads.  Premature optimisations that sacrifice robustness and security
for little gain are not part of the design.

--
        Viktor.

Re: possible bottlenecks

Demi M. Obenour
On 10/17/20 1:23 AM, Viktor Dukhovni wrote:

>> On Oct 17, 2020, at 3:09 AM, Demi M. Obenour <[hidden email]> wrote:
>>
>>> The practical limit to the deferred queue size is therefore ~2 days of
>>> throughput, and depends heavily on the per-delivery latency.  If
>>> delivery failures are slow (tarpitting or otherwise slow destinations)
>>> the impact is greater.
>>
>> Can the latency problems be worked around by increasing concurrency?
>
> Yes, so unsurprisingly, Postfix amortises latency via carefully managed
> concurrency.
>
>> My understanding is that Postfix might have problems at very high
>> concurrency due to using one process per connection, whereas some
>> other servers are event-driven and can handle thousands of connections
>> without using too much memory.
>
> Postfix reuses processes for multiple deliveries, so process creation
> is effectively amortised.  SMTP delivery being a rather expensive operation
> (DNS lookups, connection setup, TLS handshakes, ...), the fractional (because
> shared across multiple deliveries) cost of process creation is dwarfed by
> the actual SMTP transaction costs.
>
> On modern hardware (anything built in the last ~2 decades), you can run
> thousands of concurrent SMTP delivery agents without any difficulty:
> their executables are loaded only once, and per-connection memory utilisation
> is modest.  You run out of remote sites' willingness to receive your email
> long before you run out of local capacity to send it.
>
> The event driven design mostly just makes those other servers more complex,
> and more prone to security bugs.  Postfix 3.4 and later grudgingly do some
> event-driven work because TLS connection reuse with OpenSSL is not possible
> out-of-process.  So the tlsproxy(8) process context switches between multiple
> TLS connections, but the rest of the SMTP delivery agent is one connection
> per process and performs just fine.  The architecture is however more robust
> and secure.

Good point.  I have wondered if something like s2n would be a better
choice, although I would probably use the OpenBSD Postfix packages
built against LibreSSL.

> Postfix is not an HTTP server handling tens to hundreds of thousands of requests
> per second, and does not benefit from the optimisations needed for those kinds
> of workloads.  Premature optimisations that sacrifice robustness and security
> for little gain are not part of the design.

I had not considered that, but you are correct.  Email really is a much
lower volume service than HTTP.  If Postfix needed to handle hundreds
of thousands of requests per second and concurrent connections, the
process-per-connection model would obviously not work.  But it does
not need to handle that much load, so the simpler approach is better.
Since Postfix is written in C, that simpler approach is to use one
process per connection.

If one is Google or Microsoft and needs to process hundreds of millions
of messages per day, then Postfix might not work.  But if one needs
to handle that much mail, then one can probably afford to write a
bespoke MTA.

Sincerely,

Demi

Re: possible bottlenecks

Viktor Dukhovni
On Sat, Oct 17, 2020 at 02:05:57PM -0400, Demi M. Obenour wrote:

> > Postfix 3.4 and later grudgingly do some event-driven work because
> > TLS connection reuse with OpenSSL is not possible out-of-process.
> > So the tlsproxy(8) process context switches between multiple TLS
> > connections, but the rest of the SMTP delivery agent is one
> > connection per process and performs just fine.  The architecture is
> > however more robust and secure.
>
> Good point.  I have wondered if something like s2n would be a better
> choice, although I would probably use the OpenBSD Postfix packages
> built against LibreSSL.

Postfix does not support LibreSSL, and LibreSSL does not make it
possible to move SSL connections between processes either; it is just a
stale fork of OpenSSL.  There's no advantage in using LibreSSL, and
Postfix depends on features of the real OpenSSL.

> If one is Google or Microsoft and needs to process hundreds of millions
> of messages per day, then Postfix might not work.  But if one needs
> to handle that much mail, then one can probably afford to write a
> bespoke MTA.

IIRC Hotmail originally ran on Postfix; sending and receiving email
scales horizontally, so you just field more hardware as needed.  But that
was some time ago, and by now I am sure that they have replaced it with
something built in-house.  The things that really need custom scaling
are the IMAP and webmail frontends; they surely have interesting storage
management designs to support O(1 billion) users with O(1 GB) of email
each.
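The order of magnitude of that storage estimate, as a quick sketch assuming decimal units and ignoring replication, indexes, and compression.

```python
USERS = 10**9            # O(1 billion) users
BYTES_PER_USER = 10**9   # O(1 GB) of mail each, decimal gigabytes

total_bytes = USERS * BYTES_PER_USER
print(total_bytes / 10**18, "EB")  # 1.0 EB, i.e. about one exabyte
```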

--
    Viktor.

Re: possible bottlenecks

Peter Blair
In reply to this post by Demi M. Obenour
At 17 October, 2020 Demi M. Obenour wrote:
 
> > Postfix is not an HTTP server handling tens to hundreds of thousands of requests
> > per second, and does not benefit from the optimisations needed for those kinds
> > of workloads.  Premature optimisations that sacrifice robustness and security
> > for little gain are not part of the design.
>
> If one is Google or Microsoft and need to process hundreds of millions
> of messages per day, then Postfix might not work.  But if one needs
> to handle that much mail, then one can probably afford to write a
> bespoke MTA.

A decade ago I helped create and run a mailbox hoster with a few million
active accounts.  We were nothing compared to gmail/hotmail, but we ran
our border MTAs using postfix (with custom smtp content filters and
custom LMTP services).  My memory is rusty, but given the amount of spam
we consumed, we definitely were doing 10s-100s of millions of messages
per day (on the inbound side).  Postfix did great -- our choke point was
storage IOPS being saturated by spam that no one would ultimately read,
which is annoying but the truth of life.
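A rough sense of the storage load at that scale, assuming a hypothetical 4 storage I/O operations per accepted message (the real figure depends on the MTA, filters, and mailbox format) and traffic spread evenly over the day; bursts would need considerably more.

```python
import math


def required_iops(msgs_per_day: int, ios_per_msg: int) -> int:
    """Average storage I/O operations per second for a daily message volume."""
    return math.ceil(msgs_per_day / 86_400 * ios_per_msg)


print(required_iops(10_000_000, 4))   # 463 IOPS at 10 million msgs/day
print(required_iops(100_000_000, 4))  # 4630 IOPS at 100 million msgs/day
```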

I no longer work in email, but I do work at a fairly large $MEGACORP and
I was discussing something the other day with a coworker: When you're
sitting on the internet with a service that needs to cope with downtime,
heavy load, etc., then having a service that fully supports RFCs is
really important because you can't be taking postmaster@ emails from
rando operators because you're doing something dumb.

But once you're dealing with internal services, it's all custom code,
because you can just message the engineer responsible for whichever
subservice is acting up and sort it out asap.  As such, things tend to be
much more narrowly focused in implementation, written with narrowly
scoped perf metrics in mind, and less robust (feature-wise) than
software like Postfix.