Ironic interaction of greylisting, backup MX hosts and DANE

Viktor Dukhovni
[ To be sent separately also to the [hidden email] list. ]

I sent a "please fix your TLSA records" notice to "postmaster" and
"info" at a domain whose primary MX host certificate fails to match
its TLSA records:

    postfix/pickup[62805]: 7672C1DD39: ...
    postfix/cleanup[63835]: 7672C1DD39: message-id=<...>
    postfix/qmgr[844]: 7672C1DD39: from=<...>, size=8815, nrcpt=2 (queue active)
    postfix/smtp[63837]: 7672C1DD39: host mail.example.nl[192.0.2.1]
        said: 450 4.2.0 ... Greylisted, ...
    postfix/smtp[63837]: 7672C1DD39: to=<[hidden email]>,
        relay=mail.example.nl[192.0.2.1]:25, delay=3.7, delays=0.03/0.01/1.5/2.2,
        dsn=2.0.0, status=sent (250 2.0.0 Ok: queued as 119F07F861)
    postfix/smtp[63837]: 7672C1DD39: to=<[hidden email]>,
        relay=vps.example.nl[192.0.2.2]:25, delay=5.7, delays=0.03/0.01/5/0.61,
        dsn=2.0.0, status=sent (250 2.0.0 Ok: queued as 2FBDC2064A)

The "postmaster" copy was accepted by the primary MX, but the
"info" address was greylisted, and *only* by the primary MX.

However, the secondary MX accepts email *without* greylisting!
The effect is that all mail greylisted by the primary is delivered
via the secondary, and the primary never sees the retries that
would let a sender pass greylisting.
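
To make the mechanism concrete, here is a minimal sketch of a sending
MTA walking the MX hosts in preference order. It is a toy model, not
any real MTA's logic: the MXHost class, the deliver() function, the
preference values 10 and 20 and the addresses are all hypothetical,
and greylisting is reduced to "defer the first attempt for each
sender/recipient pair" (real implementations usually also key on the
client IP address and enforce a minimum delay).

```python
from dataclasses import dataclass, field


@dataclass
class MXHost:
    name: str
    preference: int      # lower value = more preferred
    greylists: bool
    # (sender, recipient) pairs this host has already deferred once
    seen: set = field(default_factory=set)
    attempts: int = 0

    def offer(self, sender, recipient):
        """Return True if accepted, False on a greylisting deferral."""
        self.attempts += 1
        key = (sender, recipient)
        if self.greylists and key not in self.seen:
            self.seen.add(key)
            return False
        return True


def deliver(hosts, sender, recipient):
    """Try hosts in MX preference order; return the accepting host's name.

    Returns None when every host deferred, in which case a real sender
    would queue the message and retry later.
    """
    for host in sorted(hosts, key=lambda h: h.preference):
        if host.offer(sender, recipient):
            return host.name
    return None


# Hypothetical setup mirroring the situation above: only the primary greylists.
primary = MXHost("mail.example.nl", 10, greylists=True)
backup = MXHost("vps.example.nl", 20, greylists=False)

for n in range(3):
    # Each new sender is deferred by the primary, then accepted by the backup.
    print(deliver([primary, backup], f"sender{n}@example.org", "info@example.nl"))
```

In this model the primary is contacted once per message and never
again, because the backup's acceptance removes the message from the
sender's queue.  If the backup greylisted too, the first pass would
return None and the sender's later retry would succeed at the primary.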

What happened next was interesting and ironic.  I got a "delay"
notification from the secondary:

    Your message could not be delivered for more than 3 hour(s).
    It will be retried until it is 5 day(s) old.

    For further assistance, please send mail to postmaster.

    If you do so, please include this problem report. You can
    delete your own text from the attached returned message.

                       The mail system

    <[hidden email]>: Server certificate not trusted

The secondary is configured to verify DANE TLSA when relaying mail
to the primary, so the problem report to "info", and indeed pretty
much all mail to the system in question, is queueing on the secondary
MX host, waiting for the primary MX TLSA records to be fixed!
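
As an illustration of why a stale record leads to "Server certificate
not trusted", here is a minimal sketch of the comparison for one TLSA
record type only, "3 1 1" (DANE-EE, SubjectPublicKeyInfo, SHA-256).
It assumes the server's public key is already available as DER-encoded
SubjectPublicKeyInfo bytes; extracting those bytes from a certificate
is left out, and the function names are hypothetical.  I do not know
which record types the domain in question publishes, so this shows
the mechanism rather than that site's actual records.

```python
import hashlib


def tlsa_311_rdata(spki_der: bytes) -> str:
    """Format the "3 1 1" TLSA rdata for a DER-encoded SubjectPublicKeyInfo."""
    return "3 1 1 " + hashlib.sha256(spki_der).hexdigest()


def matches_published(spki_der: bytes, published) -> bool:
    """True if any published "3 1 1" record matches the server's key.

    published is an iterable of rdata strings such as "3 1 1 ab12...".
    Records of other types are ignored by this sketch.
    """
    want = hashlib.sha256(spki_der).hexdigest()
    for rdata in published:
        fields = rdata.split()
        # The hex digest may be split across several whitespace-separated chunks.
        if fields[:3] == ["3", "1", "1"] and "".join(fields[3:]).lower() == want:
            return True
    return False


# Placeholder byte strings standing in for real DER-encoded keys.
old_key = b"placeholder for the old key"
new_key = b"placeholder for the new key"

published = [tlsa_311_rdata(old_key)]          # DNS still lists only the old key
print(matches_published(old_key, published))   # the old key still matches
print(matches_published(new_key, published))   # the rotated key does not
```

A verifying sender that finds no matching record treats the
connection as unauthenticated and defers the mail, which is what the
secondary is doing here.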

Only "postmaster" email gets through to its destination (if the
sending system does not validate TLSA records, which is naturally
the case for my outbound "please fix your TLSA records" notices).

The main lesson here is not to implement greylisting on only a subset
of your MX hosts.  Don't do that!

    1. When greylisting, make sure that all MX hosts with equal
       or worse (higher) MX preference also greylist.

    2. It is harmless (though less effective) to greylist only on
       (all) backup MX hosts, and skip greylisting on the primary.
       It is not a good idea to greylist on the primary but not
       on the backups.

    3. Monitor your DANE TLSA records, in such a way that notices
       of problems get through even when the TLSA records are
       stale.

    4. Handle certificate rotation correctly.

        https://dane.sys4.de/common_mistakes#3

        http://tools.ietf.org/html/rfc7671#section-8.1
        http://tools.ietf.org/html/rfc7671#section-8.4
        https://community.letsencrypt.org/t/please-avoid-3-0-1-and-3-0-2-dane-tlsa-records-with-le-certificates/7022
        https://www.internetsociety.org/deploy360/blog/2016/03/lets-encrypt-certificates-for-mail-servers-and-dane-part-2-of-2/

        With "3 1 1" + "2 1 1" TLSA records, the rollover process
        can be substantially simplified:

            https://www.ietf.org/mail-archive/web/uta/current/msg01498.html
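
Rule 1 above can be checked mechanically.  The following is a minimal
sketch, assuming you can list each MX host with its preference and
whether it greylists; the greylisting_gaps() function and the
preference values are hypothetical, and the host names are the ones
from the log excerpt.

```python
def greylisting_gaps(mx_hosts):
    """Return the hosts that let senders bypass greylisting.

    mx_hosts is a list of (name, preference, greylists) tuples.  A host
    is a gap if it does not greylist while some host with an equal or
    better (lower) preference does.
    """
    greylisting_prefs = [pref for _, pref, greylists in mx_hosts if greylists]
    if not greylisting_prefs:
        return []                      # nobody greylists, nothing to bypass
    best = min(greylisting_prefs)      # most preferred greylisting host
    return [name for name, pref, greylists in mx_hosts
            if not greylists and pref >= best]


# The configuration described above: greylisting on the primary only.
print(greylisting_gaps([("mail.example.nl", 10, True),
                        ("vps.example.nl", 20, False)]))

# Rule 2: greylisting on the backup only is harmless, so no gaps.
print(greylisting_gaps([("mail.example.nl", 10, False),
                        ("vps.example.nl", 20, True)]))
```

The first call reports vps.example.nl as a gap; the second reports
none.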

Happy greylisting and TLSA record publishing to you all...

--
        Viktor.