Outgoing DANE not working

Christian
Hi there,

I tried the DANE test on "havedane.net" and found that outgoing DANE
is not working.  I get the following:

Email to non-DANE domain delivered.
Email to DANE domain delivered.
Email to domain with invalid DANE delivered.

So apparently the check for the last one is failing (at least).
Checking the logs, the first two are "failing" as well, as DANE is not
tested and all connections are "Untrusted" (because of the self-signed
certificate).

However, TLS itself is working: I checked with other DANE-enabled
domains and I get a "Trusted" connection, but not "Verified".

After a lot of testing, I found that Postfix apparently is not checking
the TLSA record, I think because it does not recognise the domain as
DNSSEC-enabled.

I am not sure what to do anymore. If anyone has had a similar problem,
any help would be appreciated.


More details on what I did:
I am running a Docker setup (Alpine-based containers on a Debian host) with my own unbound DNS resolver.
I started by checking whether there are problems with my DNSSEC checks. Running
"dig com. SOA +dnssec" from my postfix container, I get

##########
; <<>> DiG 9.14.8 <<>> com. SOA +dnssec
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 18198
;; flags: qr rd ra ad; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags: do; udp: 4096
;; QUESTION SECTION:
;com. IN SOA

;; ANSWER SECTION:
com. 900 IN SOA a.gtld-servers.net.
nstld.verisign-grs.com. 1586697402 1800 900 604800 86400
com. 900 IN RRSIG SOA 8 1 900
20200419131642 20200412120642 56311 com.
km8/J8z8l6NNsoU0Ag5PfaPAN6sLYxzIYOm1qzdAfu7a/IxlsRnWqPgh
VsfO6+MDxHpUZ9VI9O3tc9EvpJ9p7LKLKoV1BtfIdKIXXeE7viow5LG8
FlzF04w4Qd5hd2oLY1F4bvdDQmB7AAPNRC/3mCySNZTqg/iyXbH5ePOk
rQ+ue9ThApZOGHTbL9jyFnFsDCoUu3OhVWxA2BQv8zVEZQ==

;; Query time: 14 msec
;; SERVER: 127.0.0.11#53(127.0.0.11)
;; WHEN: Sun Apr 12 15:17:00 CEST 2020
;; MSG SIZE  rcvd: 300
##########

Having the ad flag, this seems to be ok for DNSSEC.
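
The same check can be scripted. The following is a minimal sketch in Python, assuming the dig output has been captured as text; the function name dig_flags is illustrative and not part of any tool:

```python
import re

def dig_flags(dig_output: str) -> set:
    """Return the header flags listed on the ';; flags:' line of dig output."""
    match = re.search(r"^;; flags: ([a-z ]+);", dig_output, re.MULTILINE)
    return set(match.group(1).split()) if match else set()

# Two lines copied from the dig output above.
sample = """\
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 18198
;; flags: qr rd ra ad; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1
"""

print("ad" in dig_flags(sample))  # True
```

Note that this only shows what the resolver returns to dig. As discussed later in the thread, dig does not use the C library's stub resolver, so it says nothing about the queries Postfix sends.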

Next I forced postfix to treat "havedane.net" as a "dane-only" domain via
the TLS policy table.
That led to the following warnings:

##########
Apr 11 19:14:39 server docker/postfix/smtp[904]: warning: TLS policy
lookup for do.havedane.net/do.havedane.net: non DNSSEC destination
Apr 11 19:14:39 server docker/postfix/smtp[904]: warning: TLS policy
lookup for do.havedane.net/do.havedane.net: non DNSSEC destination
##########

This seems to confirm my theory that DNSSEC is not properly checked.
The next thing I did was to monitor the DNS queries in unbound, and I found
that only MX, A and AAAA records are requested:

##########
Apr 12 14:00:56 server docker/unbound[567]: [1586692856] unbound[1:0]
info: 192.168.4.5 do.havedane.net. MX IN#015
Apr 12 14:00:56 server docker/unbound[567]: [1586692856] unbound[1:0]
info: 192.168.4.5 do.havedane.net. A IN#015
Apr 12 14:00:56 server docker/unbound[567]: [1586692856] unbound[1:0]
info: 192.168.4.5 do.havedane.net. AAAA IN#015
##########

A query for a TLSA record would look like this in unbound (triggered
with dig), but it is missing from the queries triggered by postfix
(so how would postfix obtain the certificate information?):

##########
Apr 12 14:01:25 server docker/unbound[567]: [1586692885] unbound[1:0]
info: 192.168.4.5 _25._tcp.do.havedane.net. TLSA IN#015
##########

I read in the documentation that postfix apparently sets certain
flags (RES_USE_DNSSEC and RES_USE_EDNS0) for the MX request to check
DNSSEC validity, but I do not know how to debug whether that is
happening. Hence I am stuck now.
Does anyone know what to do?

postconf -n (domain replaced by XXX)

##########
append_dot_mydomain = no
biff = no
bounce_queue_lifetime = 1h
compatibility_level = 2
debug_peer_list = havedane.net,127.0.0.1,127.0.0.11,192.168.4.254
inet_interfaces = all
inet_protocols = all
mailbox_size_limit = 0
maximal_backoff_time = 15m
maximal_queue_lifetime = 1h
message_size_limit = 52428800
milter_default_action = accept
milter_mail_macros = i {mail_addr} {client_addr} {client_name} {auth_authen}
milter_protocol = 6
minimal_backoff_time = 5m
mua_client_restrictions = permit_mynetworks,permit_sasl_authenticated,reject
mua_relay_restrictions = reject_non_fqdn_recipient,reject_unknown_recipient_domain,permit_mynetworks,permit_sasl_authenticated,reject
mua_sender_restrictions = permit_mynetworks,reject_non_fqdn_sender,reject_sender_login_mismatch,permit_sasl_authenticated,reject
myhostname = server.XXX.de
mynetworks = 127.0.0.0/8 192.168.4.0/24 [::1]/128 [fd00::192:168:4:0]/112
non_smtpd_milters = inet:rspamd:11332
postscreen_access_list = permit_mynetworks cidr:/etc/postfix/postscreen_access
postscreen_blacklist_action = drop
postscreen_dnsbl_action = drop
postscreen_dnsbl_sites = dnsbl.sorbs.net*1, bl.spamcop.net*1, ix.dnsbl.manitu.net*2, zen.spamhaus.org*2
postscreen_dnsbl_threshold = 2
postscreen_greet_action = drop
queue_run_delay = 5m
recipient_delimiter = +
smtp_dns_support_level = dnssec
smtp_host_lookup = dns
smtp_tls_CAfile = /etc/ssl/certs/ca-certificates.crt
smtp_tls_ciphers = high
smtp_tls_loglevel = 1
smtp_tls_mandatory_ciphers = high
smtp_tls_mandatory_protocols = !SSLv2, !SSLv3
smtp_tls_policy_maps = hash:/etc/postfix/maps/tls-policy
smtp_tls_protocols = !SSLv2, !SSLv3
smtp_tls_security_level = dane
smtp_tls_session_cache_database = btree:${data_directory}/smtp_scache
smtpd_client_restrictions = permit_mynetworks check_client_access hash:/etc/postfix/maps/without_ptr reject_unknown_client_hostname
smtpd_data_restrictions = reject_unauth_pipelining
smtpd_helo_required = yes
smtpd_helo_restrictions = permit_mynetworks reject_invalid_helo_hostname reject_non_fqdn_helo_hostname reject_unknown_helo_hostname
smtpd_milters = inet:rspamd:11332
smtpd_recipient_restrictions = check_recipient_access hash:/etc/postfix/maps/recipient-access
smtpd_relay_restrictions = reject_non_fqdn_recipient reject_unknown_recipient_domain permit_mynetworks reject_unauth_destination
smtpd_tls_cert_file = /etc/ssl/private/server_XXX_de_chained.pem
smtpd_tls_ciphers = high
smtpd_tls_dh1024_param_file = /etc/ssl/private/dh-4096.pem
smtpd_tls_eccert_file = /etc/ssl/private/ecc-server_XXX_de_chained.pem
smtpd_tls_eckey_file = /etc/ssl/private/ecc-XXX_de.key
smtpd_tls_eecdh_grade = ultra
smtpd_tls_exclude_ciphers = kEDH
smtpd_tls_key_file = /etc/ssl/private/XXX_de.key
smtpd_tls_loglevel = 1
smtpd_tls_protocols = !SSLv2, !SSLv3
smtpd_tls_received_header = yes
smtpd_tls_security_level = may
smtpd_tls_session_cache_database = btree:${data_directory}/smtpd_scache
syslog_name = docker/${multi_instance_name?{$multi_instance_name}:{postfix}}
tls_high_cipherlist = EDH+CAMELLIA:EDH+aRSA:EECDH+aRSA+AESGCM:EECDH+aRSA+SHA256:EECDH:+CAMELLIA128:+AES128:+SSLv3:!aNULL:!eNULL:!LOW:!3DES:!MD5:!EXP:!PSK:!DSS:!RC4:!SEED:!IDEA:!ECDSA:kEDH:CAMELLIA128-SHA:AES128-SHA
tls_preempt_cipherlist = yes
tls_ssl_options = NO_COMPRESSION,NO_RENEGOTIATION
virtual_alias_maps = hash:/etc/postfix/maps/aliases
virtual_mailbox_domains = XXX.de
virtual_transport = lmtp:inet:dovecot:24
##########

postconf -Mf

##########
smtp       inet  n       -       n       -       1       postscreen
    -o smtpd_sasl_auth_enable=no
smtpd      pass  -       -       n       -       -       smtpd
dnsblog    unix  -       -       n       -       0       dnsblog
tlsproxy   unix  -       -       n       -       0       tlsproxy
submission inet  n       -       n       -       -       smtpd
    -o syslog_name=postfix/submission
    -o smtpd_tls_security_level=encrypt
    -o smtpd_sasl_auth_enable=yes
    -o smtpd_sasl_type=dovecot
    -o smtpd_sasl_path=inet:dovecot:10001
    -o smtpd_sasl_security_options=noanonymous
    -o smtpd_relay_restrictions=$mua_relay_restrictions
    -o smtpd_sender_login_maps=hash:/etc/postfix/maps/sender-login
    -o smtpd_sender_restrictions=$mua_sender_restrictions
    -o smtpd_client_restrictions=$mua_client_restrictions
    -o smtpd_helo_required=no
    -o smtpd_helo_restrictions=
    -o milter_macro_daemon_name=ORIGINATING
    -o cleanup_service_name=submission-header-cleanup
pickup     unix  n       -       n       60      1       pickup
cleanup    unix  n       -       n       -       0       cleanup
qmgr       unix  n       -       n       300     1       qmgr
tlsmgr     unix  -       -       n       1000?   1       tlsmgr
rewrite    unix  -       -       n       -       -       trivial-rewrite
bounce     unix  -       -       n       -       0       bounce
defer      unix  -       -       n       -       0       bounce
trace      unix  -       -       n       -       0       bounce
verify     unix  -       -       n       -       1       verify
flush      unix  n       -       n       1000?   0       flush
proxymap   unix  -       -       n       -       -       proxymap
proxywrite unix  -       -       n       -       1       proxymap
smtp       unix  -       -       n       -       -       smtp
relay      unix  -       -       n       -       -       smtp
showq      unix  n       -       n       -       -       showq
error      unix  -       -       n       -       -       error
retry      unix  -       -       n       -       -       error
discard    unix  -       -       n       -       -       discard
local      unix  -       n       n       -       -       local
virtual    unix  -       n       n       -       -       virtual
lmtp       unix  -       -       n       -       -       lmtp
anvil      unix  -       -       n       -       1       anvil
scache     unix  -       -       n       -       1       scache
submission-header-cleanup unix n - n     -       0       cleanup
    -o header_checks=regexp:/etc/postfix/maps/submission_header_cleanup
#########


Re: Outgoing DANE not working

Viktor Dukhovni
On Sun, Apr 12, 2020 at 04:20:48PM +0200, Christian wrote:

> I am running in a docker setup (alpine based on debian host) with my own unbound DNS resolver.

What is the content of /etc/resolv.conf?

> I started to check if I have problems in my DNSSEC checks. running a
> "dig com. SOA +dnssec" from my postfix container, I get
> ; <<>> DiG 9.14.8 <<>> com. SOA +dnssec
> ;; flags: qr rd ra ad; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1
> ;; SERVER: 127.0.0.11#53(127.0.0.11)

Ok, the server at 127.0.0.11 appears to support DNSSEC validation.
Was the test in fact run in the same container as Postfix?

> Apr 11 19:14:39 server docker/postfix/smtp[904]: warning: TLS policy
>   lookup for do.havedane.net/do.havedane.net: non DNSSEC destination
> Apr 11 19:14:39 server docker/postfix/smtp[904]: warning: TLS policy
>   lookup for do.havedane.net/do.havedane.net: non DNSSEC destination

But Postfix does not see the AD bit, so the most plausible explanation
is that it is not using the same resolver.  What are the file
permissions on /etc/resolv.conf?

> Hence confirming my theory, that DNSSEC is not properly checked.
> Next thing I did is monitoring the DNS queries in unbound and found,
> that onyl MX, A and AAAA is requested:
>
> ##########
> Apr 12 14:00:56 server docker/unbound[567]: [1586692856] unbound[1:0]
> info: 192.168.4.5 do.havedane.net. MX IN#015
> Apr 12 14:00:56 server docker/unbound[567]: [1586692856] unbound[1:0]
> info: 192.168.4.5 do.havedane.net. A IN#015
> Apr 12 14:00:56 server docker/unbound[567]: [1586692856] unbound[1:0]
> info: 192.168.4.5 do.havedane.net. AAAA IN#015

Yes, as expected when the A records come back with the AD bit not set.

> I read in the documentation, that apparently postfix checks with
> certain FLAGS (RES_USE_DNSSEC and RES_USE_EDNS0) in the MX request for
> DNSSEC validity, however I do not know how to debug if that is
> happening. Hence I am stuck now.

You can capture the DNS traffic into a PCAP file, then press "Ctrl-C" once
all the packets of interest have been sent.  Replace "lo" with the
appropriate interface name if it is not "lo".  I used "-s512" since all the
data of interest (qname, flags, ...) easily fits into the first 512
bytes.

    # tcpdump -s512 -w /tmp/dns.pcap -i lo udp port 53
    ^C

The PCAP file can be analyzed with "tshark -nr /tmp/dns.pcap -V"

You can run "posttls-finger -c -Lsummary do.havedane.net" (if installed)
to generate the DNS traffic of interest.

    $ posttls-finger -d sha256 -c -Lsummary,certmatch do.havedane.net
    posttls-finger: using DANE RR: _25._tcp.do.havedane.net IN TLSA 2 1 1 27:B6:94:B5:1D:1F:EF:88:85:37:2A:CF:B3:91:93:75:97:22:B7:36:B0:42:68:64:DC:1C:79:D0:65:1F:EF:73
    posttls-finger: using DANE RR: _25._tcp.do.havedane.net IN TLSA 3 1 1 55:3A:CF:88:F9:EE:18:CC:AA:E6:35:CA:54:0F:32:CB:84:AC:A7:7C:47:91:66:82:BC:B5:42:D5:1D:AA:87:1F
    posttls-finger: do.havedane.net[2001:1af8:4700:a118:90::7c0]:25: depth=1 matched trust anchor public-key sha256 digest=27:B6:94:B5:1D:1F:EF:88:85:37:2A:CF:B3:91:93:75:97:22:B7:36:B0:42:68:64:DC:1C:79:D0:65:1F:EF:73
    posttls-finger: do.havedane.net[2001:1af8:4700:a118:90::7c0]:25: depth=0 chain is trust-anchor signed
    posttls-finger: do.havedane.net[2001:1af8:4700:a118:90::7c0]:25: depth=0 matched end entity public-key sha256 digest=55:3A:CF:88:F9:EE:18:CC:AA:E6:35:CA:54:0F:32:CB:84:AC:A7:7C:47:91:66:82:BC:B5:42:D5:1D:AA:87:1F
    posttls-finger: do.havedane.net[2001:1af8:4700:a118:90::7c0]:25: Matched subjectAltName: do.havedane.net
    posttls-finger: do.havedane.net[2001:1af8:4700:a118:90::7c0]:25 CommonName do.havedane.net
    posttls-finger: do.havedane.net[2001:1af8:4700:a118:90::7c0]:25: subject_CN=do.havedane.net, issuer_CN=Fort-Funston CA, fingerprint=AF:BE:72:A5:16:C2:48:65:6D:20:0E:B0:F3:C8:DF:A2:F1:FC:5B:C6:E3:D8:85:48:30:C6:E2:DA:40:28:C9:CC, pkey_fingerprint=55:3A:CF:88:F9:EE:18:CC:AA:E6:35:CA:54:0F:32:CB:84:AC:A7:7C:47:91:66:82:BC:B5:42:D5:1D:AA:87:1F
    posttls-finger: Verified TLS connection established to do.havedane.net[2001:1af8:4700:a118:90::7c0]:25: TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)

> postconf -n (domain replaced by XXX)

Are these the docker settings?

> debug_peer_list = havedane.net,127.0.0.1,127.0.0.11,192.168.4.254

There's no reason to include the nameserver in debug_peer_list.

> smtp_tls_security_level = dane
> smtp_dns_support_level = dnssec
> smtp_host_lookup = dns

Good, these are needed, though the last one is a default, and
can be (is best) left out.

> tls_high_cipherlist =
> !aNULL:!eNULL:!LOW:!3DES:!MD5:!EXP:!PSK:!DSS:!RC4:
> !SEED:!IDEA:!ECDSA:kEDH:CAMELLIA128-SHA:AES128-SHA

This is much too explicit.  You're cargo-culting bad advice from some
ill-considered "HOW-TO".  The Postfix defaults are better, any (minimal,
previously recommended on this list) exclusions you want to add are best
done with:

    smtp_tls_exclude_ciphers =
    smtp_tls_mandatory_exclude_ciphers =
    smtpd_tls_exclude_ciphers =
    smtpd_tls_mandatory_exclude_ciphers =

Excluding "ECDSA" is most unwise, especially if you're planning to
support DANE.

> smtp       unix  -       -       n       -       -       smtp
> relay      unix  -       -       n       -       -       smtp

Are these the docker settings?

--
    Viktor.

Re: Outgoing DANE not working

Christian
Hi Viktor,
thanks for the response! Apparently the mail was too long (>4000) and
got rejected, hence I put it to pastebin: https://pastebin.com/1e3sR0Hq

I think the tcpdumps are interesting, as they show that postfix is not
requesting with the right flags (If I am not reading everything wrong).

Kind Regards
  Christian


Re: Outgoing DANE not working

Viktor Dukhovni
On Mon, Apr 13, 2020 at 02:12:49AM +0200, Christian wrote:

> thanks for the response! Apparently the mail was too long (>4000) and
> got rejected, hence I put it to pastebin: https://pastebin.com/1e3sR0Hq

The query in your PCAP file was not sent to 127.0.0.11, and had no EDNS
OPT record (so no "DO" bit):

    Internet Protocol Version 4, Src: 192.168.4.5, Dst: 192.168.4.254
    User Datagram Protocol, Src Port: 34651, Dst Port: 53
    Domain Name System (query)
        Transaction ID: 0x55b7
        Flags: 0x0100 Standard query
            0... .... .... .... = Response: Message is a query
            .000 0... .... .... = Opcode: Standard query (0)
            .... ..0. .... .... = Truncated: Message is not truncated
            .... ...1 .... .... = Recursion desired: Do query recursively
            .... .... .0.. .... = Z: reserved (0)
            .... .... ...0 .... = Non-authenticated data: Unacceptable
        Questions: 1
        Answer RRs: 0
        Authority RRs: 0
        Additional RRs: 0
        Queries
            do.havedane.net: type MX, class IN
                Name: do.havedane.net
                [Name Length: 15]
                [Label Count: 3]
                Type: MX (Mail eXchange) (15)
                Class: IN (0x0001)
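
The header fields in the decode above can be reproduced by hand. The following is a rough sketch in Python that unpacks the fixed 12-byte DNS header; the function name decode_dns_header is illustrative, and the sample bytes are rebuilt from the values shown in the capture rather than taken from the PCAP file itself:

```python
import struct

def decode_dns_header(packet: bytes) -> dict:
    """Decode the fixed 12-byte DNS header into counts and selected flag bits."""
    ident, flags, qd, an, ns, ar = struct.unpack("!6H", packet[:12])
    return {
        "id": ident,
        "qr": bool(flags & 0x8000),  # 1 = response, 0 = query
        "tc": bool(flags & 0x0200),  # truncated
        "rd": bool(flags & 0x0100),  # recursion desired
        "ad": bool(flags & 0x0020),  # authentic data
        "cd": bool(flags & 0x0010),  # checking disabled
        "qdcount": qd,
        "ancount": an,
        "nscount": ns,
        "arcount": ar,  # an EDNS OPT record, if present, is counted here
    }

# Transaction ID 0x55b7, flags 0x0100, 1 question, no other records.
header = struct.pack("!6H", 0x55B7, 0x0100, 1, 0, 0, 0)
fields = decode_dns_header(header)
print(fields["rd"], fields["ad"], fields["arcount"])  # True False 0
```

With zero additional records there is no room for an OPT pseudo-record, which is why the query cannot carry the "DO" bit.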

Is 127.0.0.11 inside the container == 192.168.4.254 outside?

What C-library and operating system is this?  Perhaps the C-library in
Docker ignores RES_USE_EDNS0 and RES_USE_DNSSEC or more generally
changes to _res.options?

> I think the tcpdumps are interesting, as they show that postfix is not
> requesting with the right flags (If I am not reading everything wrong).

When Postfix is configured with "smtp_dns_support_level = dnssec", the
RES_USE_DNSSEC and RES_USE_EDNS0 flags are set around calls to the
resolver routines.  If your C-library (perhaps only inside docker) has
an incompatible resolver API, then you'll need a more compatible resolver
library and/or a different container technology.
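
For comparison, here is a sketch of what a query with EDNS0 and the "DO" bit looks like on the wire. It is written in Python purely for illustration and is not how Postfix or any libresolv builds its packets; the helper names encode_name and build_query are made up for this example, and the UDP payload size of 4096 is an assumption:

```python
import struct

def encode_name(name: str) -> bytes:
    """Encode a domain name as length-prefixed labels ending in a zero byte."""
    labels = [label for label in name.rstrip(".").split(".") if label]
    return b"".join(bytes([len(l)]) + l.encode("ascii") for l in labels) + b"\x00"

def build_query(name, qtype, ident=0, dnssec_ok=False, udp_size=4096):
    """Build a recursive query, optionally with an EDNS0 OPT record that sets DO."""
    arcount = 1 if dnssec_ok else 0
    header = struct.pack("!6H", ident, 0x0100, 1, 0, 0, arcount)  # RD set
    question = encode_name(name) + struct.pack("!2H", qtype, 1)   # class IN
    if not dnssec_ok:
        return header + question
    # OPT pseudo-record: root name, type 41, CLASS carries the UDP payload size,
    # TTL carries extended RCODE (8 bits), version (8 bits), flags (16 bits, DO = 0x8000).
    opt = b"\x00" + struct.pack("!HHIH", 41, udp_size, 0x00008000, 0)
    return header + question + opt

MX = 15
plain = build_query("do.havedane.net", MX, ident=0x55B7)
edns = build_query("do.havedane.net", MX, ident=0x55B7, dnssec_ok=True)
print(len(edns) - len(plain))  # 11
```

The query in the capture corresponds to the "plain" variant: the additional-record count is 0 and the 11-byte OPT record is absent.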

--
    Viktor.

Re: Outgoing DANE not working

Christian
Hello Viktor,
thanks again, please see my answers inline.

On Sunday, 12.04.2020, at 22:47 -0400, Viktor Dukhovni wrote:

> On Mon, Apr 13, 2020 at 02:12:49AM +0200, Christian wrote:
>
> > thanks for the response! Apparently the mail was too long (>4000) and
> > got rejected, hence I put it to pastebin: https://pastebin.com/1e3sR0Hq
>
> The query in your PCAP file was not sent to 127.0.0.11, and had no EDNS
> OPT record (so no "DO" bit):
>
>     Internet Protocol Version 4, Src: 192.168.4.5, Dst: 192.168.4.254
>     User Datagram Protocol, Src Port: 34651, Dst Port: 53
>     Domain Name System (query)
>         Transaction ID: 0x55b7
>         Flags: 0x0100 Standard query
>             0... .... .... .... = Response: Message is a query
>             .000 0... .... .... = Opcode: Standard query (0)
>             .... ..0. .... .... = Truncated: Message is not truncated
>             .... ...1 .... .... = Recursion desired: Do query recursively
>             .... .... .0.. .... = Z: reserved (0)
>             .... .... ...0 .... = Non-authenticated data: Unacceptable
>         Questions: 1
>         Answer RRs: 0
>         Authority RRs: 0
>         Additional RRs: 0
>         Queries
>             do.havedane.net: type MX, class IN
>                 Name: do.havedane.net
>                 [Name Length: 15]
>                 [Label Count: 3]
>                 Type: MX (Mail eXchange) (15)
>                 Class: IN (0x0001)
>
> Is 127.0.0.11 inside the container == 192.168.4.254 outside?

Yes, sorry, I should have mentioned it. 127.0.0.11 is the Docker way
of saying: this DNS is configured by Docker. And indeed the
container IP of unbound is 192.168.4.254.

> What C-library and operating system is this?  Perhaps the C-library in
> Docker ignores RES_USE_EDNS0 and RES_USE_DNSSEC or more generally
> changes to _res.options?
>
> > I think the tcpdumps are interesting, as they show that postfix is not
> > requesting with the right flags (If I am not reading everything wrong).
>
> When Postfix is configured with "smtp_dns_support_level = dnssec", the
> RES_USE_DNSSEC and RES_USE_EDNS0 flags are set around calls to the
> resolver routines.  If your C-library (perhaps only inside docker) has
> an incompatible resolver API, then you'll need a more compatible resolver
> library and/or a different container technology.

The container is running on Alpine, hence with musl libc. After seeing
the tcpdump yesterday, I also wondered whether that could be the issue.

I am no programmer, but two things strike me:
Dig is able to construct a proper request, and I thought it uses the
resolver routines for its tests?

resolv.h for musl-libc at least mentions RES_USE_DNSSEC and
RES_USE_EDNS0 (not sure if that means anything):
https://git.musl-libc.org/cgit/musl/tree/include/resolv.h#n102
https://git.musl-libc.org/cgit/musl/tree/include/resolv.h#n105

So I am not sure what this means for postfix, but how could we
find out whether it is indeed an incompatibility? Is there a way to log
the construction of the request and hence the failure to send it
properly?

I will also contact the musl-libc developers to see if they have ideas.




Re: Outgoing DANE not working

Damian Lukowski
In reply to this post by Viktor Dukhovni
The validator [1] says TLSA is ok, so is this even a DNS issue? If I
have to guess, Postfix encounters the following situation:

> When TLSA records are found, but are all unusable the effective security level is "encrypt"

The documentation does not state that self-signed certificates are
invalid with the "encrypt" security level, they are with "verify".

[1] https://dane.sys4.de/smtp/wrong.havedane.net

Re: Outgoing DANE not working

Viktor Dukhovni
[ To the OP: feel free to ignore the below response, it is irrelevant. ]

> On Apr 13, 2020, at 5:22 AM, Damian <[hidden email]> wrote:
>
> The validator [1] says TLSA is ok, so is this even a DNS issue? If I
> have to guess, Postfix encounters the following situation:
>
>> When TLSA records are found, but are all unusable the effective security level is "encrypt"
>
> The documentation does not state that self-signed certificates are
> invalid with the "encrypt" security level, they are with "verify".
>
> [1] https://dane.sys4.de/smtp/wrong.havedane.net

--
        Viktor.


Re: Outgoing DANE not working

Christian
In reply to this post by Damian Lukowski
Hi Damian,

On Monday, 13.04.2020, at 11:22 +0200, Damian wrote:

> The validator [1] says TLSA is ok, so is this even a DNS issue? If I
> have to guess, Postfix encounters the following situation:
>
> > When TLSA records are found, but are all unusable the effective security level is "encrypt"
>
> The documentation does not state that self-signed certificates are
> invalid with the "encrypt" security level, they are with "verify".
>
> [1] https://dane.sys4.de/smtp/wrong.havedane.net

I am not sure what you are saying.

The havedane.net test consists of three different servers, do.havedane.net,
dont.havedane.net and wrong.havedane.net, all with self-signed certificates.
The difference is in the TLSA records:

do. has a correct one
dont. has none
wrong. has a wrong one (your link shows that)

Hence the results of the connections should be:
do. = Verified (DANE did the verification)
dont. = Untrusted (just regular TLS without DANE); with a CA-signed
certificate it would be Trusted
wrong. = No delivery at all (DANE verification fails)

The "wrong." case is the main security benefit of DANE, as it can spot
tampered certificates. The "do." case is additional security/convenience, as
you can use self-signed certificates and do not need to rely on CAs. "dont." of
course does not matter.
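
The expected outcomes listed above can be written down as a small decision function. This is only a sketch of the reasoning in this message, in Python; the function name expected_delivery and its parameters are hypothetical, and it assumes the sender uses the "dane" security level with working DNSSEC validation:

```python
def expected_delivery(has_tlsa: bool, tlsa_matches: bool = False,
                      ca_signed: bool = False) -> str:
    """Return the connection status expected from a DANE-enabled sender."""
    if has_tlsa:
        # Usable TLSA records exist: the certificate must match them.
        return "Verified" if tlsa_matches else "not delivered"
    # No TLSA records: regular TLS without DANE verification.
    return "Trusted" if ca_signed else "Untrusted"

cases = {
    "do.havedane.net": dict(has_tlsa=True, tlsa_matches=True),
    "dont.havedane.net": dict(has_tlsa=False),
    "wrong.havedane.net": dict(has_tlsa=True, tlsa_matches=False),
}
for host, args in cases.items():
    print(host, "->", expected_delivery(**args))
```

A sender that never sees the TLSA records always takes the second branch, which matches the "Untrusted" result for all three hosts.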

However, the tcpdumps show that my Postfix is not getting any TLSA
information via DNS, so on my server all three get delivered and the
connection is logged as Untrusted, as if no DANE were involved and
it were just a regular TLS setup.


Re: Outgoing DANE not working

Viktor Dukhovni
In reply to this post by Christian
> On Apr 13, 2020, at 4:56 AM, Christian <[hidden email]> wrote:
>
> The container is running on alpine, hence with musl libc. After seeing
> the tcpdump yesterday, I thought as well, if that could be an issue.
>
> I am no programmer, however 2 things strike me:
> Dig is able to construct a proper request and I thought it is using the
> resolver routines for its tests?

No, dig(1) does not use libresolv, it has its own, much more sophisticated
DNS packet encoding and decoding routines, which expose features not available
via the traditional libresolv.

> resolv.h for musl-libc at least mentions RES_USE_DNSSEC and
> RES_USE_EDNS0 (not sure if that means anything)
> https://git.musl-libc.org/cgit/musl/tree/include/resolv.h#n102
> https://git.musl-libc.org/cgit/musl/tree/include/resolv.h#n105

The comment on line 25:

  https://github.com/runtimejs/musl-libc/blob/master/include/resolv.h#L25

is not encouraging.  It suggests that _res is unused.  If so, Postfix DNS
does not work correctly with this C library.  And not just for DANE, since
Postfix is also unable to control RES_DEFNAMES and RES_DNSRCH.

Indeed searching the github repo for RES_USE_DNSSEC and RES_USE_EDNS0 finds
hits only in the header file, and similarly:

  https://raw.githubusercontent.com/runtimejs/musl-libc/master/src/network/res_state.c

pretty much rules out support for configurable lookup options.  Bottom line:

  https://dilbert.com/strip/1995-06-24

> So I am not sure if this means anything for postfix, but how could we
> find out that it is indeed an incompatibility? Is there a way to log
> the construction of the request and hence the failure to properly send
> it?

You are logging it, with tcpdump.  I am afraid that musl-libc is unsuitable
for use with Postfix.  You'll need a container with a less crippled C-library.

--
        Viktor.


Re: Outgoing DANE not working

Damian Lukowski
In reply to this post by Christian
>> The validator [1] says TLSA is ok, so is this even a DNS issue? If I
>> have to guess, Postfix encounters the following situation:
>>
>>
>> When TLSA records are found, but are all unusable the effective security level is "encrypt"
>>
>> The documentation does not state that self-signed certificates are
>> invalid with the "encrypt" security level, they are with "verify".
>>
>> [1] https://dane.sys4.de/smtp/wrong.havedane.net
>>
> I am not sure what you are saying.

As Viktor pointed out, it does not matter what I'm saying. I seem to
have misinterpreted the Postfix documentation.

Re: Outgoing DANE not working

Viktor Dukhovni
In reply to this post by Viktor Dukhovni
> On Apr 13, 2020, at 5:52 AM, Viktor Dukhovni <[hidden email]> wrote:
>
> Indeed searching the github repo for RES_USE_DNSSEC and RES_USE_EDNS0 finds
> hits only the header file, and similarly:
>
>  https://raw.githubusercontent.com/runtimejs/musl-libc/master/src/network/res_state.c
>
> pretty much rules out support for configurable lookup options.  Bottom line:
>
>  https://dilbert.com/strip/1995-06-24

The musl-libc resolver code also includes gems like:

  https://github.com/runtimejs/musl-libc/blob/master/src/network/__dns.c#L67-L69

So not terribly safe if using a remote resolver.  Definitely no support for EDNS(0)
or sending the "DO" or "AD" bits in the request.

It always queries all resolvers in parallel, without waiting for a short timeout from
the first one (or using connect(2) for prompt notification of host/port unreachable).

There is no support for truncated responses or TCP failover, so if a host has enough
IP addresses, some may be dropped, and FCrDNS checks may fail spuriously.

This library cuts too many corners, it is not supported.

--
        Viktor.


Re: Outgoing DANE not working

Christian
In reply to this post by Viktor Dukhovni
On Monday, 13.04.2020, at 05:52 -0400, Viktor Dukhovni wrote:

> On Apr 13, 2020, at 4:56 AM, Christian <[hidden email]> wrote:
>
> > The container is running on alpine, hence with musl libc. After seeing
> > the tcpdump yesterday, I thought as well, if that could be an issue.
> >
> > I am no programmer, however 2 things strike me:
> > Dig is able to construct a proper request and I thought it is using the
> > resolver routines for its tests?
>
> No, dig(1) does not use libresolv, it has its own, much more sophisticated
> DNS packet encoding and decoding routines, which expose features not available
> via the traditional libresolv.
>
> > resolv.h for musl-libc at least mentions RES_USE_DNSSEC and
> > RES_USE_EDNS0 (not sure if that means anything)
> > https://git.musl-libc.org/cgit/musl/tree/include/resolv.h#n102
> > https://git.musl-libc.org/cgit/musl/tree/include/resolv.h#n105
>
> The comment on line 25:
>
>   https://github.com/runtimejs/musl-libc/blob/master/include/resolv.h#L25
>
> is not encouraging.  It suggests that _res is unused.  If so, Postfix DNS
> does not work correctly with this C library.  And not just for DANE, since
> Postfix is also unable to control RES_DEFNAMES and RES_DNSRCH.
>
> Indeed searching the github repo for RES_USE_DNSSEC and RES_USE_EDNS0 finds
> hits only in the header file, and similarly:
>
>   https://raw.githubusercontent.com/runtimejs/musl-libc/master/src/network/res_state.c
>
> pretty much rules out support for configurable lookup options.  Bottom line:
>
>   https://dilbert.com/strip/1995-06-24

OK, I understand that. I will check with the musl-libc developers why this
is the case. They write "broken apps", so in their view there is
another way to do this which is "not broken"?!

Could it be that I have stepped into a discussion on how this is done "right"?
;-)

Nevertheless, the Postfix DANE documentation should probably advise
against running Postfix on musl for now.
As most people (like me, until now) do not test outgoing DANE, this goes
unnoticed.

Thanks a lot for your help! I will check back if I have news.

> > So I am not sure if this means anything for postfix, but how could we
> > find out that it is indeed an incompatibility? Is there a way to log
> > the construction of the request and hence the failure to properly send
> > it?
>
> You are logging it, with tcpdump.  I am afraid that musl-libc is unsuitable
> for use with Postfix.  You'll need a container with a less crippled C-library.


Re: Outgoing DANE not working

Viktor Dukhovni
> On Apr 13, 2020, at 6:38 AM, Christian <[hidden email]> wrote:
>
> Nevertheless, it should probably be included in the Postfix DANE
> documentation to avoid musl setups with postfix for now.

Postfix expects a C-library implementation of the DNS stub resolver
routines that is compatible with the original BSD design.

I don't think it is reasonable to curate a list of defective
C-library resolver implementations.

Unless Wietse some day throws in the towel and makes libunbound
or libldns a required dependency for Postfix, we're stuck with
the traditional libresolv interface, and it needs to have a
reasonably complete implementation on your system.

As I already mentioned, even without DANE, musl-libc already
fails to provide adequate controls to disable the search
list (RES_DEFNAMES and RES_DNSRCH).

DO NOT run Postfix over musl-libc.

--
        Viktor.


Re: Outgoing DANE not working

Christian
On Monday, 13.04.2020, at 06:57 -0400, Viktor Dukhovni wrote:

> On Apr 13, 2020, at 6:38 AM, Christian <[hidden email]> wrote:
>
> > Nevertheless, it should probably be included in the Postfix DANE
> > documentation to avoid musl setups with postfix for now.
>
> Postfix expects a C-library implementation of the DNS stub resolver
> routines that is compatible with the original BSD design.
>
> I don't think it is reasonable to curate a list of defective
> C-library resolver implementations.
>
> Unless Wietse some day throws in the towel and makes libunbound
> or libldns a required dependency for Postfix, we're stuck with
> the traditional libresolv interface, and it needs to have a
> reasonably complete implementation on your system.
>
> As I already mentioned, even without DANE, musl-libc already
> fails to provide adequate controls to disable the search
> list (RES_DEFNAMES and RES_DNSRCH).
>
> DO NOT run Postfix over musl-libc.

I agree and will not do so anymore.
I am just thinking about the popularity of Alpine-based container setups
and the fact that such a setup appears to run quite well unless you dig
into it as we did over the last few days.
And as I said, most e-mail server test websites only check incoming DANE.
Hence my attempt to keep this from spreading.

FYI: I forwarded your findings to the musl-libc mailing list and
asked what they think should be done.


Re: Outgoing DANE not working

Christian
To close this out as solved:

I moved Postfix to a Debian-based container, and DANE now works as expected.

So if anyone comes across this thread, follow Viktor's advice:
> DO NOT run Postfix over musl-libc.

That means not on regular Alpine.


Re: Outgoing DANE not working

Viktor Dukhovni
In reply to this post by Christian
> On Apr 13, 2020, at 7:18 AM, Christian <[hidden email]> wrote:
>
> FYI: I forwarded your findings to the musl-libc mailing list and
> asked what they think should be done.

The problem can be partly resolved by setting the "AD" bit in the
outgoing DNS query header sent by the musl-libc stub resolver.  Then
the local iterative resolver will return the AD bit in its response.

However, lack of support for retrying truncated responses over TCP
or support for disabling RES_DEFNAMES and RES_DNSRCH remain as issues.

--
        Viktor.


Re: Outgoing DANE not working

Rich Felker
On Mon, Apr 13, 2020 at 02:15:14PM -0400, Viktor Dukhovni wrote:

> > On Apr 13, 2020, at 7:18 AM, Christian <[hidden email]> wrote:
> >
> > FYI: I forwarded your findings to the musl-libc mailing list and
> > asked what they think should be done.
>
> The problem can be partly resolved by setting the "AD" bit in the
> outgoing DNS query header sent by the musl-libc stub resolver.  Then
> the local iterative resolver will return the AD bit in its response.
>
> However, lack of support for retrying truncated responses over TCP
> or support for disabling RES_DEFNAMES and RES_DNSRCH remain as issues.

This has also been discussed some on the musl list already
(https://www.openwall.com/lists/musl/2020/04/13/1) but I'm replying
into this thread as well because I'd like to come to some mutually
acceptable solution.

musl's stub resolver intentionally speaks only rfc1035 udp, and the
intent has always been that DNSSEC validation and policy be the
responsibility of the nameserver running on localhost, not the stub
resolver or the calling application. The resolver is intentionally
stateless. It was probably a mistake to provide the fake _res
definition, and I'm interested in resolving that mistake either by
removing it or adding res_n* API that honor (parts of) it at some
point, but determining the right action here and coordinating with
distros to ensure they have fixes in place for anything that breaks
will take a while.

RES_DEFNAMES and RES_DNSRCH are irrelevant as search is never
performed by the res_* interfaces, and domain/search keywords are used
only by the high-level ones (getaddrinfo/getnameinfo and the old
legacy gethostby*).

What is relevant, as far as I can tell, is that Postfix wants a way to
perform an EDNS0 query that lets it distinguish between a valid signed
result and a valid unsigned result. This is currently not possible,
but would be practical to add based on "options edns0" in resolv.conf.
I'm not sure if or how soon that will happen, but determining that is
something I'd like to have come out of this discussion.

From my perspective, what would work best with what's always been the
intended DNSSEC usage model of musl would be if Postfix supported use
of DANE with smtp_dns_support_level=enabled, i.e. outsourcing all
DNSSEC functionality to the nameserver.

Rich

Re: Outgoing DANE not working

Viktor Dukhovni
On Mon, Apr 13, 2020 at 02:35:22PM -0400, Rich Felker wrote:

> > The problem can be partly resolved by setting the "AD" bit in the
> > outgoing DNS query header sent by the musl-libc stub resolver.  Then
> > the local iterative resolver will return the AD bit in its response.
> >
> > However, lack of support for retrying truncated responses over TCP
> > or support for disabling RES_DEFNAMES and RES_DNSRCH remain as issues.
>
> musl's stub resolver intentionally speaks only rfc1035 udp,

Lack of TCP support and ignoring the TC bit means that large responses
get truncated, possibly breaking FCrDNS and triggering false positives
via reject_unknown_client_hostname.

It is also not uncommon for applications that use SRV records to
encounter large RRsets (e.g. Windows Domain controller lists for
large Active-Directory domains in MIT Kerberos or Heimdal).

> and the intent has always been that DNSSEC validation and policy be
> the responsibility of the nameserver running on localhost, not the
> stub resolver or the calling application.

But some applications need to see the AD bit returned by the local
resolver in order to distinguish between validated and non-validated
results.  Recursive Nameservers (BIND, Unbound, ...) will only set
(when appropriate) the AD bit in replies if it is set in the incoming
query.  The AD bit is part of the standard DNS header:

    The basic DNS header flags word is a mixture of flag bits and numbers,
    <https://tools.ietf.org/html/rfc2535#section-6.1>:
   
     +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
     |QR|   Opcode  |AA|TC|RD|RA| Z|AD|CD|   RCODE   |
     +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

and a one-line change in the musl-libc stub resolver can set the AD bit
when the target resolver is local (127.0.0.0/8 or ::1/128).
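
As a rough illustration of the layout above, here is a minimal Python
sketch that splits the 16-bit flags word into its fields.  The function
name decode_flags is made up for this example; the bit positions follow
the diagram.

```python
# Bit masks for the 16-bit DNS header flags word, following the
# QR|Opcode|AA|TC|RD|RA|Z|AD|CD|RCODE layout shown in the diagram.
QR = 0x8000  # response flag
AA = 0x0400  # authoritative answer
TC = 0x0200  # truncated
RD = 0x0100  # recursion desired
RA = 0x0080  # recursion available
AD = 0x0020  # authentic data
CD = 0x0010  # checking disabled


def decode_flags(flags: int) -> dict:
    """Split the flags word into named fields (hypothetical helper)."""
    return {
        "qr": bool(flags & QR),
        "opcode": (flags >> 11) & 0xF,
        "aa": bool(flags & AA),
        "tc": bool(flags & TC),
        "rd": bool(flags & RD),
        "ra": bool(flags & RA),
        "ad": bool(flags & AD),
        "cd": bool(flags & CD),
        "rcode": flags & 0xF,
    }


# "qr rd ra ad" as printed by dig corresponds to 0x81A0;
# the same reply without the AD bit is 0x8180.
validated = decode_flags(0x81A0)
unvalidated = decode_flags(0x8180)
```

An application that only cares about validation status would look at
the "ad" field of the reply and nothing else.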

> The resolver is intentionally stateless.

By stateless you mean no shared global environment, which means no
_res.options.  That does not preclude TCP support, EDNS(0) support,
etc., though historically, efficient support of EDNS(0) meant
remembering which upstream nameservers don't support EDNS(0).

Now that pretty much all the iterative resolvers you're likely to
encounter as upstream forwarders do support EDNS, you could
just require such a resolver and use EDNS(0) unconditionally,
and use a buffer size larger than 512 for UDP, and support
TCP retries for truncated responses.

> RES_DEFNAMES and RES_DNSRCH are irrelevant as search is never
> performed by the res_* interfaces, and domain/search keywords are used
> only by the high-level ones (getaddrinfo/getnameinfo and the old
> legacy gethostby*).

That's fine then.

> What is relevant, as far as I can tell, is that Postfix wants a way to
> perform an EDNS0 query that lets it distinguish between a valid signed
> result and a valid unsigned result.

No, Postfix just wants the AD bit, but sadly the traditional resolver
API does not have RES_USE_ADBIT, it only has RES_USE_DNSSEC which sets
the DO bit in the EDNS(0) extended header.  If (see above) you just
set the AD bit for all requests to local resolvers, Postfix will get
all the DNSSEC support it needs.

> This is currently not possible, but would be practical to add based on
> "options edns0" in resolv.conf.

EDNS(0) is not needed, except to avoid unnecessary TCP failover for
responses between 512 and ~1400 bytes (the precise recommended EDNS(0)
UDP buffer size is still under discussion in the DNS community; the
recent recommendation of 1232 bytes may be too conservative).
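
For reference, advertising a larger UDP buffer means appending an OPT
pseudo-record to the query.  The following Python sketch shows the wire
format under some assumptions: the helper name append_opt is invented
for this example, the query is assumed to carry no OPT record yet, and
1232 is simply the value mentioned above, not a recommendation.

```python
import struct

OPT = 41  # RR type of the EDNS(0) OPT pseudo-record


def append_opt(query: bytes, udp_size: int = 1232, do: bool = False) -> bytes:
    """Append an EDNS(0) OPT record to a serialized query (sketch only).

    Assumes the query has a full 12-byte header and no OPT record yet.
    """
    if len(query) < 12:
        raise ValueError("DNS message shorter than its 12-byte header")
    (arcount,) = struct.unpack_from("!H", query, 10)
    # The OPT "TTL" field holds extended RCODE, version and flags;
    # the DO flag is the top bit of the low 16 bits.
    ttl = 0x8000 if do else 0
    # Root owner name, TYPE=OPT, CLASS=UDP payload size, TTL, RDLENGTH=0.
    opt = b"\x00" + struct.pack("!HHIH", OPT, udp_size, ttl, 0)
    # Bump ARCOUNT in the header and put the record at the end.
    return query[:10] + struct.pack("!H", arcount + 1) + query[12:] + opt


# A bare query for "ietf.org. IN SOA" with RD set, used as input.
base = struct.pack("!6H", 0, 0x0100, 1, 0, 0, 0) + b"\x04ietf\x03org\x00" \
    + struct.pack("!2H", 6, 1)
with_edns = append_opt(base)
with_do = append_opt(base, do=True)
```

Note that the DO flag defaults to off here, since only the larger buffer
size is wanted, not the DNSSEC records themselves.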

> I'm not sure if or how soon that will happen, but determining that is
> something I'd like to have come out of this discussion.

A one line change can set the AD bit, ideally only when the target
resolver is local (i.e. loopback, which would be an improvement for
Postfix, we normally can't tell whether the AD bit was returned by a
trusted local or untrusted remote resolver).

> From my perspective, what would work best with what's always been the
> intended DNSSEC usage model of musl would be if Postfix supported use
> of DANE with smtp_dns_support_level=enabled, i.e. outsourcing all
> DNSSEC functionality to the nameserver.

Sorry, we actually need to know which records were validated in
signed domains, and which are "insecure" responses from unsigned
domains.  That's what the AD bit is for, and you're not setting
it in requests, and so it does not come back in the response.

--
    Viktor.

Re: Outgoing DANE not working

Rich Felker
On Mon, Apr 13, 2020 at 03:04:12PM -0400, Viktor Dukhovni wrote:

> On Mon, Apr 13, 2020 at 02:35:22PM -0400, Rich Felker wrote:
>
> > > The problem can be partly resolved by setting the "AD" bit in the
> > > outgoing DNS query header sent by the musl-libc stub resolver.  Then
> > > the local iterative resolver will return the AD bit in its response.
> > >
> > > However, lack of support for retrying truncated responses over TCP
> > > or support for disabling RES_DEFNAMES and RES_DNSRCH remain as issues.
> >
> > musl's stub resolver intentionally speaks only rfc1035 udp,
>
> Lack of TCP support and ignoring the TC bit means that large responses
> get truncated, possibly breaking FCrDNS and triggering false positives
> via reject_unknown_client_hostname.
>
> It is also not uncommon for applications that use SRV records to
> encounter large RRsets (e.g. Windows Domain controller lists for
> large Active-Directory domains in MIT Kerberos or Heimdal).

The justification here has always been that a number of clients are in
positions where they can't perform tcp queries, e.g. their nameservers
only support udp and possibly only support rfc1035. Of course such an
environment is incompatible with validating dnssec, but from the
perspective of the domain defining the records, having so many/such
long records (not counting signatures) that they can't be delivered to
such clients without truncation means the domain has accessibility
problems.

Fallback to tcp on TC would also yield very bad performance for users
who are not running a local nameserver whenever looking up names with
ridiculous numbers of A/AAAA records, where the truncated response
certainly suffices (except in your example of FCrDNS).

It's possible that some of these choices can be revisited over time,
but they were made for good reasons, not at random.

> > and the intent has always been that DNSSEC validation and policy be
> > the responsibility of the nameserver running on localhost, not the
> > stub resolver or the calling application.
>
> But some applications need to see the AD bit returned by the local
> resolver in order to distinguish between validated and non-validated
> results.  Recursive Nameservers (BIND, Unbound, ...) will only set
> (when appropriate) the AD bit in replies if it is set in the incoming
> query.  The AD bit is part of the standard DNS header:

Is the AD bit valid as part of a query? I couldn't find where this is
documented, and it's almost certainly not supported (possibly
rejected/dropped) by servers that aren't aware of it.

>     The basic DNS header flags word is a mixture of flag bits and numbers,
>     <https://tools.ietf.org/html/rfc2535#section-6.1>:
>    
>      +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
>      |QR|   Opcode  |AA|TC|RD|RA| Z|AD|CD|   RCODE   |
>      +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
>
> and a one-line change in the musl-libc stub resolver can set the AD bit
> when the target resolver is local (127.0.0.0/8 or ::1/128).

I'm confused whether you're saying it should be set in the outgoing
query or forged in the response. The latter sounds like a really bad
idea. If the former, I don't see why it would be done conditional on
being a local resolver (and also local need not be 127.0.0.1 or ::1;
it can be public address of localhost or a lot of other things, e.g. a
tunnel out of a container to the actual host, depending on network
setup). Is the idea just that you assume a local one would support it
whereas for a remote one it might be unknown? I don't think this kind
of policy decision belongs in the stub resolver; for instance it would
break in the other direction if you implemented nameserver on
127.0.0.1 (e.g. just to avoid needing a resolv.conf file) by an
iptables rule to redirect to the real nameserver.

> > What is relevant, as far as I can tell, is that Postfix wants a way to
> > perform an EDNS0 query that lets it distinguish between a valid signed
> > result and a valid unsigned result.
>
> No, Postfix just wants the AD bit, but sadly the traditional resolver
> API does not have RES_USE_ADBIT, it only has RES_USE_DNSSEC which sets
> the DO bit in the EDNS(0) extended header.  If (see above) you just
> set the AD bit for all requests to local resolvers, Postfix will get
> all the DNSSEC support it needs.

I think just adding a resolv.conf option for using the AD bit might be
appropriate. One issue that makes this more complicated though is how
the API is factored. res_mkquery in theory doesn't/shouldn't depend on
the particular nameservers, but should just serialize a query that can
be used with any server (e.g. my implementation of host(1) does this
to send to the server you give it on the command line). But the choice
of configuration is specific to the configured nameservers.

> > From my perspective, what would work best with what's always been the
> > intended DNSSEC usage model of musl would be if Postfix supported use
> > of DANE with smtp_dns_support_level=enabled, i.e. outsourcing all
> > DNSSEC functionality to the nameserver.
>
> Sorry, we actually need to know which records were validated in
> signed domains, and which are "insecure" responses from unsigned
> domains.  That's what the AD bit is for, and you're not setting
> it in requests, and so it does not come back in the response.

Can you describe why? Is it only for the sake of not using TLSA
records in unsigned domains? That kind of policy can be implemented at
the resolver level and my intent was always that, if desired, it would
be. But I can understand that you may not want that, or that there may
be other reasons it doesn't suffice. I still think it would be useful
to allow the user to configure such a setting; it's certainly better
than DANE not working.

Rich

Re: Outgoing DANE not working

Viktor Dukhovni
On Mon, Apr 13, 2020 at 03:35:05PM -0400, Rich Felker wrote:

> > It is also not uncommon for applications that use SRV records to
> > encounter large RRsets (e.g. Windows Domain controller lists for
> > large Active-Directory domains in MIT Kerberos or Heimdal).
>
> The justification here has always been that a number of clients are in
> positions where they can't perform tcp queries, e.g. their nameservers
> only support udp and possibly only support rfc1035.

The TC bit and TCP support are in RFC 1035.  TCP is a required DNS
feature, it is NOT optional.  If nameservers fail to support TCP they're
broken.  The stub resolvers in BSD libraries and glibc do TCP, and don't
seem to have any real difficulties.  I'm inclined to say that the above
design decision is not evidence-based.
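
To make the mechanics concrete, here is a small Python sketch of the
two pieces a stub resolver needs for TCP failover: detecting TC=1 in a
UDP reply, and framing the same query for TCP with the two-byte length
prefix that RFC 1035 specifies for stream transport.  The helper names
are made up, and the actual socket handling is left out.

```python
import struct

TC = 0x0200  # truncation flag in the DNS header flags word


def is_truncated(response: bytes) -> bool:
    """Return True if a UDP reply has TC=1 and should be retried over TCP."""
    if len(response) < 12:
        raise ValueError("DNS message shorter than its 12-byte header")
    (flags,) = struct.unpack_from("!H", response, 2)
    return bool(flags & TC)


def frame_for_tcp(message: bytes) -> bytes:
    """Prefix a DNS message with its 16-bit length for TCP transport."""
    if len(message) > 0xFFFF:
        raise ValueError("DNS message too large for TCP framing")
    return struct.pack("!H", len(message)) + message


# Example headers: a truncated reply (qr tc rd ra) and a complete one (qr rd ra).
truncated_reply = struct.pack("!6H", 1, 0x8380, 1, 0, 0, 0)
complete_reply = struct.pack("!6H", 1, 0x8180, 1, 1, 0, 0)
```

On a truncated reply the resolver would resend the unchanged query,
wrapped by frame_for_tcp, over a TCP connection to the same server.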

> Of course such an environment is incompatible with validating dnssec,
> but from the perspective of the domain defining the records, having so
> many/such long records (not counting signatures) that they can't be
> delivered to such clients without truncation means the domain has
> accessibility problems.

DNS supports large RRsets, and has had TC=1 for those for ~4 decades.
The issue is not specifically a DNSSEC issue.

> Fallback to tcp on TC would also yield very bad performance for users
> who are not running a local nameserver whenever looking up names with
> ridiculous numbers of A/AAAA records, where the truncated response
> certainly suffices (except in your example of FCrDNS).

Your local nameserver has already done the TCP failover and paid the
cost of obtaining the full RRset, your stub resolver is just failing to
give it the opportunity to return the full data to you.  The performance
cost is low, and such records are a minority.  Correctness trumps
performance where I come from.  Cutting corners for performance and
violating requirements is not acceptable.

> It's possible that some of these choices can be revisited over time,
> but they were made for good reasons, not at random.

They may be deliberate, but I rather disagree about the quality of the
reasons.

> > But some applications need to see the AD bit returned by the local
> > resolver in order to distinguish between validated and non-validated
> > results.  Recursive Nameservers (BIND, Unbound, ...) will only set
> > (when appropriate) the AD bit in replies if it is set in the incoming
> > query.  The AD bit is part of the standard DNS header:
>
> Is the AD bit valid as part of a query?

Absolutely, and indeed it is required in order to solicit the AD bit
in return.  And e.g. dig(1) sets the AD bit in requests by default,
and you need to use "dig +noad" to turn it off!

> I couldn't find where this is documented, and it's almost certainly
> not supported (possibly rejected/dropped) by servers that aren't aware
> of it.

That is not the case.  In order for DNS to be extensible, servers are
required to ignore previously reserved flag bits, so that they can
later be assigned.

> >     The basic DNS header flags word is a mixture of flag bits and numbers,
> >     <https://tools.ietf.org/html/rfc2535#section-6.1>:
> >    
> >      +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
> >      |QR|   Opcode  |AA|TC|RD|RA| Z|AD|CD|   RCODE   |
> >      +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
> >
> > and a one-line change in the musl-libc stub resolver can set the AD bit
> > when the target resolver is local (127.0.0.0/8 or ::1/128).
>
> I'm confused whether you're saying it should be set in the outgoing
> query or forged in the response.

Set in the outgoing query, which solicits the actual value in the
reply.  Here's a normal query with "AD=1" in the outgoing request:

    $ dig +noedns +noall +comment +ans -t soa ietf.org
    ;; Got answer:
    ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 29386
    ;; flags: qr rd ra ad; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

    ;; ANSWER SECTION:
    ietf.org.               1699    IN      SOA     ns0.amsl.com. glen.amsl.com. 1200000458 1800 1800 604800 1800

The AD bit is set in the reply, since ietf.org is signed.  Below is
the same query with "AD=0" in the outgoing request:

    $ dig +noad +noedns +noall +comment +ans -t soa ietf.org
    ;; Got answer:
    ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 65503
    ;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

    ;; ANSWER SECTION:
    ietf.org.               1693    IN      SOA     ns0.amsl.com. glen.amsl.com. 1200000458 1800 1800 604800 1800

With AD=0 in the request, there is also no AD bit in the reply.
Implementors of stub resolvers need to read many RFCs or consult
experts who have:

    https://tools.ietf.org/html/rfc6840#section-5.7
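
The difference between the two dig queries above is a single bit in the
request header.  A minimal Python sketch of the two requests follows;
it only builds the packets and does not send them, and the names
build_query and encode_name are invented for this example.

```python
import struct

RD = 0x0100  # recursion desired
AD = 0x0020  # authentic data
SOA = 6      # RR type used in the dig examples
IN = 1       # class IN


def encode_name(name: str) -> bytes:
    """Encode a domain name as length-prefixed labels ending in a zero byte."""
    stripped = name.rstrip(".")
    if not stripped:
        return b"\x00"  # the root name
    out = b""
    for label in stripped.split("."):
        raw = label.encode("ascii")
        if not 0 < len(raw) < 64:
            raise ValueError("each label must be 1 to 63 bytes long")
        out += bytes([len(raw)]) + raw
    return out + b"\x00"


def build_query(name: str, qtype: int, ad: bool = True, qid: int = 0) -> bytes:
    """Serialize a one-question query with RD set and AD optionally set."""
    flags = RD | (AD if ad else 0)
    header = struct.pack("!6H", qid, flags, 1, 0, 0, 0)
    return header + encode_name(name) + struct.pack("!2H", qtype, IN)


with_ad = build_query("ietf.org", SOA)               # like plain dig
without_ad = build_query("ietf.org", SOA, ad=False)  # like dig +noad
print(hex(struct.unpack_from("!H", with_ad, 2)[0]))     # 0x120
print(hex(struct.unpack_from("!H", without_ad, 2)[0]))  # 0x100
```

Everything after the header is identical in both packets.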

> If the former, I don't see why it would be done conditional on
> being a local resolver (and also local need not be 127.0.0.1 or ::1;
> it can be public address of localhost or a lot of other things, e.g. a
> tunnel out of a container to the actual host, depending on network
> setup).

Because the AD bit from a non-local resolver is not trustworthy.  One
might imagine resolver configurations in which one can indicate that the
network path to a range of non-local IP addresses (perhaps IPSEC or
other secure link) is tamper-resistant, but as a default it may make
sense to ignore the AD bit from remote IPs.

Not ignoring it is no worse than the situation Postfix is in today,
where we don't know whether the AD bit returned by libresolv is
trustworthy or not.  We just document the requirement for a local
resolver and hope that users who want DANE security pay attention to
the docs.

However, I am suggesting that ignoring non-local AD bits would in fact
resolve that issue.  A more complete implementation would have a
configurable whitelist of "trusted" resolvers.
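
A sketch of what such a check could look like, in Python for brevity.
The default list (loopback only) and the function name are assumptions
made for this example, not an existing resolver option.

```python
import ipaddress

# Assumed default: honor the AD bit only from loopback resolvers.
DEFAULT_TRUSTED = ("127.0.0.0/8", "::1/128")


def ad_bit_is_trustworthy(resolver_ip: str, trusted=DEFAULT_TRUSTED) -> bool:
    """Return True if the AD bit from this resolver address may be honored."""
    addr = ipaddress.ip_address(resolver_ip)
    for net in trusted:
        network = ipaddress.ip_network(net)
        # Only compare addresses and networks of the same IP version.
        if addr.version == network.version and addr in network:
            return True
    return False


# A site with a protected path to 10.0.0.53 could extend the list.
site_trusted = DEFAULT_TRUSTED + ("10.0.0.53/32",)
```

With the default list, a resolver at 127.0.0.11 or ::1 is trusted, while
a remote address such as 8.8.8.8 is not; with the extended list,
10.0.0.53 is trusted as well.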

> Is the idea just that you assume a local one would support it
> whereas for a remote one it might be unknown?

No, in order to get the AD bit in a reply, you need to set it in
the request.  Modern resolvers do not return the AD bit otherwise.

> I don't think this kind of policy decision belongs in the stub
> resolver; for instance it would break in the other direction if you
> implemented nameserver on 127.0.0.1 (e.g. just to avoid needing a
> resolv.conf file) by an iptables rule to redirect to the real
> nameserver.

That's one way of signalling that you trust the path to the resolver.

> I think just adding a resolv.conf option for using the AD bit might be
> appropriate. One issue that makes this more complicated though is how
> the API is factored.

You can safely set it unconditionally, or just to the loopback ones (to
help remove an AD-bit MiTM footgun).  No known resolvers will object to
the AD in queries.

> res_mkquery in theory doesn't/shouldn't depend on
> the particular nameservers, but should just serialize a query that can
> be used with any server (e.g. my implementation of host(1) does this
> to send to the server you give it on the command line). But the choice
> of configuration is specific to the configured nameservers.

You can inject the AD bit just before sending the packet to a particular
server.
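
In other words, the query stays server-independent and only the bytes
about to be sent are patched.  A minimal Python sketch, assuming the
query is already serialized by something like res_mkquery; the helper
name with_ad_bit is invented for this example.

```python
import struct

AD = 0x0020  # authentic data bit in the DNS header flags word


def with_ad_bit(query: bytes) -> bytes:
    """Return a copy of a serialized query with the AD bit set."""
    if len(query) < 12:
        raise ValueError("DNS message shorter than its 12-byte header")
    # The flags word is the second 16-bit field of the header.
    (flags,) = struct.unpack_from("!H", query, 2)
    return query[:2] + struct.pack("!H", flags | AD) + query[4:]


# A server-independent query (RD only) for "ietf.org. IN SOA".
generic = struct.pack("!6H", 0x1234, 0x0100, 1, 0, 0, 0) \
    + b"\x04ietf\x03org\x00" + struct.pack("!2H", 6, 1)
patched = with_ad_bit(generic)
```

The original buffer is left untouched, so the same serialized query can
still be sent unmodified to servers where the AD bit is not wanted.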

> > Sorry, we actually need to know which records were validated in
> > signed domains, and which are "insecure" responses from unsigned
> > domains.  That's what the AD bit is for, and you're not setting
> > it in requests, and so it does not come back in the response.
>
> Can you describe why?

I can, but you can just read RFC 7672 if you like, I've already
explained it there.  Bottom line, it is needed.

> Is it only for the sake of not using TLSA
> records in unsigned domains? That kind of policy can be implemented at
> the resolver level

It cannot and should not be implemented at the resolver level.

--
    Viktor.