dnsblog and "Name service error"

classic Classic list List threaded Threaded
8 messages Options
Reply | Threaded
Open this post in threaded view
|

dnsblog and "Name service error"

Alex Regan
Hi, I've been experiencing these weird "Name service error" problems
for a month or so and can't figure out what's causing them.

Sep 13 10:04:59 mail03 postfix/dnsblog[30902]: warning: dnsblog_query:
lookup error for DNS query 25.188.223.216.b.barracudacentral.org: Host
or domain name not found. Name service error for
name=25.188.223.216.b.barracudacentral.org type=A: Host not found, try
again
Sep 13 10:04:59 mail03 postfix/dnsblog[30920]: warning: dnsblog_query:
lookup error for DNS query 25.188.223.216.score.senderscore.com: Host
or domain name not found. Name service error for
name=25.188.223.216.score.senderscore.com type=A: Host not found, try
again
Sep 13 10:04:59 bwimail03 postfix/dnsblog[22943]: warning:
dnsblog_query: lookup error for DNS query
25.188.223.216.bl.mailspike.net: Host or domain name not found. Name
service error for name=25.188.223.216.bl.mailspike.net type=A: Host
not found, try again

Is there a way to enable some form of enhanced debugging just for dnsblog?

I've traced the packets using wireshark and they all appear to occur
on 127.0.0.1 (as that's how /etc/resolv.conf is configured), and the
packet trace is uninteresting. There's no real information as to why
the packet is listed as "server failure".

This is a pretty active mail server, with as many as 800 queries/sec
during peak, I believe. This is on a fedora28 system with
postfix-3.3.1 and bind-9.11.4 configured as a basic caching name
server with recursion enabled. This is on a 20mbit business-class
cable modem.

I've also included one of those packet traces. I realize this may not
be a postfix problem, but I'm hoping it can help to elucidate more of
what's happening.

No.     Time           Source                Destination
Protocol Length Info
   9083 11.730327      127.0.0.1             127.0.0.1             DNS
     104    Standard query response 0xded6 Server failure A
25.188.223.216.wl.mailspike.net OPT

Frame 9083: 104 bytes on wire (832 bits), 104 bytes captured (832 bits)
    Encapsulation type: Linux cooked-mode capture (25)
    Arrival Time: Sep 13, 2018 15:46:36.633305000 EDT
    [Time shift for this packet: 0.000000000 seconds]
    Epoch Time: 1536867996.633305000 seconds
    [Time delta from previous captured frame: 0.000969000 seconds]
    [Time delta from previous displayed frame: 0.006367000 seconds]
    [Time since reference or first frame: 11.730327000 seconds]
    Frame Number: 9083
    Frame Length: 104 bytes (832 bits)
    Capture Length: 104 bytes (832 bits)
    [Frame is marked: False]
    [Frame is ignored: False]
    [Protocols in frame: sll:ethertype:ip:udp:dns]
    [Coloring Rule Name: UDP]
    [Coloring Rule String: udp]
Linux cooked capture
    Packet type: Unicast to us (0)
    Link-layer address type: 772
    Link-layer address length: 6
    Source: 00:00:00_00:00:00 (00:00:00:00:00:00)
    Unused: 6fc0
    Protocol: IPv4 (0x0800)
Internet Protocol Version 4, Src: 127.0.0.1, Dst: 127.0.0.1
    0100 .... = Version: 4
    .... 0101 = Header Length: 20 bytes (5)
    Differentiated Services Field: 0x00 (DSCP: CS0, ECN: Not-ECT)
        0000 00.. = Differentiated Services Codepoint: Default (0)
        .... ..00 = Explicit Congestion Notification: Not ECN-Capable
Transport (0)
    Total Length: 88
    Identification: 0x2dff (11775)
    Flags: 0x0000
        0... .... .... .... = Reserved bit: Not set
        .0.. .... .... .... = Don't fragment: Not set
        ..0. .... .... .... = More fragments: Not set
        ...0 0000 0000 0000 = Fragment offset: 0
    Time to live: 64
    Protocol: UDP (17)
    Header checksum: 0x4e94 [validation disabled]
    [Header checksum status: Unverified]
    Source: 127.0.0.1
    Destination: 127.0.0.1
User Datagram Protocol, Src Port: 53, Dst Port: 12304
    Source Port: 53
    Destination Port: 12304
    Length: 68
    Checksum: 0xfe57 [unverified]
    [Checksum Status: Unverified]
    [Stream index: 320]
Domain Name System (response)
    Transaction ID: 0xded6
    Flags: 0x8182 Standard query response, Server failure
        1... .... .... .... = Response: Message is a response
        .000 0... .... .... = Opcode: Standard query (0)
        .... .0.. .... .... = Authoritative: Server is not an
authority for domain
        .... ..0. .... .... = Truncated: Message is not truncated
        .... ...1 .... .... = Recursion desired: Do query recursively
        .... .... 1... .... = Recursion available: Server can do
recursive queries
        .... .... .0.. .... = Z: reserved (0)
        .... .... ..0. .... = Answer authenticated: Answer/authority
portion was not authenticated by the server
        .... .... ...0 .... = Non-authenticated data: Unacceptable
        .... .... .... 0010 = Reply code: Server failure (2)
    Questions: 1
    Answer RRs: 0
    Authority RRs: 0
    Additional RRs: 1
    Queries
        25.188.223.216.wl.mailspike.net: type A, class IN
            Name: 25.188.223.216.wl.mailspike.net
            [Name Length: 31]
            [Label Count: 7]
            Type: A (Host Address) (1)
            Class: IN (0x0001)
    Additional records
        <Root>: type OPT
            Name: <Root>
            Type: OPT (41)
            UDP payload size: 4096
            Higher bits in extended RCODE: 0x00
            EDNS0 version: 0
            Z: 0x0000
                0... .... .... .... = DO bit: Cannot handle DNSSEC security RRs
                .000 0000 0000 0000 = Reserved: 0x0000
            Data length: 0
    [Unsolicited: True]
Reply | Threaded
Open this post in threaded view
|

Re: dnsblog and "Name service error"

Viktor Dukhovni


> On Sep 13, 2018, at 6:11 PM, Alex <[hidden email]> wrote:
>
> Hi, I've been experiencing these weird "Name service error" problems
> for a month or so and can't figure out what's causing them.
>
> Sep 13 10:04:59 mail03 postfix/dnsblog[30902]: warning: dnsblog_query:
> lookup error for DNS query 25.188.223.216.b.barracudacentral.org: Host
> or domain name not found. Name service error for
> name=25.188.223.216.b.barracudacentral.org type=A: Host not found, try
> again

Your resolver cannot resolve the address and returns SERVFAIL.  Perhaps
the RBLs you're using are no longer willing to process your queries.

The traffic of interest is not between Postfix and your resolver, but
between your resolver and various nameservers on the public Internet.

--
        Viktor.

Reply | Threaded
Open this post in threaded view
|

Re: dnsblog and "Name service error"

Wietse Venema
In reply to this post by Alex Regan
Alex:

> Domain Name System (response)
>     Transaction ID: 0xded6
>     Flags: 0x8182 Standard query response, Server failure
>         1... .... .... .... = Response: Message is a response
>         .000 0... .... .... = Opcode: Standard query (0)
>         .... .0.. .... .... = Authoritative: Server is not an
> authority for domain
>         .... ..0. .... .... = Truncated: Message is not truncated
>         .... ...1 .... .... = Recursion desired: Do query recursively
>         .... .... 1... .... = Recursion available: Server can do
> recursive queries
>         .... .... .0.. .... = Z: reserved (0)
>         .... .... ..0. .... = Answer authenticated: Answer/authority
> portion was not authenticated by the server
>         .... .... ...0 .... = Non-authenticated data: Unacceptable
>         .... .... .... 0010 = Reply code: Server failure (2)

The REMOTE DNS server returns an error. Ask THEM why they do that.
Maybe you're exceeding the request quota; if so, pay them more.

        Wietse
Reply | Threaded
Open this post in threaded view
|

Re: dnsblog and "Name service error"

Alex Regan
Hi,

On Thu, Sep 13, 2018 at 7:56 PM Wietse Venema <[hidden email]> wrote:

>
> Alex:
> > Domain Name System (response)
> >     Transaction ID: 0xded6
> >     Flags: 0x8182 Standard query response, Server failure
> >         1... .... .... .... = Response: Message is a response
> >         .000 0... .... .... = Opcode: Standard query (0)
> >         .... .0.. .... .... = Authoritative: Server is not an
> > authority for domain
> >         .... ..0. .... .... = Truncated: Message is not truncated
> >         .... ...1 .... .... = Recursion desired: Do query recursively
> >         .... .... 1... .... = Recursion available: Server can do
> > recursive queries
> >         .... .... .0.. .... = Z: reserved (0)
> >         .... .... ..0. .... = Answer authenticated: Answer/authority
> > portion was not authenticated by the server
> >         .... .... ...0 .... = Non-authenticated data: Unacceptable
> >         .... .... .... 0010 = Reply code: Server failure (2)
>
> The REMOTE DNS server returns an error. Ask THEM why they do that.
> Maybe you're exceeding the request quota; if so, pay them more.

No, I should have mentioned that it doesn't always fail, and mailspike
(in this case) doesn't have a quota. It also fails for my registered
barracuda and other RBLs we pay for.

It failed about a thousand times over the course of the day, while at
least tens of thousands of queries with the same config have
succeeded.

Other ideas greatly appreciated.


>
>         Wietse
Reply | Threaded
Open this post in threaded view
|

Re: dnsblog and "Name service error"

Alex Regan
In reply to this post by Viktor Dukhovni
On Thu, Sep 13, 2018 at 6:16 PM Viktor Dukhovni
<[hidden email]> wrote:

>
>
>
> > On Sep 13, 2018, at 6:11 PM, Alex <[hidden email]> wrote:
> >
> > Hi, I've been experiencing these weird "Name service error" problems
> > for a month or so and can't figure out what's causing them.
> >
> > Sep 13 10:04:59 mail03 postfix/dnsblog[30902]: warning: dnsblog_query:
> > lookup error for DNS query 25.188.223.216.b.barracudacentral.org: Host
> > or domain name not found. Name service error for
> > name=25.188.223.216.b.barracudacentral.org type=A: Host not found, try
> > again
>
> Your resolver cannot resolve the address and returns SERVFAIL.  Perhaps
> the RBLs you're using are no longer willing to process your queries.
>
> The traffic of interest is not between Postfix and your resolver, but
> between your resolver and various nameservers on the public Internet.

Is it common practice for these RBLs to just block instead of
responding with an actual error? I believe they always result in some
type of timeout associated with the query. Notice this one's reached
the 30s max. Here's an entry from my bind logs:

13-Sep-2018 21:12:37.892 query-errors: info: client @0x7f84dc36a110
127.0.0.1#63311 (10.215.119.216.bb.barracudacentral.org): query failed
(SERVFAIL) for 10.215.119.216.bb.barracudacentral.org/IN/A at
../../../bin/named/query.c:8580

13-Sep-2018 21:12:37.892 query-errors: debug 2: fetch completed at
../../../lib/dns/resolver.c:3927 for
10.215.119.216.bb.barracudacentral.org/A in 30.000149: timed
out/success [domain:bb.barracudacentral.org,referral:0,restart:6,qrysent:15,timeout:8,lame:0,quota:0,neterr:2,badresp:0,adberr:0,findfail:0,valfail:0]

There's also a 'connection refused' entry associated with this one:
13-Sep-2018 21:12:11.487 lame-servers: info: connection refused
resolving '10.215.119.216.bb.barracudacentral.org/A/IN':
64.235.145.15#53

There are also thousands of successful entries using the same
bb.barracudacentral.org RBL.

Also, if I try and query it now, it succeeds (or at least it doesn't
timeout and returns NXDOMAIN).
# dig +trace +nodnssec 10.215.119.216.bb.barracudacentral.org

; <<>> DiG 9.11.4-P1-RedHat-9.11.4-5.P1.fc28 <<>> +trace +nodnssec
10.215.119.216.bb.barracudacentral.org
;; global options: +cmd
.                       1066    IN      NS      f.root-servers.net.
.                       1066    IN      NS      j.root-servers.net.
.                       1066    IN      NS      h.root-servers.net.
.                       1066    IN      NS      b.root-servers.net.
.                       1066    IN      NS      e.root-servers.net.
.                       1066    IN      NS      c.root-servers.net.
.                       1066    IN      NS      i.root-servers.net.
.                       1066    IN      NS      l.root-servers.net.
.                       1066    IN      NS      d.root-servers.net.
.                       1066    IN      NS      m.root-servers.net.
.                       1066    IN      NS      a.root-servers.net.
.                       1066    IN      NS      k.root-servers.net.
.                       1066    IN      NS      g.root-servers.net.
;; Received 839 bytes from 127.0.0.1#53(127.0.0.1) in 0 ms

org.                    172800  IN      NS      a0.org.afilias-nst.info.
org.                    172800  IN      NS      a2.org.afilias-nst.info.
org.                    172800  IN      NS      b0.org.afilias-nst.org.
org.                    172800  IN      NS      b2.org.afilias-nst.org.
org.                    172800  IN      NS      c0.org.afilias-nst.info.
org.                    172800  IN      NS      d0.org.afilias-nst.org.
;; Received 469 bytes from 192.203.230.10#53(e.root-servers.net) in 117 ms

barracudacentral.org.   86400   IN      NS      a2.verisigndns.com.
barracudacentral.org.   86400   IN      NS      a3.verisigndns.com.
barracudacentral.org.   86400   IN      NS      a1.verisigndns.com.
;; Received 133 bytes from 199.19.54.1#53(b0.org.afilias-nst.org) in 89 ms

bb.barracudacentral.org. 86400  IN      NS      geons02.barracudacentral.org.
bb.barracudacentral.org. 86400  IN      NS      geons01.barracudacentral.org.
;; Received 207 bytes from 69.36.145.33#53(a3.verisigndns.com) in 45 ms

;; expected opt record in response
;; Received 56 bytes from
64.235.150.189#53(geons01.barracudacentral.org) in 31 ms
Reply | Threaded
Open this post in threaded view
|

Re: dnsblog and "Name service error"

Viktor Dukhovni
In reply to this post by Alex Regan


> On Sep 13, 2018, at 9:39 PM, Alex <[hidden email]> wrote:
>
> No, I should have mentioned that it doesn't always fail, and mailspike
> (in this case) doesn't have a quota. It also fails for my registered
> barracuda and other RBLs we pay for.
>
> It failed about a thousand times over the course of the day, while at
> least tens of thousands of queries with the same config have
> succeeded.

Some nameservers have rate limits and drop packets when you go too fast.
In my DANE survey I was seeing dropped queries and long delays with a
couple of hosting providers, with total throughput down to ~1200 qps
(which really meant ~240 domains per second as the survey does multiple
queries for many domains).  I asked the two providers to raise the
limits, which they did, and now I'm doing ~2700 qps or 540 domains a
second.

In any case, this is really NOT a Postfix question, your local
resolver is returning SERVFAIL, debug your resolver, not Postfix.

For high query rates, perhaps the resolver needs more cache RAM,
a larger pool of open sockets for outgoing queries, ...  Or, more
likely, the problem is upstream.

--
        Viktor.

Reply | Threaded
Open this post in threaded view
|

Re: dnsblog and "Name service error"

Viktor Dukhovni
In reply to this post by Alex Regan


> On Sep 13, 2018, at 9:46 PM, Alex <[hidden email]> wrote:
>
> Is it common practice for these RBLs to just block instead of
> responding with an actual error?

With authoritative servers over UDP, there's no blocking, they
can just drop your query and not reply.  They're not blocking,
just your resolver is.

With random query drops some queries resolve, others don't.

--
        Viktor.

Reply | Threaded
Open this post in threaded view
|

Re: dnsblog and "Name service error"

John Fawcett
In reply to this post by Alex Regan
On 14/09/18 03:39, Alex wrote:

> Hi,
>
> On Thu, Sep 13, 2018 at 7:56 PM Wietse Venema <[hidden email]> wrote:
>> Alex:
>>> Domain Name System (response)
>>>     Transaction ID: 0xded6
>>>     Flags: 0x8182 Standard query response, Server failure
>>>         1... .... .... .... = Response: Message is a response
>>>         .000 0... .... .... = Opcode: Standard query (0)
>>>         .... .0.. .... .... = Authoritative: Server is not an
>>> authority for domain
>>>         .... ..0. .... .... = Truncated: Message is not truncated
>>>         .... ...1 .... .... = Recursion desired: Do query recursively
>>>         .... .... 1... .... = Recursion available: Server can do
>>> recursive queries
>>>         .... .... .0.. .... = Z: reserved (0)
>>>         .... .... ..0. .... = Answer authenticated: Answer/authority
>>> portion was not authenticated by the server
>>>         .... .... ...0 .... = Non-authenticated data: Unacceptable
>>>         .... .... .... 0010 = Reply code: Server failure (2)
>> The REMOTE DNS server returns an error. Ask THEM why they do that.
>> Maybe you're exceeding the request quota; if so, pay them more.
> No, I should have mentioned that it doesn't always fail, and mailspike
> (in this case) doesn't have a quota. It also fails for my registered
> barracuda and other RBLs we pay for.
>
> It failed about a thousand times over the course of the day, while at
> least tens of thousands of queries with the same config have
> succeeded.
>
> Other ideas greatly appreciated.
>
In another email you mentioned that you have a business class cable modem.
In the past I have seen intermittent dns failures caused by security
features
that block udp and tcp floods. If the modem (or any router in between the
modem and your server) has such a feature and the configured values
of allowed packet rate are set too low there will be sporadic dropped
dns answer packets during times of peak dns queries.

Other than that, given that it is your local resolver that is providing the
failure response you might look into potential debugging and logging
features on that software.

John