lost connection error, need help debugging


Alex Regan
Hi,

I'm trying to figure out why the remote server is responding with a
"lost connection" error without any further information to indicate
why the message was deferred.

If I use telnet and replicate the connection process, I can send a
test message. However, messages sent from remote users and forwarded
through our server via a .forward file to this remote user's account
are deferred with a simple "lost connection" response.

How can I get further information as to what the cause may be? Does
the below debug help? It appears that it's being deferred immediately
after the MAIL FROM, as this debug seems to indicate?

Could the TLS error below be the cause? I believe that is a separate
issue I have to address, and TLS otherwise seems to work fine with
other servers.

smtp_stream_setup: maxtime=300 enable_deadline=0
> mail.sagebiz.com[66.252.104.194]:25: EHLO email.example.com
vstream_fflush_some: fd 15 flush 39
vstream_buf_get_ready: fd 15 got 201
< mail.sagebiz.com[66.252.104.194]:25: 250-sagebiz.com
< mail.sagebiz.com[66.252.104.194]:25: 250-SIZE 30000000
< mail.sagebiz.com[66.252.104.194]:25: 250-ETRN
< mail.sagebiz.com[66.252.104.194]:25: 250-ENHANCEDSTATUSCODES
< mail.sagebiz.com[66.252.104.194]:25: 250-X-IMS 3 63253
< mail.sagebiz.com[66.252.104.194]:25: 250-DSN
< mail.sagebiz.com[66.252.104.194]:25: 250-VRFY
< mail.sagebiz.com[66.252.104.194]:25: 250-AUTH LOGIN NTLM SCRAM-MD5 CRAM-MD5
< mail.sagebiz.com[66.252.104.194]:25: 250-AUTH=LOGIN
< mail.sagebiz.com[66.252.104.194]:25: 250-X-AVU 1385496485
< mail.sagebiz.com[66.252.104.194]:25: 250 8BITMIME
server features: 0x900b size 30000000
smtp_stream_setup: maxtime=300 enable_deadline=0
> mail.sagebiz.com[66.252.104.194]:25: MAIL FROM:<[hidden email]> SIZE=47452
smtp_stream_setup: maxtime=300 enable_deadline=0
vstream_fflush_some: fd 15 flush 57
warning: TLS library problem: 16575:error:1408F10B:SSL routines:SSL3_GET_RECORD:wrong version number:s3_pkt.c:337:
smtp_get: EOF
connect to subsystem private/defer
send attr nrequest = 0
send attr flags = 0
send attr queue_id = 1A8AB4055E
send attr original_recipient = [hidden email]
send attr recipient = [hidden email]
send attr offset = 430
send attr dsn_orig_rcpt = rfc822;[hidden email]
send attr notify_flags = 0
send attr status = 4.4.2
send attr diag_type =
send attr diag_text =
send attr mta_type =
send attr mta_mname =
send attr action = delayed
send attr reason = lost connection with mail.sagebiz.com[66.252.104.194] while sending MAIL FROM
vstream_fflush_some: fd 16 flush 356
vstream_buf_get_ready: fd 16 got 10

I've also included the successful telnet test:

$ telnet mail.sagebiz.com 25
Trying 66.252.104.194...
Connected to mail.sagebiz.com.
Escape character is '^]'.
220 mail.sagebiz.com MailSite ESMTP Receiver Version 9.5.4.12 Ready
ehlo mail.example.com
250-sagebiz.com
250-SIZE 30000000
250-ETRN
250-ENHANCEDSTATUSCODES
250-X-IMS 3 63253
250-DSN
250-VRFY
250-AUTH LOGIN NTLM SCRAM-MD5 CRAM-MD5
250-AUTH=LOGIN
250-X-AVU 1385496485
250-STARTTLS
250 8BITMIME
mail from:<[hidden email]>
250 2.0.0 <[hidden email]> OK
rcpt to:[hidden email]
250 2.0.0 <[hidden email]> OK
quit
221 2.0.0 mail.sagebiz.com closing
Connection closed by foreign host.

Thanks,
Alex

Re: lost connection error, need help debugging

Wietse Venema
Buried under useless verbose logging is a clear warning:

> warning: TLS library problem: 16575:error:1408F10B:SSL
> routines:SSL3_GET_RECORD:wrong version number:s3_pkt.c:337: smtp_get:

This means that the TLS library had a problem.

> I've also included the successful telnet test:

telnet is not valid, since you are using TLS.

To debug SMTP over TLS, use "openssl s_client".
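
As a minimal sketch of that suggestion (the helper name starttls_probe and the OPENSSL override variable are made up for illustration; -connect, -starttls smtp and -cipher are standard s_client options), something like the following connects to port 25, issues STARTTLS, and leaves you at an interactive prompt inside the encrypted channel:

```shell
# Probe an SMTP server over STARTTLS with "openssl s_client".
# Usage: starttls_probe HOST [extra s_client options...]
# OPENSSL is a hypothetical override hook (e.g. OPENSSL=echo for a dry run).
starttls_probe() {
    host=$1
    shift
    "${OPENSSL:-openssl}" s_client -connect "${host}:25" -starttls smtp "$@"
}
```

For example, "starttls_probe mail.sagebiz.com -cipher DES-CBC3-SHA" would ask s_client to negotiate that specific suite, provided the local OpenSSL build still supports it.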

        Wietse

Re: lost connection error, need help debugging

Viktor Dukhovni
On Tue, Nov 26, 2013 at 05:53:05PM -0500, Wietse Venema wrote:

> Buried under useless verbose logging is a clear warning:
>
> > warning: TLS library problem: 16575:error:1408F10B:SSL
> > routines:SSL3_GET_RECORD:wrong version number:s3_pkt.c:337: smtp_get:
>
> This means that the TLS library had a problem.

Plus the server is a Microsoft Exchange server, and the problem
happens on the first command after the post-STARTTLS EHLO.

> > I've also included the successful telnet test:
>
> telnet is not valid, since you are using TLS.
>
> To debug SMTP over TLS, use "openssl s_client".

No need.  This is the problem with Exchange on Windows 2003, and
the broken DES-CBC3-SHA ciphersuite.  Work-around in the list
archives.

--
        Viktor.

Re: lost connection error, need help debugging

Viktor Dukhovni
On Tue, Nov 26, 2013 at 11:05:48PM +0000, Viktor Dukhovni wrote:

> > To debug SMTP over TLS, use "openssl s_client".
>
> No need.  This is the problem with Exchange on Windows 2003, and
> the broken DES-CBC3-SHA ciphersuite.  Work-around in the list
> archives.

    $ posttls-finger -c -lmay -Lsummary -o tls_medium_cipherlist=DES-CBC3-SHA "[66.252.104.194]"
    posttls-finger: Connected to 66.252.104.194[66.252.104.194]:25
    posttls-finger: Untrusted TLS connection established to 66.252.104.194[66.252.104.194]:25: unknown with cipher DES-CBC3-SHA (168/168 bits)
    posttls-finger: warning: TLS library problem: 1748:error:1408F10B:SSL routines:SSL3_GET_RECORD:wrong version number:/home/builds/ab/HEAD/src/crypto/external/bsd/openssl/dist/ssl/s3_pkt.c:339:
    posttls-finger: warning: lost connection while sending QUIT command

A similar problem will happen any time OpenSSL fails to send either
RC4-SHA or RC4-MD5 among the first 64 cipher suites offered by the
client.  This is the default behaviour with OpenSSL 1.0.1, since the
additional TLSv1.2 ciphers push RC4 further down the list.
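
To check where RC4 lands in a given client's offer, here is a rough sketch. The function name is made up, it only looks for the plain RC4-SHA and RC4-MD5 names, and it assumes a colon-separated list such as the one printed by "openssl ciphers", which should roughly mirror the order sent in the client hello:

```shell
# Print the 1-based position of the first RC4-SHA/RC4-MD5 style suite in a
# colon-separated cipher list read from stdin; print 0 if none is present.
first_rc4_position() {
    tr ':' '\n' |
    awk '/^RC4-/ { print NR; found = 1; exit }
         END     { if (!found) print 0 }'
}
```

Running "openssl ciphers 'DEFAULT' | first_rc4_position" and getting 0, or a number above 64, would suggest that this client is exposed to the problem with such servers.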

Web browsers apparently perform a fallback to SSLv3 (a built-in
downgrade attack if you like), when TLS handshakes fail.

Postfix falls back to plaintext when STARTTLS or the SSL handshake
fails, but here, the failure is triggered by garbage after the
encrypted EHLO response, which breaks the SSL records containing
MAIL FROM:.  We don't fall back to plaintext after the mail transaction
begins.

Perhaps the simplest work-around is to disable 3DES.  Generally,
servers other than Microsoft Exchange 2003 support AES.  And with
Microsoft Exchange 2003, disabling 3DES means that either we get
RC4 (and succeed) or get no common ciphers and fail early (during
the handshake), and thus fall back to plaintext.

So we could set a default value of "smtp_tls_exclude_ciphers = 3DES".

This won't solve the problem for people who configure explicit
"encrypt" or "secure" policy with such servers as targets, but they
are already doing a manual setup and can easily implement the more
complex work-around from the list archive.
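
On a local installation the setting can be applied without waiting for any change in the shipped default. A minimal sketch, where the function name and the POSTCONF/POSTFIX override variables are illustrative only, while "postconf -e" and "postfix reload" are the usual commands:

```shell
# Exclude 3DES from the cipher suites the Postfix SMTP client offers, then
# reload so that running processes pick up the change.
# POSTCONF and POSTFIX are hypothetical override hooks for dry runs.
exclude_3des() {
    "${POSTCONF:-postconf}" -e 'smtp_tls_exclude_ciphers = 3DES' &&
    "${POSTFIX:-postfix}" reload
}
```

Afterwards "postconf smtp_tls_exclude_ciphers" should show the new value.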

--
        Viktor.

Re: lost connection error, need help debugging

Viktor Dukhovni
In reply to this post by Viktor Dukhovni
On Tue, Nov 26, 2013 at 11:05:48PM +0000, Viktor Dukhovni wrote:

> > This means that the TLS library had a problem.
>
> Plus the server is a Microsoft Exchange server, and the problem
> happens on the first command after the post-STARTTLS EHLO.

One last comment, the mail server in question does run on Windows,
but it is not Microsoft Exchange, rather it is:

    220 mail.sagebiz.com MailSite ESMTP Receiver Version 9.5.4.12 Ready

The underlying issue with CBC padding is therefore not Exchange-specific;
it is either in Windows 2003 SSPI, or in some library on top of
SSPI shared by MailSite and Exchange.

With RC4-SHA and RC4-MD5 the ciphertext length exceeds the plaintext
length by a fixed number of bytes.  With DES-CBC3-SHA the ciphertext
length exceeds the plaintext length by a variable number of bytes,
but both Exchange and MailSite send packets whose length is
plaintext + maximum possible overhead, thus emitting random trailing
data from the stack or heap after the first application data record.
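
The length arithmetic can be sketched as follows, assuming TLS 1.0 record framing (no explicit IV), a 20-byte HMAC-SHA1 tag, the 8-byte 3DES block size, and minimal padding. The function names are illustrative only:

```shell
# Ciphertext length of one record with a stream cipher (RC4-SHA):
# plaintext plus the 20-byte HMAC-SHA1 tag, a fixed overhead.
rc4_sha_record_len() {
    echo $(( $1 + 20 ))
}

# Ciphertext length with DES-CBC3-SHA and minimal padding: plaintext, the
# 20-byte MAC, and at least the 1-byte padding-length field, rounded up to
# a multiple of the 8-byte block, so the overhead depends on the plaintext.
des_cbc3_sha_record_len() {
    echo $(( ( ($1 + 20 + 1 + 7) / 8 ) * 8 ))
}
```

Under these assumptions the RC4-SHA overhead is always 20 bytes, while the DES-CBC3-SHA overhead varies between 21 and 28 bytes depending on the plaintext length. A sender that always emits the maximum would append up to 7 stray bytes.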

The servers in question should be removed from active Internet-facing
duty.  Their software stack is too ancient.

--
        Viktor.

Re: lost connection error, need help debugging

Alex Regan
In reply to this post by Viktor Dukhovni
Hi,

>     $ posttls-finger -c -lmay -Lsummary -o tls_medium_cipherlist=DES-CBC3-SHA "[66.252.104.194]"
>     posttls-finger: Connected to 66.252.104.194[66.252.104.194]:25
>     posttls-finger: Untrusted TLS connection established to 66.252.104.194[66.252.104.194]:25: unknown with cipher DES-CBC3-SHA (168/168 bits)
>     posttls-finger: warning: TLS library problem: 1748:error:1408F10B:SSL routines:SSL3_GET_RECORD:wrong version number:/home/builds/ab/HEAD/src/crypto/external/bsd/openssl/dist/ssl/s3_pkt.c:339:
>     posttls-finger: warning: lost connection while sending QUIT command

I've just downloaded this and compiled it on my system, but it says
invalid options:

# posttls-finger -c -lmay -Lsummary -o tls_medium_cipherlist=DES-CBC3-SHA "[66.252.104.194]"
posttls-finger: invalid option -- 'l'

The -L is also not available:
# posttls-finger
usage: posttls-finger [-acStTv] [-h host_lookup] [-o name=value] destination

> Postfix falls back to plain-text when STARTTLS or the SSL handshake
> fails, but here, the failure is triggered by garbage after the
> encrypted EHLO response, which breaks the SSL records containing
> MAIL FROM:.  We don't fall back to plaintext after the mail transaction
> begins.

Just to be sure I understand, you're saying that because 3DES had
begun then failed, the connection is just closed, correct?

> Perhaps the simplest work-around is to disable 3DES.  Generally,
> servers other than Microsoft Exchange 2003 support AES.  And with
> Microsoft Exchange 2003, disabling 3DES means that either we get
> RC4 (and succeed) or get no common ciphers and fail early (during
> the handshake), and thus fall back to plaintext.

I've now done this, and it worked.

I looked at my debug trace of the messages delivered successfully, and
it didn't indicate what cipher was used. Is there a specific debug
option available to determine this for the next time?

> So we could set a default value of "smtp_tls_exclude_ciphers = 3DES".

Is it possible to disable it just for this peer? Or is it okay to
disable 3DES permanently system-wide?

Thank you for all that you do.
Alex

Re: lost connection error, need help debugging

Viktor Dukhovni
On Tue, Nov 26, 2013 at 08:53:32PM -0500, Alex wrote:

> >     posttls-finger: warning: lost connection while sending QUIT command
>
> I've just downloaded this and compiled it on my system, but it says
> invalid options:

You have to compile *with* TLS support enabled.

    make -f Makefile.init CCARGS='-DUSE_TLS' AUXLIBS='-lssl -lcrypto'

> Just to be sure I understand, you're saying that because 3DES had
> begun then failed, the connection is just closed, correct?

Yes.

> I've now done this, and it worked.

Good.  This was expected, but unexpected things can also happen.

> I looked at my debug trace of the messages delivered successfully, and
> it didn't indicate what cipher was used. Is there a specific debug
> option available to determine this for the next time?

With 3DES disabled, no cipher is negotiated, the TLS handshake
fails, and Postfix delivers the message in the clear.
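
For deliveries where a handshake does complete, setting smtp_tls_loglevel = 1 makes the SMTP client log a one-line summary in the same form as the posttls-finger output quoted earlier in this thread. A small sketch for pulling the cipher name out of such lines (the function name, and the log path in the example below, are assumptions):

```shell
# Read log lines on stdin and print the cipher name from every
# "... TLS connection established to ... with cipher NAME (...)" summary.
tls_cipher_from_log() {
    sed -n 's/.*TLS connection established to .* with cipher \([^ ]*\) .*/\1/p'
}
```

Something like "grep 'postfix/smtp' /var/log/maillog | tls_cipher_from_log | sort | uniq -c" would then give a rough count per cipher.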

> > So we could set a default value of "smtp_tls_exclude_ciphers = 3DES".
>
> Is it possible to disable it just for this peer? Or is it okay to
> disable 3DES permanently system-wide?

Yes, you can play whack-a-mole disabling it one server at a time,
but I would suggest disabling it globally.

--
        Viktor.

Re: lost connection error, need help debugging

Alex Regan
Hi,

> You have to compile *with* TLS support enabled.
>
>     make -f Makefile.init CCARGS='-DUSE_TLS' AUXLIBS='-lssl -lcrypto'

Okay, got it to work now. Apparently it wasn't included with my Fedora
postfix install.

>> I looked at my debug trace of the messages delivered successfully, and
>> it didn't indicate what cipher was used. Is there a specific debug
>> option available to determine this for the next time?
>
> With 3DES disabled, no cipher is negotiated, the TLS handshake
> fails, and Postfix delivers the message in the clear.

Just to be sure, you mean TLS is now disabled only to these defective
servers because of the faulty 3DES implementation, correct?

>> Is it possible to disable it just for this peer? Or is it okay to
>> disable 3DES permanently system-wide?
>
> Yes, you can play whack-a-mole disabling it one server at a time,
> but I would suggest disabling it globally.

So it will now most likely use RC4 as the next cipher, correct?

Thanks,
Alex

Re: lost connection error, need help debugging

Viktor Dukhovni
On Tue, Nov 26, 2013 at 09:37:13PM -0500, Alex wrote:

> > You have to compile *with* TLS support enabled.
> >
> >     make -f Makefile.init CCARGS='-DUSE_TLS' AUXLIBS='-lssl -lcrypto'
>
> Okay, got it to work now. Apparently it wasn't included with my Fedora
> postfix install.

Not surprising, posttls-finger(1) is only available with Postfix
2.11 snapshots.  And so far, Wietse is not planning to add this
utility to the list of command utilities that are installed by
default.  So to use it, you have to build it from source like you
did.

> > With 3DES disabled, no cipher is negotiated, the TLS handshake
> > fails, and Postfix delivers the message in the clear.
>
> Just to be sure, you mean TLS is now disabled only to these defective
> servers because of the faulty 3DES implementation, correct?

Yes, just to the defective servers.

> > Yes, you can play whack-a-mole disabling it one server at a time,
> > but I would suggest disabling it globally.
>
> So it will now most likely use RC4 as the next cipher, correct?

No, TLS will fail to the defective servers, but this will be during
the handshake, so Postfix will fall back to plaintext.  If you must
encrypt traffic to these servers, you need per-destination policy.
Search the archives for details posted in the last month or so.

--
        Viktor.

Re: lost connection error, need help debugging

Alex Regan
In reply to this post by Viktor Dukhovni
Hi Viktor,

On Tue, Nov 26, 2013 at 6:05 PM, Viktor Dukhovni
<[hidden email]> wrote:

> On Tue, Nov 26, 2013 at 05:53:05PM -0500, Wietse Venema wrote:
>
>> Buried under useless verbose logging is a clear warning:
>>
>> > warning: TLS library problem: 16575:error:1408F10B:SSL
>> > routines:SSL3_GET_RECORD:wrong version number:s3_pkt.c:337: smtp_get:
>>
>> This means that the TLS library had a problem.
>
> Plus the server is a Microsoft Exchange server, and the problem
> happens on the first command after the post-STARTTLS EHLO.
>
>> > I've also included the successful telnet test:
>>
>> telnet is not valid, since you are using TLS.
>>
>> To debug SMTP over TLS, use "openssl s_client".
>
> No need.  This is the problem with Exchange on Windows 2003, and
> the broken DES-CBC3-SHA ciphersuite.  Work-around in the list
> archives.

I believe I've found your post in the archives from just a few weeks
ago that describes this a bit further, but it doesn't describe where
you got the info from, so that I may understand this further.

Do you know where I can find more info about this? Perhaps there's a
MS tech bulletin or something that I can forward to the ISP?

Thanks,
Alex

Re: lost connection error, need help debugging

Viktor Dukhovni
On Mon, Dec 02, 2013 at 12:23:54PM -0500, Alex wrote:

> > No need.  This is the problem with Exchange on Windows 2003, and
> > the broken DES-CBC3-SHA ciphersuite.  Work-around in the list
> > archives.
>
> I believe I've found your post in the archives from just a few weeks
> ago that describes this a bit further, but it doesn't describe where
> you got the info from, so that I may understand this further.
>
> Do you know where I can find more info about this? Perhaps there's a
> MS tech bulletin or something that I can forward to the ISP?

I am not aware of any definitive Microsoft technical articles
covering this issue.  My posts on the subject to this list are
based on information I discovered for myself.  My report is sufficiently
authoritative to stand on its own.

A quick Google search uncovers the following, which is either the
same issue or a related issue:

    http://support.microsoft.com/kb/938857

the description is rather poor (surely wrong, written by some poor
sod who is mis-reporting it second hand):

    Block ciphers algorithms are unusual because they change the
    size of the data that is encrypted. When the encrypted data is
    returned, the size of the data may be smaller than the size of
    the data that was sent to be encrypted.  In other words, the
    size of the encrypted data that the Exchange 2003 server sends
    back to the client is different by several bytes. For example,
    a program uses an SSL connection to send 1,000 bytes of data
    to be encrypted.  When the data is encrypted and then returned
    to the client, the size of the data is 980 bytes. This can
    remove the client's ability to decrypt the encrypted data.

Back on planet Earth, block encryption algorithms add a variable
amount of padding, but they never shrink the payload.  Mix in
sufficient skepticism about the expertise of the author and the
core issue is the same ("several bytes" of CBC padding mishandled
by Exchange 2003).

--
        Viktor.

Re: lost connection error, need help debugging

Viktor Dukhovni
On Mon, Dec 02, 2013 at 06:07:07PM +0000, Viktor Dukhovni wrote:

> A quick Google search uncovers the following, which is either the
> same issue or a related issue:
>
>     http://support.microsoft.com/kb/938857

Ditto:

    http://archives.neohapsis.com/archives/postfix/2007-12/0086.html

--
        Viktor.

Re: lost connection error, need help debugging

Alex Regan
In reply to this post by Viktor Dukhovni
Hi Viktor,

On Mon, Dec 2, 2013 at 1:07 PM, Viktor Dukhovni
<[hidden email]> wrote:

> On Mon, Dec 02, 2013 at 12:23:54PM -0500, Alex wrote:
>
>> > No need.  This is the problem with Exchange on Windows 2003, and
>> > the broken DES-CBC3-SHA ciphersuite.  Work-around in the list
>> > archives.
>>
>> I believe I've found your post in the archives from just a few weeks
>> ago that describes this a bit further, but it doesn't describe where
>> you got the info from, so that I may understand this further.
>>
>> Do you know where I can find more info about this? Perhaps there's a
>> MS tech bulletin or something that I can forward to the ISP?
>
> I am not aware of any definitive Microsoft technical articles
> covering this issue.  My posts on the subject to this list are
> based on information I discovered for myself.  My report is sufficiently
> authoritative to stand on its own.

Yes, no doubt. I was just hoping for a simple link to forward on.

> A quick Google search uncovers the following, which is either the
> same issue or a related issue:
>
>     http://support.microsoft.com/kb/938857

Not sure how I ever missed that. Thanks so much, and thanks for the
correction on its content, too!

Best,
Alex