*****SPAM***** Occasional transient "Insufficient system storage" errors

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

*****SPAM***** Occasional transient "Insufficient system storage" errors

Dara Poon
Spam detection software, running on the system "smtp2.ece.ubc.ca",
has identified this incoming email as possible spam.  The original
message has been attached to this so you can view it or label
similar future email.  If you have any questions, see
[hidden email] for details.

Content preview:  I'm seeing very occasional "Insufficient system storage" errors
   in my /var/log/mail.log, but it's not the usual lack of disk space. Here
  is an excerpt from /var/log/mail.log: Oct 14 20:07:10 smtp2 postfix/smtpd[473668]:
   connect from m144-mta1-lv.wordfly.com[52.124.61.163] Oct 14 20:07:13 smtp2
   postfix/smtpd[473668]: 01F6F1FE18: client=m144-mta1-lv.wordfly.com[52.124.61.163
   [...]

Content analysis details:   (6.2 points, 6.0 required)

 pts rule name              description
---- ---------------------- --------------------------------------------------
 2.0 PDS_OTHER_BAD_TLD      Untrustworthy TLDs
                            [URI: galasone.top (top)]
-0.1 DKIM_VALID_AU          Message has a valid DKIM or DK signature from
                            author's domain
 0.1 DKIM_SIGNED            Message has a DKIM or DK signature, not necessarily
                            valid
-0.1 DKIM_VALID             Message has at least one valid DKIM or DK signature
 2.5 URIBL_DBL_SPAM         Contains a spam URL listed in the Spamhaus DBL
                            blocklist
                            [URIs: galasone.top]
 0.1 URIBL_CSS_A            Contains URL's A record listed in the Spamhaus CSS
                            blocklist
                            [URIs: pmta205.journal-oncology.info]
 1.7 URIBL_BLACK            Contains an URL listed in the URIBL blacklist
                            [URIs: journal-oncology.info]
 0.0 UNPARSEABLE_RELAY      Informational: message has unparseable relay
                            lines



I'm seeing very occasional "Insufficient system storage" errors in my /var/log/mail.log, but it's not the usual lack of disk space.  Here is an excerpt from /var/log/mail.log:

Oct 14 20:07:10 smtp2 postfix/smtpd[473668]: connect from m144-mta1-lv.wordfly.com[52.124.61.163]
Oct 14 20:07:13 smtp2 postfix/smtpd[473668]: 01F6F1FE18: client=m144-mta1-lv.wordfly.com[52.124.61.163]
Oct 14 20:07:13 smtp2 opendkim[28076]: 01F6F1FE18: s=wordfly02 d=wordfly.com a=rsa-sha256 SSL
Oct 14 20:07:16 smtp2 postfix/cleanup[473260]: 45CF420090: message-id=<[hidden email]>
Oct 14 20:07:16 smtp2 opendkim[28076]: 45CF420090: s=s1024-2013-q3 d=facebookmail.com a=rsa-sha256 SSL
Oct 14 20:07:25 smtp2 postfix/smtpd[473668]: disconnect from m144-mta1-lv.wordfly.com[52.124.61.163] ehlo=2 starttls=1 mail=1 rcpt=1 data=1 quit=1 commands=7
Oct 14 20:07:26 smtp2 postfix/local[474024]: 45CF420090: to=<[hidden email]>, relay=local, delay=10, delays=10/0/0/0.01, dsn=2.0.0, status=sent (forwarded as 541A820093)
Oct 14 20:07:30 smtp2 postfix/smtpd[473637]: connect from pmta205.journal-oncology.info[191.101.211.207]
Oct 14 20:07:30 smtp2 postfix/smtpd[473668]: connect from m144-mta1-lv.wordfly.com[52.124.61.163]
Oct 14 20:07:30 smtp2 postfix/smtpd[473668]: NOQUEUE: reject: MAIL from m144-mta1-lv.wordfly.com[52.124.61.163]: 452 4.3.1 Insufficient system storage; proto=ESMTP helo=<m144-mta1-lv.wordfly.com>
Oct 14 20:07:30 smtp2 postfix/smtpd[473668]: warning: not enough free space in mail queue: 0 bytes < 1.5*message size limit
Oct 14 20:07:30 smtp2 postfix/smtpd[473637]: NOQUEUE: reject: MAIL from pmta205.journal-oncology.info[191.101.211.207]: 452 4.3.1 Insufficient system storage; proto=ESMTP helo=<pmta205.journal-oncology.info>
Oct 14 20:07:30 smtp2 postfix/smtpd[473637]: warning: not enough free space in mail queue: 0 bytes < 1.5*message size limit
Oct 14 20:07:30 smtp2 postfix/cleanup[473260]: 9F3F120331: message-id=<[hidden email]>
Oct 14 20:07:30 smtp2 postfix/cleanup[473260]: warning: 9F3F120331: write queue file: No space left on device
Oct 14 20:07:30 smtp2 postfix/smtpd[473637]: disconnect from pmta205.journal-oncology.info[191.101.211.207] ehlo=2 starttls=1 mail=0/1 quit=1 commands=4/5
Oct 14 20:07:35 smtp2 postfix/qmgr[407832]: 501A91FB53: from=<[hidden email]>, size=1302, nrcpt=1 (queue active)
Oct 14 20:07:35 smtp2 postfix/smtpd[473668]: disconnect from m144-mta1-lv.wordfly.com[52.124.61.163] ehlo=2 starttls=1 mail=0/1 quit=1 commands=4/5
Oct 14 20:07:36 smtp2 postfix/smtp[474095]: 541A820093: to=<[hidden email]>, orig_to=<[hidden email]>, relay=none, delay=10, delays=0/0.02/10/0, dsn=4.4.3, status=deferred (Host or domain name not found. Name service error for name=gmail.com type=MX: Host not found, try again)
Oct 14 20:07:45 smtp2 postfix/local[473993]: 01F6F1FE18: to=<[hidden email]>, relay=local, delay=36, delays=9.8/0/0/26, dsn=2.0.0, status=sent (delivered to command: /usr/bin/procmail -p)
Oct 14 20:07:45 smtp2 postfix/smtpd[473637]: connect from unknown[45.193.210.121]
Oct 14 20:07:46 smtp2 postfix/smtpd[473637]: NOQUEUE: reject: RCPT from unknown[45.193.210.121]: 450 4.7.25 Client host rejected: cannot find your hostname, [45.193.210.121]; from=<[hidden email]> to=<[hidden email]> proto=ESMTP helo=<galasone.top>
Oct 14 20:07:46 smtp2 postfix/smtpd[473637]: disconnect from unknown[45.193.210.121] ehlo=1 mail=1 rcpt=0/1 data=0/1 quit=1 commands=3/5


Observations
============

0. Postfix 3.4.13 on Ubuntu 20.04 (Linux kernel 5.4.0-48-generic) in a vSphere virtual machine.  Also previously observed on Ubuntu 18.04, but not 16.04.

1. This happens very infrequently, but seemingly randomly.  For a rough sense of scale, I count 17500 instances of "delivered" in mail.log, but only 16 instances of "Insufficient system storage" in the same mail.log.

2. When it happens, it may happen to more than one process at around the same time.  The excerpt above shows two such complaints from two different smtpd processes, which is too tight of a cluster to be a coincidence, given the rarity.  The instance before that was at Oct 14 14:50:30.  And it has been fine for at least two hours since then.

3. Sometimes, these errors result in a double-bounce notification to postmaster.  Here is what the double-bounce notification generated at 20:07:35 says:

===== BEGIN double-bounce BODY 1 =====
Transcript of session follows.

Out: 220 mx.ece.ubc.ca ESMTP Postfix
In:  EHLO m144-mta1-lv.wordfly.com
Out: 250-smtp2.ece.ubc.ca
Out: 250-PIPELINING
Out: 250-SIZE 52428800
Out: 250-ETRN
Out: 250-STARTTLS
Out: 250-ENHANCEDSTATUSCODES
Out: 250-8BITMIME
Out: 250-DSN
Out: 250-SMTPUTF8
Out: 250 CHUNKING
In:  STARTTLS
Out: 220 2.0.0 Ready to start TLS
In:  EHLO m144-mta1-lv.wordfly.com
Out: 250-smtp2.ece.ubc.ca
Out: 250-PIPELINING
Out: 250-SIZE 52428800
Out: 250-ETRN
Out: 250-AUTH PLAIN LOGIN
Out: 250-AUTH=PLAIN LOGIN
Out: 250-ENHANCEDSTATUSCODES
Out: 250-8BITMIME
Out: 250-DSN
Out: 250-SMTPUTF8
Out: 250 CHUNKING
In:  MAIL FROM:<[hidden email]>
Out: 452 4.3.1 Insufficient system storage
In:  QUIT
Out: 221 2.0.0 Bye


For other details, see the local mail logfile
===== END double-bounce BODY 1 =====

But it can also happen at other phases of the SMTP transaction:

===== BEGIN double-bounce BODY 2 =====
Transcript of session follows.

Out: 220 mx.ece.ubc.ca ESMTP Postfix
In:  HELO lampe16.ece.ubc.ca
Out: 250 smtp2.ece.ubc.ca
In:  MAIL FROM: <[hidden email]>
Out: 452 4.3.1 Insufficient system storage

Session aborted, reason: lost connection

For other details, see the local mail logfile
===== BEGIN double-bounce BODY 2 =====


What I have tried
=================

root@smtp2:~# df -h /var
Filesystem          Size  Used Avail Use% Mounted on
/dev/mapper/vg-var  7.8G  4.7G  2.7G  64% /var
root@smtp2:~# df -i /var
Filesystem         Inodes IUsed  IFree IUse% Mounted on
/dev/mapper/vg-var 518144  7638 510506    2% /var
root@smtp2:~# postconf message_size_limit queue_minfree
message_size_limit = 52428800
queue_minfree = 0

I've turned on debug_peer_list, and managed to get some verbose logging for the lampe16.ece.ubc.ca example above:

Oct 14 13:03:00 smtp2 postfix/smtpd[433514]: ...[SNIP]...
Oct 14 13:03:00 smtp2 postfix/smtpd[433514]: milter8_mail_event: milter unix:/clamav/clamav-milter.ctl: mail <[hidden email]>
Oct 14 13:03:00 smtp2 postfix/smtpd[433514]: event: SMFIC_MAIL; macros: {mail_addr}=[hidden email] {mail_host}=smtp2.ece.ubc.ca {mail_mailer}=local
Oct 14 13:03:00 smtp2 postfix/smtpd[433514]: reply: SMFIR_CONTINUE data 0 bytes
Oct 14 13:03:00 smtp2 postfix/smtpd[433514]: smtpd_check_rewrite: trying: permit_inet_interfaces
Oct 14 13:03:00 smtp2 postfix/smtpd[433514]: permit_inet_interfaces: lampe16.ece.ubc.ca 137.82.57.92
Oct 14 13:03:00 smtp2 postfix/smtpd[433514]: fsspace: .: block size 4096, blocks free 0
Oct 14 13:03:00 smtp2 postfix/smtpd[433514]: smtpd_check_queue: blocks 4096 avail 0 min_free 0 msg_size_limit 52428800
Oct 14 13:03:00 smtp2 postfix/smtpd[433514]: NOQUEUE: reject: MAIL from lampe16.ece.ubc.ca[137.82.57.92]: 452 4.3.1 Insufficient system storage; proto=SMTP helo=<lampe16.ece.ubc.ca>
Oct 14 13:03:00 smtp2 postfix/smtpd[433514]: warning: not enough free space in mail queue: 0 bytes < 1.5*message size limit
Oct 14 13:03:00 smtp2 postfix/smtpd[433514]: abort all milters
Oct 14 13:03:00 smtp2 postfix/smtpd[433514]: milter8_abort: abort milter inet:localhost:8892
Oct 14 13:03:00 smtp2 postfix/smtpd[433514]: milter8_abort: abort milter unix:/spamass/spamass.sock
Oct 14 13:03:00 smtp2 postfix/smtpd[433514]: milter8_abort: abort milter unix:/clamav/clamav-milter.ctl
Oct 14 13:03:00 smtp2 postfix/smtpd[433514]: > lampe16.ece.ubc.ca[137.82.57.92]: 452 4.3.1 Insufficient system storage
Oct 14 13:03:00 smtp2 postfix/smtpd[433514]: watchdog_pat: 0x56190ce2a700
Oct 14 13:03:00 smtp2 postfix/smtpd[433514]: smtp_get: EOF
Oct 14 13:03:00 smtp2 postfix/smtpd[433514]: ...[SNIP]...


The error message is coming from smtpd_check_queue() in src/smtpd/smtpd_check.c.  The disk space check uses fsspace() in src/util/fsspace.c, which on Linux calls statfs(2).  The code confirms that when it complains of insufficient system storage, it means disk blocks, not inodes, memory, shared memory, or anything else.


Which filesystem is it testing when it does fsspace(".", &fsbuf)?  If I look at a typical smtpd process, /proc/$pid/cwd is a symlink to /var/spool/postfix, as I would expect, so it should be testing my /var filesystem.  (I don't have separate filesystems under /var such as /var/log, /var/spool, or anything.)

Perhaps smtpd is doing a sneaky momentary chdir(2) to the wrong place before calling fsspace()?  I grepped the entire Postfix source code, and it's always either chdir(var_queue_dir), or chdir(var_daemon_dir), or chdir(var_command_dir), or chdir("/") in a chrooted context — all of which should be in the same /var filesystem.  The exception is chdir(args.cwd) in src/global/pipe_command.c, but surely that's irrelevant?  Anyway, as I noted, it affects multiple processes simultaneously.


Perhaps there is really something that is momentarily putting huge files in my /var and immediately deleting them?  I monitored that too:

root@smtp2:~# tail -f /var/log/mail.log | while read -r line ; do case "$line" in *Insufficient*) date ; df --block-size=4K /var ; esac ; done
Wed 14 Oct 2020 08:07:30 PM PDT
Filesystem         4K-blocks    Used Available Use% Mounted on
/dev/mapper/vg-var   2020590 1191809    721210  63% /var
Wed 14 Oct 2020 08:07:30 PM PDT
Filesystem         4K-blocks    Used Available Use% Mounted on
/dev/mapper/vg-var   2020590 1191809    721210  63% /var


There's nothing interesting in /var/log/messages during the times that this happens.

Whenever it happens, it's always complaining that there are ZERO bytes free, not some small number of bytes free.  Besides, my /var/log/mail.log is on the same filesystem, and as far as I can tell, my syslog is working fine.

So, at this point, I'm stumped.  Any ideas?

Dara Poon
IT Services, Department of Electrical and Computer Engineering, University of British Columbia

Reply | Threaded
Open this post in threaded view
|

Occasional transient "Insufficient system storage" errors

Dara Poon
(Well, that was embarrassing!  I had a Spamassassin milter on outbound mail that tagged my own message as a false positive.  Sending it again for readability.  Sorry!)



I'm seeing very occasional "Insufficient system storage" errors in my /var/log/mail.log, but it's not the usual lack of disk space.  Here is an excerpt from /var/log/mail.log:

Oct 14 20:07:10 smtp2 postfix/smtpd[473668]: connect from m144-mta1-lv.wordfly.com[52.124.61.163]
Oct 14 20:07:13 smtp2 postfix/smtpd[473668]: 01F6F1FE18: client=m144-mta1-lv.wordfly.com[52.124.61.163]
Oct 14 20:07:13 smtp2 opendkim[28076]: 01F6F1FE18: s=wordfly02 d=wordfly.com a=rsa-sha256 SSL
Oct 14 20:07:16 smtp2 postfix/cleanup[473260]: 45CF420090: message-id=<[hidden email]>
Oct 14 20:07:16 smtp2 opendkim[28076]: 45CF420090: s=s1024-2013-q3 d=facebookmail.com a=rsa-sha256 SSL
Oct 14 20:07:25 smtp2 postfix/smtpd[473668]: disconnect from m144-mta1-lv.wordfly.com[52.124.61.163] ehlo=2 starttls=1 mail=1 rcpt=1 data=1 quit=1 commands=7
Oct 14 20:07:26 smtp2 postfix/local[474024]: 45CF420090: to=<[hidden email]>, relay=local, delay=10, delays=10/0/0/0.01, dsn=2.0.0, status=sent (forwarded as 541A820093)
Oct 14 20:07:30 smtp2 postfix/smtpd[473637]: connect from pmta205.journal-oncology.info[191.101.211.207]
Oct 14 20:07:30 smtp2 postfix/smtpd[473668]: connect from m144-mta1-lv.wordfly.com[52.124.61.163]
Oct 14 20:07:30 smtp2 postfix/smtpd[473668]: NOQUEUE: reject: MAIL from m144-mta1-lv.wordfly.com[52.124.61.163]: 452 4.3.1 Insufficient system storage; proto=ESMTP helo=<m144-mta1-lv.wordfly.com>
Oct 14 20:07:30 smtp2 postfix/smtpd[473668]: warning: not enough free space in mail queue: 0 bytes < 1.5*message size limit
Oct 14 20:07:30 smtp2 postfix/smtpd[473637]: NOQUEUE: reject: MAIL from pmta205.journal-oncology.info[191.101.211.207]: 452 4.3.1 Insufficient system storage; proto=ESMTP helo=<pmta205.journal-oncology.info>
Oct 14 20:07:30 smtp2 postfix/smtpd[473637]: warning: not enough free space in mail queue: 0 bytes < 1.5*message size limit
Oct 14 20:07:30 smtp2 postfix/cleanup[473260]: 9F3F120331: message-id=<[hidden email]>
Oct 14 20:07:30 smtp2 postfix/cleanup[473260]: warning: 9F3F120331: write queue file: No space left on device
Oct 14 20:07:30 smtp2 postfix/smtpd[473637]: disconnect from pmta205.journal-oncology.info[191.101.211.207] ehlo=2 starttls=1 mail=0/1 quit=1 commands=4/5
Oct 14 20:07:35 smtp2 postfix/qmgr[407832]: 501A91FB53: from=<[hidden email]>, size=1302, nrcpt=1 (queue active)
Oct 14 20:07:35 smtp2 postfix/smtpd[473668]: disconnect from m144-mta1-lv.wordfly.com[52.124.61.163] ehlo=2 starttls=1 mail=0/1 quit=1 commands=4/5
Oct 14 20:07:36 smtp2 postfix/smtp[474095]: 541A820093: to=<[hidden email]>, orig_to=<[hidden email]>, relay=none, delay=10, delays=0/0.02/10/0, dsn=4.4.3, status=deferred (Host or domain name not found. Name service error for name=gmail.com type=MX: Host not found, try again)
Oct 14 20:07:45 smtp2 postfix/local[473993]: 01F6F1FE18: to=<[hidden email]>, relay=local, delay=36, delays=9.8/0/0/26, dsn=2.0.0, status=sent (delivered to command: /usr/bin/procmail -p)
Oct 14 20:07:45 smtp2 postfix/smtpd[473637]: connect from unknown[45.193.210.121]
Oct 14 20:07:46 smtp2 postfix/smtpd[473637]: NOQUEUE: reject: RCPT from unknown[45.193.210.121]: 450 4.7.25 Client host rejected: cannot find your hostname, [45.193.210.121]; from=<[hidden email]> to=<[hidden email]> proto=ESMTP helo=<galasone.top>
Oct 14 20:07:46 smtp2 postfix/smtpd[473637]: disconnect from unknown[45.193.210.121] ehlo=1 mail=1 rcpt=0/1 data=0/1 quit=1 commands=3/5


Observations
============

0. Postfix 3.4.13 on Ubuntu 20.04 (Linux kernel 5.4.0-48-generic) in a vSphere virtual machine.  Also previously observed on Ubuntu 18.04, but not 16.04.

1. This happens very infrequently, but seemingly randomly.  For a rough sense of scale, I count 17500 instances of "delivered" in mail.log, but only 16 instances of "Insufficient system storage" in the same mail.log.

2. When it happens, it may happen to more than one process at around the same time.  The excerpt above shows two such complaints from two different smtpd processes, which is too tight of a cluster to be a coincidence, given the rarity.  The instance before that was at Oct 14 14:50:30.  And it has been fine for at least two hours since then.

3. Sometimes, these errors result in a double-bounce notification to postmaster.  Here is what the double-bounce notification generated at 20:07:35 says:

===== BEGIN double-bounce BODY 1 =====
Transcript of session follows.

Out: 220 mx.ece.ubc.ca ESMTP Postfix
In:  EHLO m144-mta1-lv.wordfly.com
Out: 250-smtp2.ece.ubc.ca
Out: 250-PIPELINING
Out: 250-SIZE 52428800
Out: 250-ETRN
Out: 250-STARTTLS
Out: 250-ENHANCEDSTATUSCODES
Out: 250-8BITMIME
Out: 250-DSN
Out: 250-SMTPUTF8
Out: 250 CHUNKING
In:  STARTTLS
Out: 220 2.0.0 Ready to start TLS
In:  EHLO m144-mta1-lv.wordfly.com
Out: 250-smtp2.ece.ubc.ca
Out: 250-PIPELINING
Out: 250-SIZE 52428800
Out: 250-ETRN
Out: 250-AUTH PLAIN LOGIN
Out: 250-AUTH=PLAIN LOGIN
Out: 250-ENHANCEDSTATUSCODES
Out: 250-8BITMIME
Out: 250-DSN
Out: 250-SMTPUTF8
Out: 250 CHUNKING
In:  MAIL FROM:<[hidden email]>
Out: 452 4.3.1 Insufficient system storage
In:  QUIT
Out: 221 2.0.0 Bye


For other details, see the local mail logfile
===== END double-bounce BODY 1 =====

But it can also happen at other phases of the SMTP transaction:

===== BEGIN double-bounce BODY 2 =====
Transcript of session follows.

Out: 220 mx.ece.ubc.ca ESMTP Postfix
In:  HELO lampe16.ece.ubc.ca
Out: 250 smtp2.ece.ubc.ca
In:  MAIL FROM: <[hidden email]>
Out: 452 4.3.1 Insufficient system storage

Session aborted, reason: lost connection

For other details, see the local mail logfile
===== BEGIN double-bounce BODY 2 =====


What I have tried
=================

root@smtp2:~# df -h /var
Filesystem          Size  Used Avail Use% Mounted on
/dev/mapper/vg-var  7.8G  4.7G  2.7G  64% /var
root@smtp2:~# df -i /var
Filesystem         Inodes IUsed  IFree IUse% Mounted on
/dev/mapper/vg-var 518144  7638 510506    2% /var
root@smtp2:~# postconf message_size_limit queue_minfree
message_size_limit = 52428800
queue_minfree = 0

I've turned on debug_peer_list, and managed to get some verbose logging for the lampe16.ece.ubc.ca example above:

Oct 14 13:03:00 smtp2 postfix/smtpd[433514]: ...[SNIP]...
Oct 14 13:03:00 smtp2 postfix/smtpd[433514]: milter8_mail_event: milter unix:/clamav/clamav-milter.ctl: mail <[hidden email]>
Oct 14 13:03:00 smtp2 postfix/smtpd[433514]: event: SMFIC_MAIL; macros: {mail_addr}=[hidden email] {mail_host}=smtp2.ece.ubc.ca{mail_mailer}=local
Oct 14 13:03:00 smtp2 postfix/smtpd[433514]: reply: SMFIR_CONTINUE data 0 bytes
Oct 14 13:03:00 smtp2 postfix/smtpd[433514]: smtpd_check_rewrite: trying: permit_inet_interfaces
Oct 14 13:03:00 smtp2 postfix/smtpd[433514]: permit_inet_interfaces: lampe16.ece.ubc.ca 137.82.57.92
Oct 14 13:03:00 smtp2 postfix/smtpd[433514]: fsspace: .: block size 4096, blocks free 0
Oct 14 13:03:00 smtp2 postfix/smtpd[433514]: smtpd_check_queue: blocks 4096 avail 0 min_free 0 msg_size_limit 52428800
Oct 14 13:03:00 smtp2 postfix/smtpd[433514]: NOQUEUE: reject: MAIL from lampe16.ece.ubc.ca[137.82.57.92]: 452 4.3.1 Insufficient system storage; proto=SMTP helo=<lampe16.ece.ubc.ca>
Oct 14 13:03:00 smtp2 postfix/smtpd[433514]: warning: not enough free space in mail queue: 0 bytes < 1.5*message size limit
Oct 14 13:03:00 smtp2 postfix/smtpd[433514]: abort all milters
Oct 14 13:03:00 smtp2 postfix/smtpd[433514]: milter8_abort: abort milter inet:localhost:8892
Oct 14 13:03:00 smtp2 postfix/smtpd[433514]: milter8_abort: abort milter unix:/spamass/spamass.sock
Oct 14 13:03:00 smtp2 postfix/smtpd[433514]: milter8_abort: abort milter unix:/clamav/clamav-milter.ctl
Oct 14 13:03:00 smtp2 postfix/smtpd[433514]: > lampe16.ece.ubc.ca[137.82.57.92]: 452 4.3.1 Insufficient system storage
Oct 14 13:03:00 smtp2 postfix/smtpd[433514]: watchdog_pat: 0x56190ce2a700
Oct 14 13:03:00 smtp2 postfix/smtpd[433514]: smtp_get: EOF
Oct 14 13:03:00 smtp2 postfix/smtpd[433514]: ...[SNIP]...


The error message is coming from smtpd_check_queue() in src/smtpd/smtpd_check.c.  The disk space check uses fsspace() in src/util/fsspace.c, which on Linux calls statfs(2).  The code confirms that when it complains of insufficient system storage, it means disk blocks, not inodes, memory, shared memory, or anything else.


Which filesystem is it testing when it does fsspace(".", &fsbuf)?  If I look at a typical smtpd process, /proc/$pid/cwd is a symlink to /var/spool/postfix, as I would expect, so it should be testing my /var filesystem.  (I don't have separate filesystems under /var such as /var/log, /var/spool, or anything.)

Perhaps smtpd is doing a sneaky momentary chdir(2) to the wrong place before calling fsspace()?  I grepped the entire Postfix source code, and it's always either chdir(var_queue_dir), or chdir(var_daemon_dir), or chdir(var_command_dir), or chdir("/") in a chrooted context — all of which should be in the same /var filesystem.  The exception is chdir(args.cwd) in src/global/pipe_command.c, but surely that's irrelevant? Anyway, as I noted, it affects multiple processes simultaneously.


Perhaps there is really something that is momentarily putting huge files in my /var and immediately deleting them?  I monitored that too:

root@smtp2:~# tail -f /var/log/mail.log | while read -r line ; do case "$line" in *Insufficient*) date ; df --block-size=4K /var ; esac ; done
Wed 14 Oct 2020 08:07:30 PM PDT
Filesystem         4K-blocks    Used Available Use% Mounted on
/dev/mapper/vg-var   2020590 1191809    721210  63% /var
Wed 14 Oct 2020 08:07:30 PM PDT
Filesystem         4K-blocks    Used Available Use% Mounted on
/dev/mapper/vg-var   2020590 1191809    721210  63% /var


There's nothing interesting in /var/log/messages during the times that this happens.

Whenever it happens, it's always complaining that there are ZERO bytes free, not some small number of bytes free.  Besides, my /var/log/mail.log is on the same filesystem, and as far as I can tell, my syslog is working fine.

So, at this point, I'm stumped.  Any ideas?

Dara Poon
IT Services, Department of Electrical and Computer Engineering, University of British Columbia
Reply | Threaded
Open this post in threaded view
|

Re: Occasional transient "Insufficient system storage" errors

John Fawcett
On 15/10/2020 09:44, Dara Poon wrote:

> (Well, that was embarrassing!  I had a Spamassassin milter on outbound mail that tagged my own message as a false positive.  Sending it again for readability.  Sorry!)
>
>
>
> I'm seeing very occasional "Insufficient system storage" errors in my /var/log/mail.log, but it's not the usual lack of disk space.  Here is an excerpt from /var/log/mail.log:
>
> ...
>
>
> The error message is coming from smtpd_check_queue() in src/smtpd/smtpd_check.c.  The disk space check uses fsspace() in src/util/fsspace.c, which on Linux calls statfs(2).  The code confirms that when it complains of insufficient system storage, it means disk blocks, not inodes, memory, shared memory, or anything else.
>
>
> Which filesystem is it testing when it does fsspace(".", &fsbuf)?  If I look at a typical smtpd process, /proc/$pid/cwd is a symlink to /var/spool/postfix, as I would expect, so it should be testing my /var filesystem.  (I don't have separate filesystems under /var such as /var/log, /var/spool, or anything.)
>
> Perhaps smtpd is doing a sneaky momentary chdir(2) to the wrong place before calling fsspace()?  I grepped the entire Postfix source code, and it's always either chdir(var_queue_dir), or chdir(var_daemon_dir), or chdir(var_command_dir), or chdir("/") in a chrooted context — all of which should be in the same /var filesystem.  The exception is chdir(args.cwd) in src/global/pipe_command.c, but surely that's irrelevant? Anyway, as I noted, it affects multiple processes simultaneously.
>
>
> Perhaps there is really something that is momentarily putting huge files in my /var and immediately deleting them?  I monitored that too:
>
> root@smtp2:~# tail -f /var/log/mail.log | while read -r line ; do case "$line" in *Insufficient*) date ; df --block-size=4K /var ; esac ; done
> Wed 14 Oct 2020 08:07:30 PM PDT
> Filesystem         4K-blocks    Used Available Use% Mounted on
> /dev/mapper/vg-var   2020590 1191809    721210  63% /var
> Wed 14 Oct 2020 08:07:30 PM PDT
> Filesystem         4K-blocks    Used Available Use% Mounted on
> /dev/mapper/vg-var   2020590 1191809    721210  63% /var
>
>
> There's nothing interesting in /var/log/messages during the times that this happens.
>
> Whenever it happens, it's always complaining that there are ZERO bytes free, not some small number of bytes free.  Besides, my /var/log/mail.log is on the same filesystem, and as far as I can tell, my syslog is working fine.
>
> So, at this point, I'm stumped.  Any ideas?
>
> Dara Poon
> IT Services, Department of Electrical and Computer Engineering, University of British Columbia

Dara

Sounds like a difficult probem. I don't see where postfix could be
chrooting to some different directory for smptd process other than the
configured queue directory.  Do you really have any other filesystem at
zero space available that could lend weight to the wrong file system
hypothesis?

Also this code has not changed in a long time. The fact that the message
prints out 4096 as the block size indicates that info is being returned
from the statfs call. I don't think there can be any other conclusion
than that the statfs is indeed returning 0 for the bavail field,
probably an issue that postfix could do little about.

One thing you might try instead of using df to debug it, is to use the
postfix test program fsspace, which is not compiled by default (or
equivalent c program) which make the same statfs call as postfix on the
queue directory in normal conditions and when the error is occuring.

John



Reply | Threaded
Open this post in threaded view
|

Re: Occasional transient "Insufficient system storage" errors

Demi M. Obenour
In reply to this post by Dara Poon
On 10/15/20 3:44 AM, Dara Poon wrote:
> (Well, that was embarrassing!  I had a Spamassassin milter on outbound mail that tagged my own message as a false positive.  Sending it again for readability.  Sorry!)

FYI, GMail considered both the original message and the resend to be spam as well.

Sincerely,

Demi

OpenPGP_0xB288B55FFF9C22C1.asc (3K) Download Attachment
OpenPGP_signature (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Occasional transient "Insufficient system storage" errors

Wietse Venema
In reply to this post by Dara Poon
Dara Poon:
> Oct 14 13:03:00 smtp2 postfix/smtpd[433514]: fsspace: .: block size 4096, blocks free 0
> Oct 14 13:03:00 smtp2 postfix/smtpd[433514]: smtpd_check_queue: blocks 4096 avail 0 min_free 0 msg_size_limit 52428800
> Oct 14 13:03:00 smtp2 postfix/smtpd[433514]: NOQUEUE: reject: MAIL from lampe16.ece.ubc.ca[137.82.57.92]: 452 4.3.1 Insufficient system storage; proto=SMTP helo=<lampe16.ece.ubc.ca>

All Postfix daemons chdir() to $queue_directory (and optionally
chroot()). Postfix has worked that way since 1997. If any system
(OS or hardware) randomly returns "zero blocks free space", then
you need a better system. Ther'e an XKCD cartoon about that.

(There was a hack in DDWRT that hard-coded the pathname used in the
fsspace() call, but that does not appear to be the case here).

        Wietse