Is there any documentation on the binary format of the mail files under /var/spool/postfix/ ?


Hubro
Postfix keeps mails in a binary format in folders under /var/spool/postfix, at least by default.

I want to write some tools for searching and filtering by the metadata of a large number (hundreds of thousands) of emails under /var/spool/postfix/deferred. Among other things, I want to find all queue IDs of mails sent from specific IP addresses so that they can be deleted.

I'm having some problems understanding the binary format of the files though. It seems that the envelope records start with the bytes "\x41\x16" and end at the bytes "\x4d\x00". The records are separated by two bytes, the first of which is "\x41" and the second of which varies, and I don't understand the logic behind it.

Is the binary format of these files documented anywhere? I have searched for quite a while with no luck. I get the sense that the format is so simple that it could be explained in a few paragraphs, but alas I haven't quite been able to make sense of it yet.

Also, is the binary format of these files *stable*? As in, does the format change depending on which Postfix version created them?

Any information related to the binary format of these files would be greatly appreciated.

---

NB: I want to write my own tools for this partly for learning and fun, but also because tools like "postqueue" and "postcat" are just WAY too slow when we're talking about hundreds of thousands of mails, which sometimes happens when users of my mail servers get infected by spam scripts.

Re: Is there any documentation on the binary format of the mail files under /var/spool/postfix/ ?

Christian Recktenwald
About three years ago I had a pretty similar question:
<[hidden email]>  .

The answer I got:

On Tue, Sep 24, 2013 at 09:37:11AM -0400, Wietse Venema wrote:
> > If by chance someone could provide a pointer to the documentation
> > of the queue file format this would be appreciated as well.
>
> The file format is private. Modification by non-Postfix programs
> breaks the warranty, meaning no support.
>
>   Wietse

I believe this should do the trick:

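#!/bin/bash
# Delete every deferred message whose client_address matches IP_TO_DELETE.
# Caveat: this runs postcat once per queued message.

IP_TO_DELETE="10.2.3.4"
# mailq lines that start with a queue ID are mapped to the queue file path
# /var/spool/postfix/deferred/<first character of ID>/<ID>.
mailq |
        awk '/^[A-F0-9]/ { mid=$1; qfile=$1  ; sub(/^(.)/,"/var/spool/postfix/deferred/&/&",qfile); print mid, qfile };' |
        while read -r mid qfile ; do
                # echo "#### $mid : $qfile ####"
                # Pull the client IP out of the envelope records.
                CLIENT_IP=$(
                        postcat -e "$qfile" | awk -F= '$1 == "named_attribute: client_address" { print $2 }'
                )
                if [ "$CLIENT_IP" = "$IP_TO_DELETE" ]; then
                        postsuper -d "$mid"
                fi
        done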



Re: Is there any documentation on the binary format of the mail files under /var/spool/postfix/ ?

Wietse Venema
In reply to this post by Hubro
Queue files are part of the Postfix internal API. Non-Postfix
programs that depend on internal details are NOT SUPPORTED and WILL
CAUSE LOST MAIL.

That said, there are ways to do this that don't require unsupported
usage.

Hubro:
> I want to write some tools for searching and filtering by the meta data of a
> large number (hundreds of thousands) of emails under
> /var/spool/postfix/deferred. Among other things, I want to find all queue
> IDs of mails sent from specific IP addresses so that they can be deleted.

That information is already available from the Postfix maillog file.

    $ grep client= /var/log/maillog
    May 28 01:22:27 spike postfix/smtpd[74407]: 3wb7Xz3wSczJrP0: client=molamola.ripe.net[2001:67c:2e8:11::c100:1371]

This shows the queue ID and client name[ip address]. Once you extract
the queue IDs of interest you can pipe the result into the postsuper
command:

    grep client= /var/log/maillog | extract queue IDs | postsuper -d -

If you don't have a maillog, you could use the postqueue command
to list queue file metadata as JSON objects:

    postqueue -j | extract queue IDs | postsuper -d -

In this case you are welcome to provide a patch that adds the
remote SMTP client attributes to the JSON output.

Another possibility: submit a patch for the postcat command to read
queue IDs from stdin, and produce output that is easier to filter.

        Wietse
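
Illustrative sketch (not part of the original message): one way the "extract queue IDs" step could look for the "postqueue -j" pipeline. Because the JSON output has no client attributes, this version filters on the envelope sender instead. The function name deferred_ids_by_sender is hypothetical. The sketch assumes one JSON object per line with string fields named "queue_name", "queue_id" and "sender"; check postqueue(1) on your system before relying on that. The parsing is deliberately naive and does not handle escaped quotes inside values.

```shell
#!/bin/sh
# Hypothetical helper: read `postqueue -j` output on stdin and print the
# queue IDs of deferred messages whose envelope sender equals $1.
# Assumption: one JSON object per line, with "queue_name", "queue_id"
# and "sender" as plain string fields.
deferred_ids_by_sender() {
    awk -v want="$1" '
        # Return the string value of the first "key": "value" pair on the line.
        # Naive: assumes the value contains no escaped double quotes.
        function field(line, key,    pat, s) {
            pat = "\"" key "\":[ \t]*\"[^\"]*\""
            if (!match(line, pat)) return ""
            s = substr(line, RSTART, RLENGTH)
            sub(/^[^:]*:[ \t]*"/, "", s)   # strip the key, colon and opening quote
            sub(/"$/, "", s)               # strip the closing quote
            return s
        }
        field($0, "queue_name") == "deferred" && field($0, "sender") == want {
            print field($0, "queue_id")
        }'
}
```

Under those assumptions it would be used as:

    postqueue -j | deferred_ids_by_sender spammer@example.com | postsuper -d -

Run it without the final postsuper stage first to review the list of queue IDs.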

Re: Is there any documentation on the binary format of the mail files under /var/spool/postfix/ ?

Wietse Venema
Wietse Venema:
>     $ grep client= /var/log/maillog
>     May 28 01:22:27 spike postfix/smtpd[74407]: 3wb7Xz3wSczJrP0: client=molamola.ripe.net[2001:67c:2e8:11::c100:1371]

This example assumes that you have "enable_long_queue_ids = yes",
so that a queue ID is used only once. Short queue IDs are reused
and you could delete the wrong file.

        Wietse
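
Illustrative sketch (not part of the original message): a possible "extract queue IDs" filter for the maillog pipeline. The function name queue_ids_from_client_ip is hypothetical, and the sketch assumes log lines shaped like the example quoted above, where the queue ID is the field immediately before "client=name[ip]". As noted above, it is only safe to feed the result to postsuper when long queue IDs are enabled.

```shell
#!/bin/sh
# Hypothetical helper: read maillog lines on stdin and print the queue IDs
# of messages whose smtpd "client=" entry ends in [<ip>], where <ip> is $1.
# Assumption: lines look like
#   ... postfix/smtpd[PID]: QUEUEID: client=hostname[ip]
queue_ids_from_client_ip() {
    awk -v ip="$1" '
        {
            suffix = "[" ip "]"
            for (i = 2; i <= NF; i++) {
                if (index($i, "client=") != 1) continue
                field = $i
                sub(/,$/, "", field)          # drop a trailing comma, if any
                # Fixed-string suffix comparison, so dots and colons in the
                # address are never interpreted as regex characters.
                start = length(field) - length(suffix) + 1
                if (start > 0 && substr(field, start) == suffix) {
                    qid = $(i - 1)
                    sub(/:$/, "", qid)        # "QUEUEID:" -> "QUEUEID"
                    print qid
                }
            }
        }' | sort -u
}
```

Possible usage:

    queue_ids_from_client_ip 10.2.3.4 < /var/log/maillog | postsuper -d -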

Re: Is there any documentation on the binary format of the mail files under /var/spool/postfix/ ?

Hubro
In reply to this post by Wietse Venema
Wietse Venema wrote:
> That information is already available from the Postfix maillog file.

My first attempt at solving this actually relied on going through the log file to find the client IP, but I found out that the line containing "client=..." was frequently missing. If I grep the log file for a mail ID, I often just find "to=" and "from=" lines, and no lines containing "client=".

Re: Is there any documentation on the binary format of the mail files under /var/spool/postfix/ ?

Hubro
In reply to this post by Christian Recktenwald
I have already made similar scripts, but the issue is that they run "postcat" and "postsuper" once for every queue ID, so they become absolutely unusable when I need to delete tens or hundreds of thousands of emails.

So far I have been lucky in that most spam scripts send mail with only a few different sender email addresses, so I've been able to grep the output of "postqueue -p" 4-5 times, use Vim to create a long list of queue IDs, and feed it to "postsuper -d -" through standard input.

However, I'm never sure I've caught all the spam mails sent from the specific IP, and some day I could have to clean up the spam of a script that generates random sender email addresses. That day I'm going to need a fast script that can filter queue IDs by sender IP.

I really, really wish "postcat -e" had a "-" option, like postsuper, that allowed me to stream queue IDs in through stdin...

Re: Is there any documentation on the binary format of the mail files under /var/spool/postfix/ ?

Hubro
In reply to this post by Wietse Venema
A postqueue option that listed all queued mails in JSON with some envelope information like sender IP would be amazing... If I had more free time I would consider trying to patch it in as you suggested.

But - I think it will be much less work to write a script that feeds large batches of queue IDs to "postcat -e" and greps the output. In theory it should be quite fast too.

Thanks a lot for your input!

I'll probably upload the finished script here when it's done.

Re: Is there any documentation on the binary format of the mail files under /var/spool/postfix/ ?

Bill Cole-3
In reply to this post by Hubro
On 28 May 2017, at 19:07, Hubro wrote:

> I really, really wish "postcat -e" had a "-" option, like postsuper, that
> allowed me to stream queue IDs in through stdin...

xargs postcat -e < listofqueuefiles.txt

OR

{ some procedure that spits out target queue filenames } | xargs postcat -e

OR

{ some procedure that spits out target queue IDs } | xargs postcat -qe


Re: Is there any documentation on the binary format of the mail files under /var/spool/postfix/ ?

Peter Ajamian
In reply to this post by Hubro
On 29/05/17 11:07, Hubro wrote:
> I have already made similar scripts, but the issue is that it runs "postcat"
> and "postsuper" once for every queue ID, so it becomes absolutely unusable
> when needing to delete tens- or hundreds of thousands of emails.

postcat -e "$(postconf -h queue_directory)/deferred/"?/* |
your_program_that_parses_data_and_outputs_queue_ids | postsuper -d -

Runs postconf, postcat and postsuper once each.

You can modify for other queues accordingly.


Peter
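
Illustrative sketch (not part of the original message): a possible stand-in for "your_program_that_parses_data_and_outputs_queue_ids". The function name queue_ids_by_client_address is hypothetical. It assumes that "postcat -e" starts each file's output with a header line of the form "*** ENVELOPE RECORDS <path> ***" and prints the client IP as "named_attribute: client_address=<ip>" (the second form is the one matched by the awk script earlier in this thread). Verify both against real postcat output before deleting anything.

```shell
#!/bin/sh
# Hypothetical filter: read `postcat -e` output for many queue files on
# stdin and print the queue ID of each file whose client_address equals $1.
# Assumptions about the postcat output format:
#   *** ENVELOPE RECORDS <path> ***            starts each queue file
#   named_attribute: client_address=<ip>       carries the client IP
queue_ids_by_client_address() {
    awk -v ip="$1" '
        $1 == "***" && $2 == "ENVELOPE" && $3 == "RECORDS" {
            # The queue ID is the last path component of the file name.
            n = split($4, parts, "/")
            qid = parts[n]
            next
        }
        qid != "" && ($0 == ("named_attribute: client_address=" ip)) {
            print qid
            qid = ""    # report each queue file at most once
        }'
}
```

The exact string comparison means that 10.2.3.4 does not also match 10.2.3.40. Under the assumptions above, the pipeline would become:

    postcat -e "$(postconf -h queue_directory)/deferred/"?/* |
    queue_ids_by_client_address 10.2.3.4 | postsuper -d -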

Re: Is there any documentation on the binary format of the mail files under /var/spool/postfix/ ?

Hubro
The problem with that is that you're passing all the mail file paths right on the command line. Say one path is 41 bytes (which they are on my system): filtering 100,000 mails results in 4.1 MB of paths passed to postcat as command-line arguments, which is double the limit on my home desktop.

That's why I'm currently working on a script for sending paths to postcat in chunks. This will also let me display a progress bar.
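
Illustrative sketch (not part of the original message): the estimate above, reproduced in shell and set next to the local limit. The 41-byte path length and the 100,000 message count are the figures from this post. "getconf ARG_MAX" reports the system limit on the combined size of arguments and environment; the estimate is rough because it ignores the environment and per-argument overhead.

```shell
#!/bin/sh
# Rough estimate of the space needed to pass every queue file path as a
# command-line argument. Figures taken from the post above.
path_bytes=41
message_count=100000
total=$((path_bytes * message_count))
echo "paths need about $total bytes"

# System limit for arguments plus environment, if getconf knows it.
limit=$(getconf ARG_MAX 2>/dev/null || echo unknown)
echo "ARG_MAX on this system: $limit"
```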

Re: Is there any documentation on the binary format of the mail files under /var/spool/postfix/ ?

Hubro
In reply to this post by Bill Cole-3
The problem with that is that you're passing all the mail file paths right on the command line. Say one path is 41 bytes (which they are on my system): filtering 100,000 mails results in 4.1 MB of paths passed to postcat as command-line arguments, which is double the limit on my home desktop.

That's why I'm currently working on a script for sending paths to postcat in chunks. This will also let me display a progress bar.

Re: Is there any documentation on the binary format of the mail files under /var/spool/postfix/ ?

Peter Ajamian
In reply to this post by Hubro
On 29/05/17 15:59, Hubro wrote:
> The problem with that is that you're passing all the mail file paths right in
> the command line. Say one path is 41 bytes (which they are on my system),
> filtering 100,000 mails results in 4.1 MB of paths passed to postcat as
> command line arguments, which is double the limit of my home desktop.

find "$(postconf -h queue_directory)/deferred/)" -type f -exec postcat
-e {} + | your_program | postsuper -d -

In the above find will put as many paths on the command line as will fit
and run multiple instances of postcat to make up the difference, so in
your case it would run postcat 2 or 3 times to get all the file paths
passed, then the output of the whole thing would go to your program and
the output of that to postsuper.

So running find once, postcat 2 or 3 times, your program once, postconf
once and postsuper once ... not too bad.


Peter

Re: Is there any documentation on the binary format of the mail files under /var/spool/postfix/ ?

Peter Ajamian
On 29/05/17 16:57, Peter wrote:
> find "$(postconf -h queue_directory)/deferred/)" -type f -exec postcat
> -e {} + | your_program | postsuper -d -

Oops, typo there, should be:

find "$(postconf -h queue_directory)/deferred/" -type f -exec postcat -e
{} + | your_program | postsuper -d -


Peter


Re: Is there any documentation on the binary format of the mail files under /var/spool/postfix/ ?

Peter Ajamian
In reply to this post by Hubro
On 29/05/17 16:00, Hubro wrote:
> The problem with that is that you're passing all the mail file paths right in
> the command line.

No, he's not, go look up the xargs man page and see what it does.

It's basically a variation on the find solution I just gave you.


Peter
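
Illustrative sketch (not part of the original message): a small demonstration of the batching that xargs performs. The "-n 4" option is used here only to force small batches so the effect is visible; without it, xargs packs as many arguments into each command line as the system allows, which is also what "find ... -exec cmd {} +" does.

```shell
#!/bin/sh
# Ten dummy "paths" on stdin, at most four per command line:
# xargs runs echo three times, once per batch.
batches=$(printf '%s\n' a b c d e f g h i j | xargs -n 4 echo)
printf '%s\n' "$batches"
# Output:
#   a b c d
#   e f g h
#   i j
```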

Re: Is there any documentation on the binary format of the mail files under /var/spool/postfix/ ?

Bastian Blank-3
In reply to this post by Hubro
On Sun, May 28, 2017 at 04:07:08PM -0700, Hubro wrote:
> I have already made similar scripts, but the issue is that it runs "postcat"
> and "postsuper" once for every queue ID, so it becomes absolutely unusable
> when needing to delete tens- or hundreds of thousands of emails.

Why do you have tens- or hundreds of thousands of emails waiting to be
deleted?

Bastian

--
No one wants war.
                -- Kirk, "Errand of Mercy", stardate 3201.7

Re: Is there any documentation on the binary format of the mail files under /var/spool/postfix/ ?

Hubro
In reply to this post by Peter Ajamian
My bad, I had no idea xargs and find could do that. That is extremely cool!

Re: Is there any documentation on the binary format of the mail files under /var/spool/postfix/ ?

Wietse Venema
Hubro:
> My bad, I had no idea xargs and find could do that. That is extremely cool!

I agree, ``find | xargs postcat...'' is practically as good as
having postcat read queue file names from stdin.

Combine with a filter that pipes queue IDs into ``postsuper -d -'',
and the whole thing will be as fast as a hand-crafted queue file
parser.

        Wietse

Re: Is there any documentation on the binary format of the mail files under /var/spool/postfix/ ?

Scott Lambert
In reply to this post by Bastian Blank-3
On Mon, May 29, 2017 at 10:34:11AM +0200, Bastian Blank wrote:
> On Sun, May 28, 2017 at 04:07:08PM -0700, Hubro wrote:
> > I have already made similar scripts, but the issue is that it runs "postcat"
> > and "postsuper" once for every queue ID, so it becomes absolutely unusable
> > when needing to delete tens- or hundreds of thousands of emails.
>
> Why do you have tens- or hundreds of thousands of emails waiting to be
> deleted?

Right.  Use a policy daemon to limit the number of messages a client,
sasl_username or IP address, can send in an hour/day.  Then run scripts
every interval to warn you of IPs and sasl_usernames as they approach
the limits.  Then you can take corrective action before millions of
messages have been sent and the queue has 100s of thousands of deferred
messages.

I remember the days of trying to improve the cleanup procedure.
They were horrible days.  

With policyd, 200 messages/recipients per hour, 500 messages/recipients
per day works for most of my customer base.  I have a few, thirty-ish,
who legitimately need to send more than that.  A separate policy group
with limits of 10,000 per hour and 50,000 per day handle them.  They get
the talk about keeping their passwords secure.  You would have to pick
values which make sense for your userbase.

Scripts process the maillog hourly and send me e-mail telling me the
top sasl_usernames for the day and what IPs they've logged in from
with counts, and the logins from "unusual" IP addresses since the log
rotated.  If someone logs in from overseas who usually doesn't, I look
at the maillog to see if the recipient addresses look okay.  If I can't
tell, I just watch it over the next day or week.
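
Illustrative sketch (not part of the original message, and not the author's actual scripts): a minimal version of the "top sasl_usernames" report. The function name top_sasl_usernames is hypothetical, and the sketch assumes smtpd log lines that carry a "sasl_username=<user>" field, as in "client=host[ip], sasl_method=PLAIN, sasl_username=user@example.com".

```shell
#!/bin/sh
# Hypothetical report: read maillog lines on stdin and print
# "<count> <sasl_username>", busiest account first.
# Assumption: authenticated submissions are logged with a
# "sasl_username=<user>" field.
top_sasl_usernames() {
    awk '
        {
            for (i = 1; i <= NF; i++) {
                if (index($i, "sasl_username=") == 1) {
                    user = substr($i, length("sasl_username=") + 1)
                    sub(/,$/, "", user)   # drop a trailing comma, if any
                    count[user]++
                }
            }
        }
        END { for (u in count) print count[u], u }
    ' | sort -k1,1nr -k2,2
}
```

Possible usage from an hourly cron job:

    top_sasl_usernames < /var/log/maillog | head -n 20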

We still have between 3 and 20 accounts phished each month, but I don't
lose sleep over them.  I don't even stress over doing the cleanup.
There are usually very few messages still in the queue.  And having
allowed even 1,500 messages via three phished accounts while I was
off-line for sleep does so much less damage to my server's reputation.

Having three of the high-volume accounts compromised would be bad, but
not as bad as one account without limits for one hour.

When we have a compromise, I run a script which disables the account,
changes their password to a new random value, and e-mails the customer
service reps.  I sleep really well and spend a lot less time on
compromised accounts now.

--
Scott Lambert                    KC5MLE                       Unix SysAdmin
[hidden email]

Re: Is there any documentation on the binary format of the mail files under /var/spool/postfix/ ?

Geert Stappers
In reply to this post by Peter Ajamian
On Mon, May 29, 2017 at 05:02:36PM +1200, Peter wrote:
> On 29/05/17 16:57, Peter wrote:
> > find "$(postconf -h queue_directory)/deferred/)" -type f -exec postcat
> > -e {} + | your_program | postsuper -d -
>
> Oops, typo there, should be:
>
> find "$(postconf -h queue_directory)/deferred/" -type f -exec postcat -e
> {} + | your_program | postsuper -d -
>

I don't see the difference between them. What is it?

Is it the +  that should be removed??


Groeten
Geert Stappers
--
Leven en laten leven

Re: Is there any documentation on the binary format of the mail files under /var/spool/postfix/ ?

Peter Ajamian
On 30/05/17 19:40, Geert Stappers wrote:

> On Mon, May 29, 2017 at 05:02:36PM +1200, Peter wrote:
>> On 29/05/17 16:57, Peter wrote:
>>> find "$(postconf -h queue_directory)/deferred/)" -type f -exec postcat
>>> -e {} + | your_program | postsuper -d -
>>
>> Oops, typo there, should be:
>>
>> find "$(postconf -h queue_directory)/deferred/" -type f -exec postcat -e
>> {} + | your_program | postsuper -d -
>>
>
> I don't see the difference between them. What is it?

The first one has a stray ")".

> Is it the +  that should be removed??

The "+" is in both.


Peter