need help with regexp in header_checks

need help with regexp in header_checks

naser sonbaty
Hi,

I need help with a Postfix regexp in header_checks.
I want to discard all emails (any domain) from admin@

I use the following:
/^(To|From|Cc|Reply-To): admin@(.*)/        DISCARD

but it's not working.

Thanks for the help.

Re: need help with regexp in header_checks

Stan Hoeppner
On 11/13/2013 2:34 AM, naser sonbaty wrote:
> Hi,
>
> I need help with postfix regexp in header_checks.
> I want discard all emails(any domain) from admin@
>
> I use following:
> /^(To|From|Cc|Reply-To): admin@(.*)/        DISCARD
>
> but its not working

Tests fine here:

$ cat test.regexp
/^(To|From|Cc|Reply-To): admin@(.*)/        DISCARD

$ postmap -q "blah: [hidden email]" regexp:./test.regexp

$ postmap -q "From: [hidden email]" regexp:./test.regexp
DISCARD

$ postmap -q "To: [hidden email]" regexp:./test.regexp
DISCARD

$ postmap -q "CC: [hidden email]" regexp:./test.regexp
DISCARD

$ postmap -q "Reply-To: [hidden email]" regexp:./test.regexp
DISCARD

If these tests work but header_checks isn't working then you need to
execute "postfix reload" to load your new/modified regexp table.

Also, note that the caret (^) anchor isn't necessary.  The header fields
you're testing for are in the leftmost position.  Thus there is no reason
to left-anchor your expression.

--
Stan

Re: need help with regexp in header_checks

Noel Jones-2
In reply to this post by naser sonbaty
On 11/13/2013 2:34 AM, naser sonbaty wrote:

> Hi,
>
> I need help with postfix regexp in header_checks.
> I want discard all emails(any domain) from admin@
>
> I use following:
> /^(To|From|Cc|Reply-To): admin@(.*)/        DISCARD
>
> but its not working
>
> thx for help


WARNING: This looks like a really bad idea. Use at your own risk.
In particular, discarding mail should be a last resort, especially
for a broad expression like this.

Anyway, this should match better:
/^(To|From|Cc|Reply-To): .*[" <]admin@/        DISCARD
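
One way to compare the two expressions is to run both against a few header shapes. The sketch below uses Python's re module as a stand-in for a Postfix regexp table, which is an assumption: Postfix uses POSIX extended regular expressions (case-insensitive by default), so corner cases may differ. The example.com headers are made up for illustration.

```python
import re

# Python's re stands in for a Postfix regexp table here (assumption);
# IGNORECASE mimics the table's default case-insensitive matching.
ORIGINAL = re.compile(r'^(To|From|Cc|Reply-To): admin@(.*)', re.IGNORECASE)
REVISED = re.compile(r'^(To|From|Cc|Reply-To): .*[" <]admin@', re.IGNORECASE)

# Hypothetical sample headers, not taken from real mail.
HEADERS = [
    'From: admin@example.com',
    'From: System admin <admin@example.com>',
    'To: "Admin" <admin@example.com>',
    'From: sysadmin@example.com',
]


def matches(pattern, header):
    """Return True if the pattern matches somewhere in the header line."""
    return pattern.search(header) is not None


# Map each header to (original pattern matched, revised pattern matched).
results = {h: (matches(ORIGINAL, h), matches(REVISED, h)) for h in HEADERS}

for header, (orig, rev) in results.items():
    print(f'{header!r}: original={orig} revised={rev}')
```

In this sketch the revised expression catches the display-name forms that the original misses, and ignores sysadmin@. It does not match the bare "From: admin@example.com" form, because the literal space after the colon is consumed before the [" <] class is tried. Whether that matters depends on the mail being filtered.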



  -- Noel Jones

Re: need help with regexp in header_checks

moparisthebest
Agreed.  Why would you want to discard my emails? :(

Is there something wrong with having an email named admin?

On 11/13/2013 10:01 AM, Noel Jones wrote:

> On 11/13/2013 2:34 AM, naser sonbaty wrote:
>> Hi,
>>
>> I need help with postfix regexp in header_checks.
>> I want discard all emails(any domain) from admin@
>>
>> I use following:
>> /^(To|From|Cc|Reply-To): admin@(.*)/        DISCARD
>>
>> but its not working
>>
>> thx for help
>
>
> WARNING: This looks like a really bad idea. Use at your own risk.
> In particular, discarding mail should be a last resort, especially
> for a broad expression like this.
>
> Anyway, this should match better:
> /^(To|From|Cc|Reply-To): .*[" <]admin@/        DISCARD
>
>
>
>   -- Noel Jones
>


Re: need help with regexp in header_checks

Jan P. Kessler-2
In reply to this post by Stan Hoeppner

> Also, note that the carat (^) anchor isn't necessary.  The header fields
> you're testing for are in the left most position.  Thus no reason to
> left anchor your expression.

Of course there is.

- Anchored expressions are executed faster (the parser has to check the
pattern only against the beginning of the line).

- If I write an e-mail with the following subject, the OP would get a
false-positive:

     Subject: Wrote an e-mail to: [hidden email]
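
The false positive can be reproduced with a short sketch. It assumes Python's re module behaves like the Postfix regexp table for this pattern (case-insensitive matching is switched on to mimic the table default), and the address is a made-up placeholder for the hidden one.

```python
import re

# Same expression with and without the leading ^ anchor.
ANCHORED = re.compile(r'^(To|From|Cc|Reply-To): admin@(.*)', re.IGNORECASE)
UNANCHORED = re.compile(r'(To|From|Cc|Reply-To): admin@(.*)', re.IGNORECASE)

# Hypothetical Subject header in the shape described above.
subject = 'Subject: Wrote an e-mail to: admin@example.com'

anchored_hit = ANCHORED.search(subject) is not None
unanchored_hit = UNANCHORED.search(subject) is not None

print('anchored:', anchored_hit)      # anchored: False
print('unanchored:', unanchored_hit)  # unanchored: True
```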


Re: need help with regexp in header_checks

Bill Cole-3
In reply to this post by Stan Hoeppner
On 13 Nov 2013, at 6:39, Stan Hoeppner wrote:

> On 11/13/2013 2:34 AM, naser sonbaty wrote:
>> Hi,
>>
>> I need help with postfix regexp in header_checks.
>> I want discard all emails(any domain) from admin@
>>
>> I use following:
>> /^(To|From|Cc|Reply-To): admin@(.*)/        DISCARD
>>
>> but its not working
>
> Tests fine here:
>
> $ cat test.regexp
> /^(To|From|Cc|Reply-To): admin@(.*)/        DISCARD
>
> $ postmap -q "blah: [hidden email]" regexp:./test.regexp
>
> $ postmap -q "From: [hidden email]" regexp:./test.regexp
> DISCARD
>
> $ postmap -q "To: [hidden email]" regexp:./test.regexp
> DISCARD
>
> $ postmap -q "CC: [hidden email]" regexp:./test.regexp
> DISCARD
>
> $ postmap -q "Reply-To: [hidden email]" regexp:./test.regexp
> DISCARD
>
> If these tests work but header_checks isn't working then you need to
> execute "postfix reload" to load your new/modified regexp table.
>
> Also, note that the carat (^) anchor isn't necessary.  The header
> fields
> you're testing for are in the left most position.  Thus no reason to
> left anchor your expression.

There absolutely ARE reasons to anchor RE's in header_checks:

1. Performance. In recent years email has developed a sort of header
cancer: new, often proprietary, and often opaque headers that routinely
have logical lengths of hundreds of characters. Not anchoring a header
check to the start of the header when you only want to check a few
specific headers wastes effort scanning for a match anywhere in a
header, potentially taking hundreds of times longer to confirm a
non-match.

2. Matching unanticipated headers. Except for the very few headers with
tightly defined structure (e.g. Date), *ANY* header could potentially
include any string that would match "(To|From|Cc|Reply-To): " starting
somewhere other than the start of the line. e.g. "Subject: I'm naive
enough to think I want to discard all mail with To: admin@ in a header"

Re: need help with regexp in header_checks

Bill Cole-3
In reply to this post by naser sonbaty
On 13 Nov 2013, at 3:34, naser sonbaty wrote:

> Hi,
>
> I need help with postfix regexp in header_checks.

Start by reading the documentation, including the man pages for
header_checks and regexp_table and Postfix's BUILTIN_FILTER_README.
Also: following the advice in the last ~50 lines of the DEBUG_README
would help.

> I want discard all emails(any domain) from admin@

I trust that you believe this to be true, but I suspect that you would
eventually regret successfully implementing that goal. As has already
been demonstrated by an earlier respondent, you have no way of knowing
what or who might use the 'admin' address in all domains.

With that noted, I will assume that you actually know what you are doing
and that this is for some special-function mail system that will never
be offered unanticipated but desirable mail...

> I use following:
> /^(To|From|Cc|Reply-To): admin@(.*)/        DISCARD
>
> but its not working
>
> thx for help

To provide real help, we would need you to provide something more
substantial than "its not working." Repeating: following the advice in
the last ~50 lines of the DEBUG_README would help.

In this case, a sample of the mail you want your check to catch that it
is not catching would be a minimum requirement, along with the output of
'postconf -n header_checks' showing that you have the feature enabled.
Also useful: confirmation that the file your configuration specifies
exists, is readable, and has your pattern in it. If the whole file isn't
too long and holds no secrets it could even be useful to share it all,
since there are potential rule-ordering pitfalls.


Re: need help with regexp in header_checks

Stan Hoeppner
In reply to this post by Bill Cole-3
On 11/13/2013 9:50 AM, Bill Cole wrote:
> On 13 Nov 2013, at 6:39, Stan Hoeppner wrote:

>> Also, note that the carat (^) anchor isn't necessary.  The header fields
>> you're testing for are in the left most position.  Thus no reason to
>> left anchor your expression.
>
> There absolutely ARE reasons to anchor RE's in header_checks:
>
> 1. Performance. In recent years email has developed a sort of header
> cancer: new, often proprietary, and often opaque headers that routinely
> have logical lengths of hundreds of characters. Not anchoring a header
> check to the start of the header when you only want to check a few
> specific headers wastes effort scanning for a match anywhere in a
> header, potentially taking hundreds of times longer to confirm a non-match

In recent years CPUs have become so blindingly fast it makes no
difference.  Any excess cycles burned by a non anchored regex were idle
cycles anyway.  There are good arguments for anchoring expressions, but
saving CPU cycles is simply no longer one of them, not for years now.

I used to make your argument here, but again, it no longer applies.

> 2. Matching unanticipated headers. Except for the very few headers with
> tightly defined structure (e.g. Date), *ANY* header could potentially
> include any string that would match "(To|From|Cc|Reply-To): " starting
> somewhere other than the start of the line. e.g. "Subject: I'm naive
> enough to think I want to discard all mail with To: admin@ in a header"

This is a stronger argument, though I'm not sure how realistic a
scenario this is, with the email address in the subject.  A better
argument would be that without anchoring the expression would also match
headers such as

X-Original-To: [hidden email]
Delivered-To: [hidden email]

in which case I'd agree he should anchor.  I didn't take these into
account previously.
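
A small sketch of that case, listing which header fields each variant of the expression catches. Python's re module is used as an approximation of the Postfix regexp table (an assumption), and the addresses are placeholders.

```python
import re

UNANCHORED = re.compile(r'(To|From|Cc|Reply-To): admin@', re.IGNORECASE)
ANCHORED = re.compile(r'^(To|From|Cc|Reply-To): admin@', re.IGNORECASE)

# Hypothetical headers; the address is a placeholder.
headers = [
    'To: admin@example.com',
    'X-Original-To: admin@example.com',
    'Delivered-To: admin@example.com',
]


def caught_fields(pattern):
    """Return the field names of the headers that the pattern matches."""
    return [h.split(':', 1)[0] for h in headers if pattern.search(h)]


print(caught_fields(UNANCHORED))  # ['To', 'X-Original-To', 'Delivered-To']
print(caught_fields(ANCHORED))    # ['To']
```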

--
Stan

Re: need help with regexp in header_checks

Viktor Dukhovni
On Thu, Nov 14, 2013 at 12:32:45AM -0600, Stan Hoeppner wrote:

> In recent years CPUs have become so blindingly fast it makes no
> difference.  Any excess cycles burned by a non anchored regex were idle
> cycles anyway.  There are good arguments for anchoring expressions, but
> saving CPU cycles is simply no longer one of them, not for years now.

Mere excuse for sloppiness.  Always anchor, then when possible
discard leading "^.*" and trailing ".*$".
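
One reading of that rule: think of every expression as fully anchored, then drop a leading "^.*" or trailing ".*$" because they add nothing when the engine searches for a match anywhere in the line. The sketch below checks that equivalence on a few single-line inputs, using Python's re module as a stand-in for the Postfix matcher (an assumption); the sample lines are invented.

```python
import re

# Fully anchored form and the same expression with "^.*" and ".*$" removed.
VERBOSE = re.compile(r'^.*admin@.*$')
TRIMMED = re.compile(r'admin@')

# Hypothetical single-line inputs (no embedded newlines).
lines = ['From: admin@example.com', 'Subject: hello', 'admin@', '']

verbose_hits = [VERBOSE.search(line) is not None for line in lines]
trimmed_hits = [TRIMMED.search(line) is not None for line in lines]

# Both forms accept and reject exactly the same lines.
same = verbose_hits == trimmed_hits
print(same)  # True
```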

--
        Viktor.

Re: need help with regexp in header_checks

tejas sarade
In reply to this post by Noel Jones-2

I think .* will match everything.

On Nov 13, 2013 8:32 PM, "Noel Jones" <[hidden email]> wrote:

Re: need help with regexp in header_checks

Stan Hoeppner
In reply to this post by Viktor Dukhovni
On 11/14/2013 12:41 AM, Viktor Dukhovni wrote:
> On Thu, Nov 14, 2013 at 12:32:45AM -0600, Stan Hoeppner wrote:
>
>> In recent years CPUs have become so blindingly fast it makes no
>> difference.  Any excess cycles burned by a non anchored regex were idle
>> cycles anyway.  There are good arguments for anchoring expressions, but
>> saving CPU cycles is simply no longer one of them, not for years now.
>
> Mere excuse for sloppiness.  

I find that offensive Viktor.  There is a huge difference between
arguing a point of fact and arguing a position.  Above is an example of
the former, and is a correct statement.

> Always anchor, then when possible
> discard leading "^.*" and trailing ".*$".

Yes, for people who have the time and dedication to "do it right", such
as ourselves.  Others can take shortcuts and get the job done, just as
PHP/Perl/Java/etc heretics don't use C.  It seemed to me in this case to
offer the OP a shortcut.  That may have been incorrect.  Tar and feather
me for that if you like, but do not accuse me of practicing or promoting
sloppiness, as that is simply not true.  My work speaks for itself.  But
apparently you've never even looked at it, despite it being mentioned
here dozens or hundreds of times over the past few years.  You've formed
an opinion and are making untrue statements based solely on my few words
in this thread.  Look at it:

http://www.hardwarefreak.com/fqrdns.pcre.txt

Do you consider these regexes sloppy?

I could remove the anchoring and they would still work in the targeted
use case.  And the additional CPU burn wouldn't be noticeable, if even
measurable.  But I started with fully qualified expressions years ago,
hence the name of the table, and I've stuck with them, even though I
don't really need to.  Tell me that's what a sloppy person would do.

--
Stan

Re: need help with regexp in header_checks

Viktor Dukhovni
On Thu, Nov 14, 2013 at 01:35:39AM -0600, Stan Hoeppner wrote:

> > Mere excuse for sloppiness.  
>
> I find that offensive Viktor.  There is a huge difference between
> arguing a point of fact and arguing a position.  Above is an example of
> the former, and is a correct statement.

Sorry to hear that.  Just because you're posting excuses for
sloppiness does not mean that your work is not valuable.  Both are
true at the same time.  The impact of the sloppiness may be minor
to insignificant, and that's what makes it mere sloppiness rather
than say negligence or incompetence which are not in question here.

Making fewer mistakes is not mere luck, it is the result of meticulous
habits.  CPU efficiency has nothing to do with my comment.  Anchored
expressions yield fewer surprises, and not using them habitually
is sloppy.  Make anchored regular expressions a habit.

This is analogous to always putting shell "${variable}" expansions
in double quotes (except on rare occasions when you want word-splitting)
and various other ways of generally staying out of trouble.

I could mention using "set -e" in shell scripts to avoid undetected
command failures, or using:

        sendmail -f "${sender}" ...

instead of:

        sendmail -f"${sender}" ...

because the latter misbehaves when "${sender}" is empty.
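
The difference shows up in the argument list the command receives. The sketch below is a simulation, not a shell: it pastes the sender into the command text and lets Python's shlex.split do POSIX-style word splitting. That textual substitution is a simplification of real shell expansion, and the recipient address is a placeholder.

```python
import shlex

# The two spellings from above, with a hypothetical recipient appended.
SEPARATE = 'sendmail -f "${sender}" user@example.com'
ATTACHED = 'sendmail -f"${sender}" user@example.com'


def argv_for(template, sender):
    """Paste the sender into the template, then split it POSIX-style.

    Simplification: a real shell expands the variable itself; plain text
    substitution is only adequate for senders without quote characters.
    """
    return shlex.split(template.replace('${sender}', sender))


print(argv_for(SEPARATE, ''))
# ['sendmail', '-f', '', 'user@example.com']
print(argv_for(ATTACHED, ''))
# ['sendmail', '-f', 'user@example.com']
```

With an empty sender the attached form collapses to a bare -f, so an option parser would most likely take the next word (here the recipient) as the sender. The separate form still passes an explicit empty argument.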

> > Always anchor, then when possible
> > discard leading "^.*" and trailing ".*$".
>
> Yes, for people who have the time and dedication to "do it right", such
> as ourselves.  Others can take shortcuts and get the job done, just as
> PHP/Perl/Java/etc heretics don't use C.

Don't under-estimate the rest of humanity, teach them.

--
        Viktor.

Re: need help with regexp in header_checks

Noel Jones-2
In reply to this post by tejas sarade
On 11/14/2013 1:07 AM, tejas sarade wrote:
> I think .* will match everythig.
>
> On Nov 13, 2013 8:32 PM, "Noel Jones" <[hidden email]

The expression I posted is correct.
/^(To|From|Cc|Reply-To): .*[" <]admin@/        DISCARD

This should match headers such as
From: System admin <[hidden email]>
or other variations.


  -- Noel Jones

Re: need help with regexp in header_checks

Bill Cole-3
In reply to this post by Stan Hoeppner
On 14 Nov 2013, at 1:32, Stan Hoeppner wrote:

> On 11/13/2013 9:50 AM, Bill Cole wrote:
>> On 13 Nov 2013, at 6:39, Stan Hoeppner wrote:
>
>>> Also, note that the carat (^) anchor isn't necessary.  The header
>>> fields
>>> you're testing for are in the left most position.  Thus no reason to
>>> left anchor your expression.
>>
>> There absolutely ARE reasons to anchor RE's in header_checks:
>>
>> 1. Performance. In recent years email has developed a sort of header
>> cancer: new, often proprietary, and often opaque headers that
>> routinely
>> have logical lengths of hundreds of characters. Not anchoring a
>> header
>> check to the start of the header when you only want to check a few
>> specific headers wastes effort scanning for a match anywhere in a
>> header, potentially taking hundreds of times longer to confirm a
>> non-match
>
> In recent years CPUs have become so blindingly fast it makes no
> difference.  Any excess cycles burned by a non anchored regex were
> idle
> cycles anyway.  There are good arguments for anchoring expressions,
> but
> saving CPU cycles is simply no longer one of them, not for years now.
>
> I used to make your argument here, but again, it no longer applies.

I think it might surprise you to learn how many mail servers run on
systems constrained by CPU and RAM. This used to be a consequence of old
hardware being repurposed to utility service and ambushed by the need to
filter mail (a relative novelty) but today it is often the result of
virtualization being used to maximize utilization of all those cheap and
abundant resources. If your mail server is running on dedicated recent
but not bleeding-edge hardware you may not care about CPU, but if it is
running on a VPS capped at 300MHz or billed by real CPU usage, you do.

Re: need help with regexp in header_checks

Michael P. Demelbauer
In reply to this post by Noel Jones-2
On Thu, Nov 14, 2013 at 08:19:52AM -0600, Noel Jones wrote:

> On 11/14/2013 1:07 AM, tejas sarade wrote:
> > I think .* will match everythig.
> >
> > On Nov 13, 2013 8:32 PM, "Noel Jones" <[hidden email]
>
> The expression I posted is correct.
> /^(To|From|Cc|Reply-To): .*[" <]admin@/        DISCARD
>
> This should match headers such as
> From: System admin <[hidden email]>
> or other variations.
>
>
>   -- Noel Jones

Hallo Noel,

This might be off topic here, but I've been wondering about the regexp
since yesterday.

How will this match "<admin@....>", a variant I've already seen in some
clients? If I understand the bracket expression correctly, it searches
for a double quote, a blank, or < directly followed by admin@. What's my
mistake?

Many thx and sorry for OT,
--
Michael P. Demelbauer
Systemadministration
WSR
Arsenal, Objekt 20
1030 Wien
-------------------------------------------------------------------------------
Memory is like an orgasm, it's a lot better,
if you don't have to fake it.
-- Linux fortunes

Re: need help with regexp in header_checks

Wietse Venema
In reply to this post by Bill Cole-3
Stan, your contributions are appreciated but please do not criticize
those who suggest improvements.

Anchoring regular expressions (that don't start with wild-card) is
a must to avoid false matches. This is a correctness issue.  Matching
"To:" just because it appears in a Subject: is wrong.

Savings in CPU cycles come second. The best way to save cycles is
to group patterns with the same prefix under IF/ELSE/ENDIF.
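
The grouping idea can be sketched outside Postfix. The code below is only an emulation in Python, under the assumption that it mirrors the if /pattern/ ... endif blocks described in regexp_table(5): a guard expression is tested first, and the inner rules are tried against the same header only when the guard matches. The inner rules and the noreply@ address are hypothetical.

```python
import re

# Guard: only headers matching this are tested against the inner rules.
GUARD = re.compile(r'^(To|From|Cc|Reply-To):', re.IGNORECASE)

# Hypothetical inner rules as (pattern, action) pairs, tried in order.
INNER_RULES = [
    (re.compile(r'[" <]admin@', re.IGNORECASE), 'DISCARD'),
    (re.compile(r'[" <]noreply@', re.IGNORECASE), 'DISCARD'),
]


def lookup(header):
    """Return the action of the first matching inner rule, or None."""
    if not GUARD.search(header):
        return None  # guard failed: every inner rule is skipped
    for pattern, action in INNER_RULES:
        if pattern.search(header):
            return action
    return None


print(lookup('From: System admin <admin@example.com>'))  # DISCARD
print(lookup('Subject: mail to: admin@example.com'))     # None
```

A header that fails the guard costs one match attempt instead of one per rule, and the anchoring lives in a single place.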

        Wietse

Re: need help with regexp in header_checks

Noel Jones-2
In reply to this post by Michael P. Demelbauer
On 11/14/2013 9:27 AM, Michael P. Demelbauer wrote:

> On Thu, Nov 14, 2013 at 08:19:52AM -0600, Noel Jones wrote:
>> On 11/14/2013 1:07 AM, tejas sarade wrote:
>>> I think .* will match everythig.
>>>
>>> On Nov 13, 2013 8:32 PM, "Noel Jones" <[hidden email]
>>
>> The expression I posted is correct.
>> /^(To|From|Cc|Reply-To): .*[" <]admin@/        DISCARD
>>
>> This should match headers such as
>> From: System admin <[hidden email]>
>> or other variations.
>>
>>
>>   -- Noel Jones
>
> Hallo Noel,
>
> this might be off topic here, but I'm wondering about the regexp since
> yesterday.
>
> How will this match "<admin@....>" a variant I've already seen in some
> clients. If I understand the alternation correctly it searches for "
> Blank or < directly followed by admin@. What's my mistake?
>
> Many thx and sorry for OT,
>


Given the case of "<admin@....>", the .* will match the " and the
bracket expression [" <] will match the <, which is followed by admin@.
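
To see how the pieces divide up, the sketch below wraps the wildcard and the bracket expression in capture groups. The extra groups are added for inspection only and are not part of the posted expression. Python's re module stands in for the Postfix matcher (an assumption), and the header is a made-up example of the quoted-bracket form.

```python
import re

# Posted expression with two extra capture groups added for inspection.
PATTERN = re.compile(r'^(To|From|Cc|Reply-To): (.*)([" <])admin@',
                     re.IGNORECASE)

# Hypothetical header using the quoted angle-bracket form.
header = 'From: "<admin@example.com>"'

m = PATTERN.search(header)
wildcard_part = m.group(2)  # text consumed by .*
class_part = m.group(3)     # single character consumed by [" <]

print(repr(wildcard_part), repr(class_part))  # '"' '<'
```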



  -- Noel Jones

Re: need help with regexp in header_checks

@lbutlr
In reply to this post by Noel Jones-2
On Nov 13, 2013, at 8:01, Noel Jones <[hidden email]> wrote:
> Anyway, this should match better:
> /^(To|From|Cc|Reply-To): .*[" <]admin@/        DISCARD

Besides the discussion on the need to anchor the regex (you do), I'm trying to wrap my head around why one would want to discard mail from admin@?

I mean, I reject mails that claim to come from LOCAL admin accounts, but in general?