  1. #1
    Banned
    Join Date
    Dec 2002
    Location
    Texas
    Posts
    242

    Question about a RegEx

    Let's say I have a phrase, and I want to be able to catch
    the phrase via a regular expression whenever it might have
    a non-normal character inserted.

    Let's say I want to catch anything from...

    "M_ake Money" to "Make Mone_y"

    Obviously w/ the _ (or similar) inserted anywhere in between.
    Is there a way to write a regex that may catch the situation?

    What I'm trying to accomplish is probably real obvious...

  2. #2
    Linux User
    Join Date
    Jun 2007
    Posts
    318
    One way I can think is to use sed to remove those characters in your example:

    Code:
    SUBJ="M_ake Money"
    tmp="$(printf '%s\n' "$SUBJ" | sed 's/[_-]//g')"
    if [ "$tmp" = "Make Money" ]; then echo "Spam"; fi
    This example removes underscores and dashes.

  3. #3
    Banned
    Join Date
    Dec 2002
    Location
    Texas
    Posts
    242
    Thanks for the suggestion.

    But ideally I do not want "Make Money" to be a match.
    I only want matches when someone is trying to be sly.

  4. #4
    Trusted Penguin Cabhan's Avatar
    Join Date
    Jan 2005
    Location
    Seattle, WA, USA
    Posts
    3,230
    So let me get this straight:

    You want to match any block of text that has (in this instance) an underscore in it?

    /_/

    Code:
    alex@danu ~ $ echo "Make Money" | grep '_'
    alex@danu ~ $ echo "M_ake Money" | grep '_'
    M_ake Money
    So to make this simpler, you can define the acceptable characters, and take the complement:
    Code:
    alex@danu ~ $ echo "Make Money" | grep '[^a-zA-Z ]'
    alex@danu ~ $ echo "M_ake Money" | grep '[^a-zA-Z ]'
    M_ake Money
    Does this help?
    DISTRO=Arch
    Registered Linux User #388732

  5. #5
    Linux User
    Join Date
    Jun 2007
    Posts
    318
    Quote Originally Posted by thehemi View Post
    Thanks for the suggestion.

    But ideally I do not want "Make Money" to be a match.
    I only want matches when someone is trying to be sly.
    So you want something more generic. Using Cabhan's suggestion how about this:

    Code:
    #!/bin/bash -vx
    
    SUBJ="M_ake Money"
    tmp="$(printf '%s\n' "$SUBJ" | sed 's/[^a-zA-Z ]//g')"
    if [ "$tmp" != "$SUBJ" ]; then echo "Spam"; fi
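
The strip-and-compare idea can also be tied to one specific phrase, so that the clean phrase itself is not flagged. The following is only a minimal sketch under assumptions: `is_obfuscated` is a hypothetical helper name, "normal" is taken to mean ASCII letters and spaces, and matching is case-sensitive.

```shell
#!/bin/bash
# Hypothetical helper: is_obfuscated SUBJECT PHRASE
# Succeeds (exit status 0) only when SUBJECT turns into PHRASE once
# every non-letter, non-space character is removed, AND SUBJECT is not
# already identical to PHRASE.
is_obfuscated() {
    local subj=$1 phrase=$2 stripped
    # Drop everything except ASCII letters and spaces.
    stripped=$(printf '%s\n' "$subj" | sed 's/[^a-zA-Z ]//g')
    [ "$stripped" = "$phrase" ] && [ "$subj" != "$phrase" ]
}

for s in "M_ake Money" "Make Mone_y" "Make Money" "Top Qual-ity"; do
    if is_obfuscated "$s" "Make Money"; then
        echo "Spam: $s"
    fi
done
```

Because the comparison is against the whole subject, this sketch only catches subjects that consist of nothing but the disguised phrase; a subject with extra words around it would need a different check.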

  6. #6
    Banned
    Join Date
    Dec 2002
    Location
    Texas
    Posts
    242
    I can solve this problem all day w/ perl or bash.
    I'm looking for a quick & easy regex answer.

    I'm starting to think I might have to abandon the
    regex plans and create a custom script instead...

  7. #7
    Trusted Penguin Cabhan's Avatar
    Join Date
    Jan 2005
    Location
    Seattle, WA, USA
    Posts
    3,230
    You want a regular expression that accepts a normal character followed by any other number of characters, with a non-normal character somewhere in there?

    /[a-zA-Z0-9 ]+[^a-zA-Z0-9 ][a-zA-Z0-9 ]*/

    This is almost exactly what was in the grep before. grep deals with regular expressions.
    DISTRO=Arch
    Registered Linux User #388732
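
To see what the pattern in this post does and does not catch, it can be run through `grep -E`. This is a rough sketch only; `has_odd_char` is a hypothetical name, and the test strings are just illustrations built from the phrases mentioned in the thread.

```shell
#!/bin/bash
# The pattern from the post, written as a POSIX extended regex.
pattern='[a-zA-Z0-9 ]+[^a-zA-Z0-9 ][a-zA-Z0-9 ]*'

# Hypothetical helper: succeeds when the argument contains at least one
# "normal" character followed by one "non-normal" character.
has_odd_char() {
    printf '%s\n' "$1" | grep -Eq -e "$pattern"
}

for s in "Make Money" "M_ake Money" "Top Qual-ity"; do
    if has_odd_char "$s"; then
        echo "match:    $s"
    else
        echo "no match: $s"
    fi
done
```

Note that the pattern is not tied to any particular phrase: any text with a stray character after a letter, digit, or space will match.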

  8. #8
    Banned
    Join Date
    Dec 2002
    Location
    Texas
    Posts
    242
    Quote Originally Posted by Cabhan View Post
    You want a regular expression that accepts a normal character followed by any other number of characters, with a non-normal character somewhere in there?
    Not any set of normal characters, but specific phrases that I specify: Make Money, Top Quality, University Degree, or whatever phrase I'm hunting for with non-normal characters embedded in it. I was just curious whether there was an easy regex trick that I was not aware of. I'm starting to think there isn't, unfortunately.

  9. #9
    Banned
    Join Date
    Dec 2002
    Location
    Texas
    Posts
    242
    Think of something like this...

    Code:
    M[_\-*]{0,1}a[_\-*]{0,1}k[_\-*]{0,1}e[_\-*]{0,1} [_\-*]{0,1}M[_\-*]{0,1}o[_\-*]{0,1}n[_\-*]{0,1}e[_\-*]{0,1}y
    I was just curious if there was, say, a shorthand trick to it.
    It's quite obvious that creating and managing many regexes
    in this syntax is practically impossible... LOL
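
One way to avoid maintaining such patterns by hand is to generate them from the plain phrase. What follows is a hedged sketch, not a built-in regex shorthand: `make_pattern` and `matches_sly` are hypothetical helper names, the filler set `[_*-]` is copied from the example above, and the phrase is assumed to contain only letters, digits, and spaces (no regex metacharacters).

```shell
#!/bin/bash
# Hypothetical helper: make_pattern PHRASE
# Prints an extended regex that allows one optional filler character
# (_, * or -) between each pair of adjacent characters in PHRASE.
make_pattern() {
    # Append the optional filler class after every character, then
    # drop the one left dangling after the last character.
    printf '%s\n' "$1" | sed 's/./&[_*-]?/g; s/\[_\*-]?$//'
}

# Hypothetical helper: matches_sly SUBJECT PHRASE
# Succeeds when SUBJECT contains a disguised form of PHRASE but not
# the clean PHRASE itself (the generated regex alone matches both).
matches_sly() {
    local subj=$1 phrase=$2
    printf '%s\n' "$subj" | grep -Eq -e "$(make_pattern "$phrase")" &&
        ! printf '%s\n' "$subj" | grep -Fq -e "$phrase"
}

for s in "M_ake Money" "Make Mone_y" "Make Money"; do
    if matches_sly "$s" "Make Money"; then
        echo "Spam: $s"
    fi
done
```

The same helper can be looped over a list of phrases (Make Money, Top Quality, University Degree), so only the plain phrases need to be maintained. As in the hand-written pattern, `?` permits a single filler per gap; a run of several fillers in one gap would not match.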
