  1. #1
    Banned
    Join Date
    Dec 2002
    Location
    Texas
    Posts
    242

    Split String Based on RegEx in Perl

    Let's assume this is what I'm working with...

    $string = 'xxxxxxxxyyyy';
    $regex = 'xxxxxxxx';

    And my goal is to have the following results...

    $value1 = 'xxxxxxxx';
    $value2 = 'yyyy'; (Remainder after $regex)

    Which functions should I use to get to that point?
    I have used regexes to remove and/or alter text, but
    I have never used one to split a string apart.

    I can hack around to get a result, but I would prefer to
    do it as 'properly' as possible, minimizing lines
    and execution cycles.

  2. #2
    Trusted Penguin Cabhan's Avatar
    Join Date
    Jan 2005
    Location
    Seattle, WA, USA
    Posts
    3,230
    Alrighty. Perl has a built-in split function, which is what you are going to use:
    http://perldoc.perl.org/functions/split.html

    Now, the only trick is that split takes a delimiter and discards it, whereas you want to match the first part, keep it, and get everything after it as the second part. You can do this with a special Perl regex feature: the positive lookbehind. For your particular example, the code would be:
    Code:
    ($value1, $value2) = split /(?<=xxxxxxxx)/, $string;
    What this does is split $string at every position immediately following the string 'xxxxxxxx'. That is, the regex looks backward and says "Do I see the string 'xxxxxxxx' just behind the current position? If so, then I match the CURRENT position." Because the match itself is zero characters wide, split removes nothing from the string.

    This is one of the more advanced features, and is described in the Camel book under the "Fancy Regular Expressions" section.
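
    One caveat with the lookbehind approach: split cuts at every position where the pattern matches, not only the first. A minimal sketch, assuming a hypothetical input in which the prefix occurs twice, shows how split's optional third argument (LIMIT) keeps the result to two pieces:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical input: the 'xxxxxxxx' prefix appears a second time later on.
my $string = 'xxxxxxxxyyyyxxxxxxxxzz';

# No limit: split cuts after every occurrence of 'xxxxxxxx'.
my @all = split /(?<=xxxxxxxx)/, $string;

# LIMIT of 2: only the first cut is made, the rest stays in one piece.
my ($value1, $value2) = split /(?<=xxxxxxxx)/, $string, 2;

print scalar(@all), " pieces without a limit\n";
print "$value1\n$value2\n";
```

    With the limit in place, $value2 holds everything after the first match, which is the "remainder" asked for in the original post.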
    DISTRO=Arch
    Registered Linux User #388732

  3. #3
    Banned
    Join Date
    Dec 2002
    Location
    Texas
    Posts
    242
    Great advice, thanks for the information!
    However, I don't think it likes being given a regex...

    -bash-3.00$ ./script.sh
    Variable length lookbehind not implemented in regex

  4. #4
    Trusted Penguin Cabhan's Avatar
    Join Date
    Jan 2005
    Location
    Seattle, WA, USA
    Posts
    3,230
    Aha. Are you using exactly what you posted here, or are you doing something different? Unfortunately, lookbehinds cannot be variable-length: they cannot contain variable-width quantifiers such as +, *, or {2,4}. For instance, this works properly:
    Code:
    alex@danu ~/test/perl $ cat using_x_in_regex 
    #!/usr/bin/perl -w
    
    use strict;
    
    my $string = 'xxxxyyyy';
    
    my($val1, $val2) = split /(?<=xxxx)/, $string;
    
    print $val1, "\n", $val2, "\n";
    alex@danu ~/test/perl $ ./using_x_in_regex 
    xxxx
    yyyy
    This does not:
    Code:
    alex@danu ~/test/perl $ cat using_x_in_regex_wrong 
    #!/usr/bin/perl -w
    
    use strict;
    
    my $string = 'xxxxyyyy';
    
    my($val1, $val2) = split /(?<=x{2,4})/, $string;
    
    print $val1, "\n", $val2, "\n";
    alex@danu ~/test/perl $ ./using_x_in_regex_wrong 
    Variable length lookbehind not implemented in regex; marked by <-- HERE in m/(?<=x{2,4}) <-- HERE / at ./using_x_in_regex_wrong line 7.

  5. #5
    Banned
    Join Date
    Dec 2002
    Location
    Texas
    Posts
    242
    Quote: Originally posted by Cabhan
    Are you using exactly what you posted here, or are you doing something different?
    Sorry if my post wasn't clear. I'm using an actual regular expression, not a literal string.
    Something along the lines of http://[a-z]+\.domain\.[a-z]{3}.

  6. #6
    Trusted Penguin Cabhan's Avatar
    Join Date
    Jan 2005
    Location
    Seattle, WA, USA
    Posts
    3,230
    Gotcha. This doesn't call for splitting at all. Just use matching with capture groups.
    Code:
    my($sub, $domain, $end) = $string =~ m|http://([a-z]+)\.(domain)\.([a-z]{3})|;
    If you just want to split the string at some point, use two capture groups in a single match:
    Code:
    my($val1, $val2) = $string =~ m|(http://[a-z]+\.)(domain\.[a-z]{3})|;
    See?
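
    For the original question (a pattern held in a variable, plus "whatever is left"), the same capture-group idea can be sketched as below. The helper name split_at_prefix and the sample URL are made up for illustration; the pattern is the one from post #5.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns (matched prefix, remainder),
# or an empty list if $string does not start with $regex.
sub split_at_prefix {
    my ($string, $regex) = @_;
    # \A anchors at the start of the string; (.*) with /s takes everything left over.
    return $string =~ /\A($regex)(.*)\z/s;
}

# qr// compiles the pattern once; ! is used as the delimiter because the pattern contains slashes.
my $regex = qr!http://[a-z]+\.domain\.[a-z]{3}!;

my ($value1, $value2) = split_at_prefix('http://www.domain.com/index.html', $regex);

print "$value1\n$value2\n";
```

    Since no lookbehind is involved, the pattern can be variable-length, and a failed match simply returns an empty list instead of raising an error.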

    EDIT:

    As a side note, you can use lookbehinds with real regexes; the pattern inside the lookbehind just needs to have a fixed length. That would be the case if, for instance, you replaced "[a-z]+" with a pattern of specific length such as "[a-z]{3}". Of course, you can't do that in this particular problem, but lookbehinds can still be very useful.
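
    A minimal sketch of the fixed-length case, assuming (purely for illustration) that the subdomain is always exactly three lowercase letters:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical input with a three-letter subdomain.
my $string = 'http://www.domain.com';

# [a-z]{3} always matches exactly three characters, so the lookbehind
# has a fixed length (11 characters) and Perl accepts it.
my ($val1, $val2) = split m|(?<=http://[a-z]{3}\.)|, $string, 2;

print "$val1\n$val2\n";
```

    A subdomain of any other length would simply not match here, so the string would come back unsplit in $val1.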

  7. #7
    Banned
    Join Date
    Dec 2002
    Location
    Texas
    Posts
    242
    That works PERFECTLY. Thanks for the help!
