- 05-10-2007 #1 Banned
- Join Date
- Dec 2002
- Location
- Texas
- Posts
- 242
Split String Based on RegEx in Perl
Let's assume this is what I'm working with...
$string = 'xxxxxxxxyyyy';
$regex = 'xxxxxxxx';
And my goal is to have the following results...
$value1 = 'xxxxxxxx';
$value2 = 'yyyy';  # remainder after $regex
What commands should I be using to get there?
I have used regexes to remove and/or alter text, but
I have never used them to split a string apart.
I can hack something together that works, but I would prefer to
do it as 'properly' as possible, minimizing both lines of code
and execution cycles.
- 05-10-2007 #2
Alrighty. Perl has a built-in split function, and that is what you are going to use:
http://perldoc.perl.org/functions/split.html
Now, the only trick is that split takes a delimiter, whereas you want to specify the leading part and have the string split right after it. You can do this with a special Perl regex feature: the positive lookbehind. For your example, the code would be:
Code:
($value1, $value2) = split /(?<=xxxxxxxx)/, $string;
This splits $string at every position immediately following the string 'xxxxxxxx'. That is, the regex looks backward and asks, "Do I see the string 'xxxxxxxx' just behind the current position? If so, match at the current position." Because the lookbehind is zero-width, it consumes no characters, so nothing is lost from either piece.
This is one of the more advanced features, and it is described in the Camel book under the "Fancy Regular Expressions" section.
DISTRO=Arch
Registered Linux User #388732
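A minimal runnable sketch of the lookbehind split above. The helper name split_after_prefix is hypothetical. It interpolates the prefix inside \Q...\E so that any regex metacharacters in it are matched literally (which also keeps the lookbehind fixed-length), and it passes split a LIMIT of 2 so only the first cut is made. Without the limit, a prefix that can match again further along (for example ten x's instead of eight) produces extra fields.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split $string right after a literal $prefix.
# \Q...\E escapes regex metacharacters in $prefix, so the lookbehind
# always has a fixed length. LIMIT 2 stops split after the first cut.
sub split_after_prefix {
    my ($string, $prefix) = @_;
    return split /(?<=\Q$prefix\E)/, $string, 2;
}

my ($value1, $value2) = split_after_prefix('xxxxxxxxyyyy', 'xxxxxxxx');
print "$value1\n$value2\n";    # prints xxxxxxxx, then yyyy

# Without a limit, every position preceded by eight x's is a split point,
# so ten leading x's produce two extra one-character fields.
my @pieces = split /(?<=xxxxxxxx)/, 'xxxxxxxxxxyyyy';
print join('|', @pieces), "\n";    # prints xxxxxxxx|x|x|yyyy
```

If the prefix does not occur at all, split returns the whole string as a single field and the second value is undef, so it is worth checking for that before using it.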
- 05-10-2007 #3 Banned
Great advice, thanks for the information!
However, I don't think it likes using a regex...
-bash-3.00$ ./script.sh
Variable length lookbehind not implemented in regex
- 05-10-2007 #4
Aha. Are you using exactly what you posted here, or something different? Unfortunately, lookbehinds cannot be variable-length: they cannot contain quantifiers such as +, *, or {2,4} that let them match strings of different lengths (a fixed count such as {4} is fine). For instance, this works properly:
Code:
alex@danu ~/test/perl $ cat using_x_in_regex
#!/usr/bin/perl -w
use strict;
my $string = 'xxxxyyyy';
my($val1, $val2) = split /(?<=xxxx)/, $string;
print $val1, "\n", $val2, "\n";
alex@danu ~/test/perl $ ./using_x_in_regex
xxxx
yyyy
This does not:
Code:
alex@danu ~/test/perl $ cat using_x_in_regex_wrong
#!/usr/bin/perl -w
use strict;
my $string = 'xxxxyyyy';
my($val1, $val2) = split /(?<=x{2,4})/, $string;
print $val1, "\n", $val2, "\n";
alex@danu ~/test/perl $ ./using_x_in_regex_wrong
Variable length lookbehind not implemented in regex; marked by <-- HERE in m/(?<=x{2,4}) <-- HERE / at ./using_x_in_regex_wrong line 7.
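Perl 5.30 and later accept some variable-length lookbehinds experimentally, but on older versions the error above stands. A minimal sketch of one workaround, which is a different technique from the lookbehind split: match the variable-length prefix as an ordinary anchored pattern, then cut the string at the offset where the match ended, which Perl stores in $+[0]. The helper name split_after_match is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split $string right after the first match of $re.
# The prefix is matched normally (no lookbehind), so it may be variable-length.
sub split_after_match {
    my ($string, $re) = @_;
    return unless $string =~ $re;    # no match: return an empty list
    my $end = $+[0];                 # offset just past the end of the match
    return (substr($string, 0, $end), substr($string, $end));
}

my ($val1, $val2) = split_after_match('xxxxyyyy', qr/^x{2,4}/);
print "$val1\n$val2\n";    # prints xxxx, then yyyy (x{2,4} is greedy)
```

Passing the pattern as a qr// object keeps the helper generic, so the same call works for a prefix like qr{^http://[a-z]+\.}.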
- 05-10-2007 #5 Banned
Sorry if my post wasn't clear. I'm using a real regular expression rather than a literal string,
something along the lines of: http://[a-z]+\.domain\.[a-z]{3}
- 05-11-2007 #6
Gotcha. This isn't a call for splitting at all. Just use matching:
Code:
my($sub, $domain, $end) = $string =~ m|http://([a-z]+)\.(domain)\.([a-z]{3})|;
Each pair of parentheses captures one piece, and a match in list context returns the captured pieces. If you just want to split the string at some point, use two capture groups:
Code:
my($val1, $val2) = $string =~ m|(http://[a-z]+\.)(domain\.[a-z]{3})|;
See?
EDIT:
As a side note, you can use lookbehinds with regexes; the lookbehind just needs to have a fixed length. For instance, it would work if you replaced "[a-z]+" with a fixed-length pattern such as "[a-z]{3}". Of course, you can't do that in this particular problem, but lookbehinds can still be very useful.
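A minimal sketch of the two-capture-group approach as a complete script, with one addition not in the post: it checks whether the match succeeded, because a failed match in list context returns an empty list and would otherwise leave both variables silently undefined. The helper name split_url and the sample URLs are hypothetical; the pattern is the one from the post.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns (prefix, host part) or an empty list.
sub split_url {
    my ($url) = @_;
    # A match in list context returns its capture groups. Used as a
    # condition, the list assignment yields the number of values on its
    # right-hand side: 2 on success, 0 when the pattern fails.
    if (my ($head, $tail) = $url =~ m|(http://[a-z]+\.)(domain\.[a-z]{3})|) {
        return ($head, $tail);
    }
    return;    # no match
}

my ($val1, $val2) = split_url('http://www.domain.com/index.html');
if (defined $val1) {
    print "$val1\n$val2\n";    # prints http://www. then domain.com
} else {
    print "no match\n";
}
```

The pattern is unanchored, so anything after the captured part (here /index.html) is simply ignored.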
- 05-11-2007 #7 Banned
That works perfectly. Thanks for the help!


