I am having some trouble getting this sed oneliner working right:
Code:
s/\( \|"\)\([A-Z]\{0,1\}[a-z]\{1,\}ance\|cable\|cy\|ence\|ery\|ious\|mental\|ment\|ness\|sion\|tional\|tion\|tive\)\( \|,\|'\|"\|;\|:\|.\)/\1<b><font color="ff0000">\2<\/font><\/b>\3/g
I want all words ending in any of the first set of 13 OR conditions to match the expr and then have the entire word flanked by HTML bold and font tags as specified, but of course this is not quite occurring yet.

my logic follows that sed should:
first) look for either a space or " char
second) then for the next char that follows check if it is a single uppercase letter or not, which is optional and ok
third) then at least 1 lowercase char must follow, up to an indefinite (unbounded) series of lowercase letters
fourth) the next char that follows, breaking the series of lowercase letters must match 1 of the 13 possibilities separated by the first series of \| OR conditionals
fifth) finally the expr must end with 1 of 7 chars that are again separated by the \| OR conditional; specifically, this last char should not be alphanumeric (basically either a space or punctuation to mark the end of a word)

Unfortunately my logic is failing here somewhere. Words like "environmental" are not matching the expr, but should: it begins with a ", followed by a series of lowercase letters [environ], followed by the mental from the first set of OR conditions, and ends with a " from the second set of OR conditions.
Meanwhile, words like "cycle" are being highlighted, and not even the entire word: just the cy part is getting surrounded by the HTML bold and font tags. This word should not qualify at all, because the cy is followed by more lowercase letters (which are NOT in the second set of OR conditions, precisely to avoid words like this). And if it were a word I wanted to highlight, I would want the entire word encapsulated in the HTML tags, not just the cy. So I guess sed sees just the cy part of cycle as matching the expr.
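The behaviour can be reproduced with a cut-down expression. This is only a sketch: it assumes GNU sed (the \| operator in basic regular expressions is a GNU extension), keeps just two of the suffixes (ance and cy) and a shortened terminator set, and uses a plain <b> tag. The three sample sentences are made up for illustration.

```shell
# Reduced form of the one-liner: same shape, fewer alternatives.
expr='s/\( \|"\)\([A-Z]\{0,1\}[a-z]\{1,\}ance\|cy\)\( \|,\|"\|.\)/\1<b>\2<\/b>\3/g'

# "cycle" starts with cy, "policy" ends with cy, "balance" ends with ance.
out_cycle=$(printf '%s\n' 'a cycle here' | sed "$expr")
out_policy=$(printf '%s\n' 'a policy here' | sed "$expr")
out_balance=$(printf '%s\n' 'a balance here' | sed "$expr")

printf '%s\n' "$out_cycle" "$out_policy" "$out_balance"
# a <b>cy</b>cle here
# a policy here
# a <b>balance</b> here
```

Only the ance word is highlighted in full; cy matches at the start of a word but not at the end of one. That is at least consistent with the letter prefix being applied to the first alternative only, and with the unescaped . in the last group accepting any char (including the c after cy).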
I can make it work fine if I remove the first set of OR conditions: ance\|cable\|cy\|ence\|ery\|ious\|mental\|ment\|ness\|sion\|tional\|tion\|tive, and instead create a separate line in my sed file for each of them, such as:
Code:
s/\( \|"\)\([A-Za-z][a-z]*ance\)\( \|,\|'\|"\|;\|:\|.\)/\1<b><font color="ff0000">\2<\/font><\/b>\3/g
s/\( \|"\)\([A-Za-z][a-z]*cable\)\( \|,\|'\|"\|;\|:\|.\)/\1<b><font color="ff0000">\2<\/font><\/b>\3/g
But I would really love to know why this oneliner does not work b/c I cannot see it.
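For comparison, here is a possible combined form, offered as a sketch rather than a confirmed fix. It assumes GNU sed, wraps the 13 suffixes in their own \( \) group so that the letter prefix sits outside the alternation, and replaces the last OR list with a bracket expression in which the . is literal. Because of the extra group, the terminator becomes \4 instead of \3. The temporary script file and the sample sentence are hypothetical.

```shell
# Put the expression in a sed script file so the ' and " need no shell quoting.
script=$(mktemp)
cat > "$script" <<'EOF'
s/\( \|"\)\([A-Z]\{0,1\}[a-z]\{1,\}\(ance\|cable\|cy\|ence\|ery\|ious\|mental\|ment\|ness\|sion\|tional\|tion\|tive\)\)\([ ,'";:.]\)/\1<b><font color="ff0000">\2<\/font><\/b>\4/g
EOF

# Made-up sample line: two words that should match, one (cycle) that should not.
out=$(printf '%s\n' 'the "environmental" cycle of dependency, etc.' | sed -f "$script")
rm -f "$script"

printf '%s\n' "$out"
# the "<b><font color="ff0000">environmental</font></b>" cycle of <b><font color="ff0000">dependency</font></b>, etc.
```

Like the original, this still requires a space or " before the word and one of the listed chars after it, so a word at the very start or end of a line would not be highlighted.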