  1. #1
    Just Joined!
    Join Date
    Jul 2008
    Posts
    3

    Replace spaces that are not inside HTML tags <> with newlines using sed

    Hi, I am working on transforming HTML text into the .vert text format, and I want to use the Linux utility sed. I have this regexp, which should do the work: s/ \(?![^<>]*>\)/\n/g. I use it with sed like this: echo "you <we try> there" | sed 's/ \(?![^<>]*>\)/\n/g'. The desired output is:
    you
    <we try>
    there
    But I get the input string back unchanged. Is the regexp wrong? Or am I using sed incorrectly? Thanks for your help.

  2. #2
    Linux Newbie
    Join Date
    Jul 2008
    Posts
    181
    You might wish to refine this a bit, but it basically works:

    sed 's/</\n</g; s/>/>\n/g;' input_file
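
As for why the original expression leaves the input untouched: `(?!...)` is Perl-style negative lookahead, which sed's basic and extended regular expressions do not support. In a basic regular expression, `\(` simply opens a group, and the `?` and `!` inside it are ordinary characters, so the pattern looks for a space followed by the literal text `?!`. A quick check, assuming GNU sed (the `\n` in the replacement is a GNU extension) and a made-up second input that happens to contain `?!`:

```shell
# The lookahead is not recognized, so nothing matches and the line comes back unchanged.
unchanged=$(echo "you <we try> there" | sed 's/ \(?![^<>]*>\)/\n/g')
echo "$unchanged"

# The same pattern does match a literal " ?!" followed by non-bracket text and ">".
literal=$(echo "a ?!b> c" | sed 's/ \(?![^<>]*>\)/\n/g')
echo "$literal"
```

The second command replaces ` ?!b>` with a newline, which shows that sed is reading `?!` as plain text rather than as a lookahead.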

  3. #3
    Just Joined!
    Join Date
    Jul 2008
    Posts
    3
    Quote Originally Posted by burschik View Post
    You might wish to refine this a bit, but it basically works:

    sed 's/</\n</g; s/>/>\n/g;' input_file
    Damn, this works sometimes, but not in every case. For example, with echo "you <we try> there here" | sed 's/</\n</g; s/>/>\n/g;' the output is:
    you
    <we try>
    there here
    Any idea how to deal with this?
    Anyway, thanks for the answer.

  4. #4
    Linux Newbie
    Join Date
    Jul 2008
    Posts
    181
    So what exactly do you need? Is every word supposed to be on a line of its own, except for text enclosed in angle brackets, or what?

    If you don't mind the empty lines, you could use this:

    sed 's/\(<[^>]\+>\| \)/&\n/g;'

    And if you do, you can use:

    sed 's/\(<[^>]\+>\| \)/&\n/g; s/\n \n/\n/g'
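
To see what the second command does, here it is run on the test input from post #3. Note that each word keeps the space that followed it, because `&` copies the matched space back in front of the newline. The variant below is only a sketch: it appends one more substitution, `s/ \n/\n/g`, to drop those spaces. GNU sed is assumed for `\+`, `\|` and `\n`, and `cat -A` is the GNU coreutils option that marks line ends with `$`.

```shell
input='you <we try> there here'

# The command as posted: one token per line, with a trailing space after each word.
posted=$(echo "$input" | sed 's/\(<[^>]\+>\| \)/&\n/g; s/\n \n/\n/g')
echo "$posted" | cat -A   # line ends are shown as $, so trailing spaces become visible

# Variant: one extra substitution removes the space in front of each newline.
trimmed=$(echo "$input" | sed 's/\(<[^>]\+>\| \)/&\n/g; s/\n \n/\n/g; s/ \n/\n/g')
echo "$trimmed"
```

If the trailing spaces do not matter for the .vert format, the command as posted is enough.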
    Last edited by burschik; 07-29-2008 at 01:14 PM. Reason: addition

  5. #5
    Just Joined!
    Join Date
    Jul 2008
    Posts
    3
    Yes, every word is supposed to be on a new line, except those in <>. Thanks a lot for your help. This regexp does the work:

    Quote Originally Posted by burschik View Post
    sed 's/\(<[^>]\+>\| \)/&\n/g; s/\n \n/\n/g'
    Thanks again.
