  1. #1
    Just Joined!
    Join Date
    Jan 2006
    Posts
    17

    Global Find and Replace with find and sed

    Hey, I have a server filled with HTML files. I need to change the navigation bar to add an additional category. This is an example of the code that I would like to change:

    <ul>
    <li><a href="page1.html">One</a></li>
    <li><a href="page2.html">Two</a></li>
    <li><a href="page3.html">Three</a></li>
    <li><a href="page4.html">Four</a></li>
    </ul>
    Now, I would like to replace the link to page2 with two pictures instead. So it would look like:

    <ul>
    <li><a href="page1.html">One</a></li>
    <li>
    <img src="pic1.jpg"><br />
    <img src="pic2.jpg">
    </li>
    <li><a href="page3.html">Three</a></li>
    <li><a href="page4.html">Four</a></li>
    </ul>
    I have made a shell script that does the job (below), but I still have some questions.

    #!/bin/bash
    find . -name '*.html' -print -exec sed -i.bak 's:<li><a href="page2\.html">Two</a></li>:<li><img src="pic1.jpg"><br /><img src="pic2.jpg"></li>:g' {} \;
    If you execute the shell script, the resulting code will do its job, but it's very hard to read:

    <ul>
    <li><a href="page1.html">One</a></li>
    <li><img src="pic1.jpg"><br /><img src="pic2.jpg"></li>
    <li><a href="page3.html">Three</a></li>
    <li><a href="page4.html">Four</a></li>
    </ul>
    Is there a way to add tabs and line breaks to sed's output so that it is easier to read? Also, referring back to the original HTML file, how would you replace pages two and three with the two images in one command, so that it looks like the following? (I can't figure out how to include spaces and tabs in the regex; [/s]*? does not work.)

    <ul>
    <li><a href="page1.html">One</a></li>
    <li>
    <img src="pic1.jpg"><br />
    <img src="pic2.jpg">
    </li>
    <li><a href="page4.html">Four</a></li>
    </ul>
    Thanks for the help. Cheers.

  2. #2
    Linux Enthusiast
    Join Date
    Aug 2006
    Posts
    631
    Adjust the sed command to something like this. Note that the command must be entered on 4 separate lines:
    Code:
    sed 's:<a href="page2\.html">Two</a>:\
    <img src="pic1.jpg"><br />\
    <img src="pic2.jpg">\
    :g' file
    This is the output with your sample file:

    Code:
    $ cat file
    <ul>
    <li><a href="page1.html">One</a></li>
    <li><a href="page2.html">Two</a></li>
    <li><a href="page3.html">Three</a></li>
    <li><a href="page4.html">Four</a></li>
    </ul>
    $ sed 's:<a href="page2\.html">Two</a>:\
    <img src="pic1.jpg"><br />\
    <img src="pic2.jpg">\
    :g' file
    <ul>
    <li><a href="page1.html">One</a></li>
    <li>
    <img src="pic1.jpg"><br />
    <img src="pic2.jpg">
    </li>
    <li><a href="page3.html">Three</a></li>
    <li><a href="page4.html">Four</a></li>
    </ul>
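The first post also asked about tabs. The sketch below extends the command above so that each `<img>` line starts with a tab. It assumes a POSIX shell; the `tab` and `result` variable names are only illustrative. The tab is produced with `printf` and spliced into the sed script between the single-quoted pieces, which avoids relying on sed understanding `\t`.

```shell
#!/bin/bash
# A literal tab character, stored in an illustrative variable.
tab=$(printf '\t')

# Feed one sample line to sed. Each backslash at the end of a line inside the
# replacement inserts a newline; "$tab" is pasted in after each newline.
result=$(printf '%s\n' '<li><a href="page2.html">Two</a></li>' |
  sed 's:<a href="page2\.html">Two</a>:\
'"$tab"'<img src="pic1.jpg"><br />\
'"$tab"'<img src="pic2.jpg">\
:')

printf '%s\n' "$result"
```

GNU sed also appears to accept `\t` directly in the replacement, but that is an extension, so the `printf` route is the more portable choice.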

  3. #3
    drl
    Linux Engineer
    Join Date
    Apr 2006
    Location
    Saint Paul, MN, USA / CentOS, Debian, Solaris, SuSE
    Posts
    1,117
    Hi.

    If I were interested in readability, I would use HTML tidy:
    Code:
    #!/usr/bin/env bash
    
    # @(#) s3	Demonstrate html tidy.
    # See: http://tidy.sourceforge.net
    
    FILE=${1-t1.html}
    echo
    tidy --version
    
    echo
    echo " Contents of file $FILE:"
    cat "$FILE"
    
    echo
    tidy -q -i --show-body-only yes "$FILE"
    
    exit 0
    producing:
    Code:
    % ./s3
    
    HTML Tidy for Linux released on 6 November 2007
    
     Contents of file t1.html:
    <ul>
    <li><a href="page1.html">One</a></li>
    <li><img src="pic1.jpg"><br /><img src="pic2.jpg"></li>
    <li><a href="page3.html">Three</a></li>
    <li><a href="page4.html">Four</a></li>
    </ul> 
    
    line 1 column 1 - Warning: missing <!DOCTYPE> declaration
    line 1 column 1 - Warning: inserting implicit <body>
    line 1 column 1 - Warning: inserting missing 'title' element
    line 3 column 5 - Warning: <img> lacks "alt" attribute
    line 3 column 31 - Warning: <img> lacks "alt" attribute
    <ul>
      <li><a href="page1.html">One</a></li>
    
      <li><img src="pic1.jpg"><br>
      <img src="pic2.jpg"></li>
    
      <li><a href="page3.html">Three</a></li>
    
      <li><a href="page4.html">Four</a></li>
    </ul>
    Best wishes ... cheers, drl
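One part of the original question is not covered by the replies: replacing the page2 and page3 items with a single image item in one command, across all files. A possible approach, sketched below, is to let sed pull the following line into the pattern space with `N` and then substitute across the embedded newline. The `site` directory, the file names, and the `script`, `result`, and `backups` variables are made up for this demonstration. `[[:blank:]]*` is the POSIX way to say "any run of spaces or tabs", which is what `[/s]*?` was aiming for. The `-i.bak` option is taken from the first post and assumes a sed that supports in-place editing.

```shell
#!/bin/bash
# Build a throwaway tree with two sample files (hypothetical names).
workdir=$(mktemp -d)
mkdir -p "$workdir/site/sub"
for f in "$workdir/site/index.html" "$workdir/site/sub/about.html"; do
cat > "$f" <<'EOF'
<ul>
<li><a href="page1.html">One</a></li>
<li><a href="page2.html">Two</a></li>
   <li><a href="page3.html">Three</a></li>
<li><a href="page4.html">Four</a></li>
</ul>
EOF
done

# On the page2 line: N appends the next input line to the pattern space,
# then one substitution replaces both items. \n in the pattern matches the
# newline between them; [[:blank:]]* tolerates spaces or tabs around it.
script='\:<li><a href="page2\.html">Two</a></li>:{
N
s:<li><a href="page2\.html">Two</a></li>[[:blank:]]*\n[[:blank:]]*<li><a href="page3\.html">Three</a></li>:<li>\
<img src="pic1.jpg"><br />\
<img src="pic2.jpg">\
</li>:
}'

# Edit every HTML file in place, keeping a .bak copy of each original.
find "$workdir/site" -name '*.html' -exec sed -i.bak "$script" {} \;

result=$(cat "$workdir/site/sub/about.html")
backups=$(find "$workdir/site" -name '*.bak' | grep -c '')
printf '%s\n' "$result"
printf 'backup files: %s\n' "$backups"
rm -rf "$workdir"
```

If the line after the page2 item is not the page3 item, the substitution simply does not match and both lines are written back unchanged, so running this over files with a different menu should be harmless. Checking a few of the `.bak` copies with `diff` before deleting them is still worthwhile.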
