  1. #1
    Just Joined!
    Join Date
    Jun 2010
    Posts
    2

    Add html formatting to text file

    Hi,

    I need to extract some text from a text file.

    The text is a test log with system info at the top and results further down.

    What I need is to add different tags with formatting before and after each line.

    I have prepared a template with HTML formatting, but the number of lines in the test log varies from case to case, so I need to be able to add formatting tags as needed.

    Can this be done with a bash script, sed, awk, head, tail, ...?

  2. #2
    Linux Engineer GNU-Fan's Avatar
    Join Date
    Mar 2008
    Posts
    935
    Yes, sed and gawk are very potent tools for text processing.
    In the unlikely event that they can't do the job, Perl or PHP are other options.
    Debian GNU/Linux -- You know you want it.

  3. #3
    Just Joined!
    Join Date
    Jun 2010
    Posts
    2
    This is my log file:

    PHP Code:
    ******************
    SYSTEM INFORMATION
    ******************
    Linux 2.6.27.27 i686


    Machine type:
    Network Name: slax
    Time: Thu Mar 11 08:26:50 2010
    Operating system: Linux (2.6.27.27)
    Number of CPUs: 2
    CPU manufacturer: GenuineIntel
    CPU type: Core2 Duo CPU     P8400  2.26GHz
    CPU features: MMX SSE SSE2 SSE3
    CPU Serial #: Not available or disabled
    CPU1 speed: 2266.6 MHz
    CPU2 speed: 2266.6 MHz
    CPU Level 2 Cache: 3072KB
    RAM: 2002400 KBytes
    Video card: "Card0" ("Mobile Integrated Graphics Controller")
    Video resolution: 1280x1280x24

    ******************
    DETAILED EVENT LOG
    ******************
    INFORMATION 2010-03-11 08:26:50 Status Starting test run
    INFORMATION 2010-03-11 08:26:52 Status Completed started test run
    INFORMATION 2010-03-11 08:31:58 Status Test run stopped

    **************
    RESULT SUMMARY
    **************
    Test Start time: Thu Mar 11 08:26:51 2010
    Test Stop time: Thu Mar 11 08:31:53 2010
    Test Duration: 000h 05m 02s

                                  Test Name   Cycles   Operations      Result  Errors   Last Error
                                  CPU Maths   102      15.015 Billion  PASS    0        No errors
                               Memory (RAM)   0        1.871 Billion   PASS    0        No errors
                                2D Graphics   40       166410          PASS    0        No errors
     Disk Hard Disk (/mnt/sda1) [/dev/sda1]   9        6.141 Billion   PASS    0        No errors
                   Network eth0 (127.0.0.1)   1803     14.430 Million  PASS    0        No errors
                           USB Plug 1 (2:2)   25       207 Million     PASS    0        No errors
                           USB Plug 2 (2:1)   25       207 Million     PASS    0        No errors
                           USB Plug 3 (1:2)   19       159 Million     PASS    0        No errors
                          Serial Port ttyS0   3        115500          PASS    0        No errors
                          Serial Port ttyS1   3        115500          PASS    0        No errors
    TEST RUN PASSED

    Notes:

    *********************
    SERIOUS ERROR SUMMARY
    *********************

    ----------------------------------------------------------------------------------------------------- 
    I need it to look more like this:

    PHP Code:
    <table width=640>
    <tr>
    <td width="25%"><strong>Customer</strong></td>
    <td width="75%"></td>
    </tr>
    <tr>
    <td width="25%"><strong>Report Date</strong></td>
    <td width="75%">12/23/09</td>
    </tr>
    <tr>
    <td width="25%"><strong>Technician</strong></td>
    <td width="75%"></td>
    </tr>
    <tr>
    <td width="25%"><strong>Generated by</strong></td>
    <td width="75%">BurnInTest Version V5.3 Pro</td>
    </tr>
    </table>
    <h2>System summary</h2>
    <table width=640>
    <tr><td class="header">System component</td><td class="header">Description</td></tr>
    <tr>
    <td width="25%" class="value"><strong>Computer Name</strong></td>
    <td width="75%" class="altvalue">HD</td>
    </tr>
    <tr>
    <td width="25%" class="value"><strong>Machine type</strong></td>
    <td width="75%" class="altvalue"></td>
    </tr>
    <tr>
    <td width="25%" class="value"><strong>Machine serial #</strong></td>
    <td width="75%" class="altvalue">123</td>
    </tr>
    <tr>
    <td width="25%" class="value"><strong>Operating system</strong></td>
    <td width="75%" class="altvalue">Windows XP Professional  Service Pack 2 build 2600</td>
    </tr>
    <tr>
    <td width="25%" class="value"><strong>CPU type</strong></td>
    <td width="75%" class="altvalue">Intel(R) Core(TM)2 Duo CPU     P8400  2.26GHz (2266.6 MHz)</td>
    </tr>
    <tr>
    <td width="25%" class="value"><strong>RAM</strong></td>
    <td width="75%" class="altvalue">1980 MB</td>
    </tr>
    <tr>
    <td width="25%" class="value"><strong>Video card</strong></td>
    <td width="75%" class="altvalue">Intel Corporation GM45/GS45/GL40 Embedded Graphic (Resolution800x600x16)</td>
    </tr>
    <tr>
    <td width="25%" class="value"><strong>Disk drive</strong></td>
    <td width="75%" class="altvalue">Model TS32GSSD25S-(Size29.8GB)</td>
    </tr>
    </table>
    &nbsp;
    <h2>Result summary</h2>
    <table width=640>
    <tr>
    <td width="25%" class="value"><strong>Test Start time</strong></td>
    <td width="75%" class="altvalue">Tue Dec 22 17:14:11 2009</td>
    </tr>
    <tr>
    <td width="25%" class="value"><strong>Test Stop time</strong></td>
    <td width="75%" class="altvalue">Tue Dec 22 19:14:15 2009</td>
    </tr>
    <tr>
    <td width="25%" class="value"><strong>Test Duration</strong></td>
    <td width="75%" class="altvalue">002h 00m 04s</td>
    </tr>
    </table>
    <table width=640>
    <tr>
    <td width="25%" class="header"><strong>Test</strong></td>
    <td width="75%" class="header"><strong>Result</strong></td>
    </tr>
    <tr>
    <td width="25%" class="value"><strong>       CPU Maths</strong></td>
    <td width="75%" class="altvalue">PASS</td>
    </tr>
    <tr>
    <td width="25%" class="value"><strong>        CPU SIMD</strong></td>
    <td width="75%" class="altvalue">PASS</td>
    </tr>
    <tr>
    <td width="25%" class="value"><strong>      Memory (RAM)</strong></td>
    <td width="75%" class="altvalue">PASS</td>
    </tr>
    <tr>
    <td width="25%" class="value"><strong>       2D Graphics</strong></td>
    <td width="75%" class="altvalue">PASS</td>
    </tr>
    <tr>
    <td width="25%" class="value"><strong>        Disk (C: )</strong></td>
    <td width="75%" class="altvalue">PASS</td>
    </tr>
    <tr>
    <td width="25%" class="value"><strong>         Network 1</strong></td>
    <td width="75%" class="altvalue">PASS</td>
    </tr>
    <tr>
    <td width="25%" class="value"><strong>         Network 2</strong></td>
    <td width="75%" class="altvalue">PASS</td>
    </tr>
    <tr>
    <td width="25%" class="value"><strong>        USB Plug 1</strong></td>
    <td width="75%" class="altvalue">PASS</td>
    </tr>
    <tr>
    <td width="25%" class="value"><strong>        USB Plug 2</strong></td>
    <td width="75%" class="altvalue">PASS</td>
    </tr>
    <tr>
    <td width="25%" class="value"><strong>        USB Plug 3</strong></td>
    <td width="75%" class="altvalue">PASS</td>
    </tr>
    <tr>
    <td width="25%" class="value"><strong>        USB Plug 4</strong></td>
    <td width="75%" class="altvalue">PASS</td>
    </tr>
    <tr>
    <td width="25%" class="value"><strong>     Serial Port 1</strong></td>
    <td width="75%" class="altvalue">PASS</td>
    </tr>
    <tr>
    <td width="25%" class="value"><strong>     Serial Port 2</strong></td>
    <td width="75%" class="altvalue">PASS</td>
    </tr>
    </table>
    <table width=640>
    <tr>
    <td class="passvalue"><strong>TEST RUN PASSED</strong></td>
    </tr>
    </table>
    <table width=640>
    <tr>
    <td width="25%" class="value"><strong>Notes</strong></td>
    <td width="75%" class="altvalue"></td>
    </tr>
    </table>
    What is the best approach to get this result?

  4. #4
    Linux Engineer GNU-Fan's Avatar
    Join Date
    Mar 2008
    Posts
    935
    gawk, Perl, PHP, C, ...

    whatever you have most experience with.

    Look for the left hand side words as keywords, e.g. "Video card:" and parse the rest of the line. Then you have both columns and can put them as table data.
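
    A minimal Python sketch of this keyword approach, assuming the log really uses "Key: value" lines (the colon is an assumption based on the "Video card:" example). The sample text and the function names parse_pairs and to_table are hypothetical; the class names are the ones from the target HTML posted above.

```python
import html

# Hypothetical excerpt of the log; assumes "Key: value" lines.
SAMPLE_LOG = """\
******************
SYSTEM INFORMATION
******************
Machine type:
Network Name: slax
Time: Thu Mar 11 08:26:50 2010
RAM: 2002400 KBytes
Video card: "Card0" ("Mobile Integrated Graphics Controller")
"""


def parse_pairs(text):
    """Return (key, value) tuples for every line that contains a colon."""
    pairs = []
    for line in text.splitlines():
        # Split on the first colon only, so times like 08:26:50 stay intact.
        key, sep, value = line.partition(":")
        if sep and key.strip():
            pairs.append((key.strip(), value.strip()))
    return pairs


def to_table(pairs):
    """Wrap each pair in a two-column table row, escaping HTML characters."""
    rows = []
    for key, value in pairs:
        rows.append(
            "<tr>\n"
            f'<td width="25%" class="value"><strong>{html.escape(key)}</strong></td>\n'
            f'<td width="75%" class="altvalue">{html.escape(value)}</td>\n'
            "</tr>"
        )
    return "<table width=640>\n" + "\n".join(rows) + "\n</table>"


if __name__ == "__main__":
    print(to_table(parse_pairs(SAMPLE_LOG)))
```

    Because the rows are generated in a loop, the number of lines in the log can vary without touching the template. Lines without a colon, such as the starred section headers, are simply skipped here.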
    For things like CPU you must take into account that there may be more than one. So keep counting: Is there a CPU1? Is there a CPU2? ...
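
    The counting could be sketched in Python like this, assuming the keys look like "CPU1 speed", "CPU2 speed" as in the posted log and have already been split into (key, value) pairs. The names PAIRS, cpu_speeds and cpu_description are hypothetical, and the combined "type (speed)" cell only imitates the CPU row of the target HTML.

```python
import re

# Hypothetical pairs, as they might come out of a "Key: value" parser.
PAIRS = [
    ("Number of CPUs", "2"),
    ("CPU type", "Core2 Duo CPU     P8400  2.26GHz"),
    ("CPU1 speed", "2266.6 MHz"),
    ("CPU2 speed", "2266.6 MHz"),
]

# Matches "CPU1 speed", "CPU2 speed", ... and captures the CPU number.
CPU_SPEED_KEY = re.compile(r"^CPU(\d+) speed$")


def cpu_speeds(pairs):
    """Map CPU number -> speed for every 'CPU<n> speed' key that is present."""
    speeds = {}
    for key, value in pairs:
        match = CPU_SPEED_KEY.match(key)
        if match:
            speeds[int(match.group(1))] = value
    return speeds


def cpu_description(pairs):
    """Build one 'CPU type (speed)' string, listing each distinct speed once."""
    cpu_type = dict(pairs).get("CPU type", "")
    distinct = sorted(set(cpu_speeds(pairs).values()))
    if not distinct:
        return cpu_type
    return f"{cpu_type} ({', '.join(distinct)})"


if __name__ == "__main__":
    print(cpu_speeds(PAIRS))
    print(cpu_description(PAIRS))
```

    Using a pattern instead of asking for CPU1, CPU2, ... one by one means the script does not need to know the CPU count in advance.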
    Last edited by GNU-Fan; 07-01-2010 at 06:06 AM.

  5. #5
    Linux Enthusiast Kloschüssel's Avatar
    Join Date
    Oct 2005
    Location
    Italy
    Posts
    717
    If you can, generate the data in an XML format and use an XSL transformation to turn it into whatever you want.

    The XML may look like:

    Code:
    <?xml version="1.0" encoding="iso-8859-1"?>
    <dataset>
    <data>
    <key>CPU temp</key>
    <value>45°</value>
    </data>
    </dataset>
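
    If the test tool cannot write XML itself, a file in this shape could be produced from already parsed key/value pairs. A small Python sketch, assuming only the standard library; the names PAIRS, build_dataset and to_xml_bytes are hypothetical, and the element names are the ones from the example above.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: (key, value) pairs extracted from the log.
PAIRS = [("CPU temp", "45°"), ("RAM", "2002400 KBytes")]


def build_dataset(pairs):
    """Create <dataset> with one <data><key/><value/></data> per pair."""
    root = ET.Element("dataset")
    for key, value in pairs:
        data = ET.SubElement(root, "data")
        ET.SubElement(data, "key").text = key
        ET.SubElement(data, "value").text = value
    return root


def to_xml_bytes(root):
    """Serialize with an XML declaration, in the encoding used above."""
    return ET.tostring(root, encoding="iso-8859-1", xml_declaration=True)


if __name__ == "__main__":
    print(to_xml_bytes(build_dataset(PAIRS)).decode("iso-8859-1"))
```

    ElementTree escapes characters such as < and & in the text, so the values do not have to be escaped by hand.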
    An XSLT stylesheet (very easy):
    Code:
    <?xml version="1.0" encoding="iso-8859-1"?>
    <xsl:stylesheet
        version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:html="http://www.w3.org/1999/xhtml"
        xmlns="http://www.w3.org/1999/xhtml"
        exclude-result-prefixes="html"
    >
     
        <xsl:output
            method="xml"
            doctype-system="http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd"
            doctype-public="-//W3C//DTD XHTML 1.1//EN"
        />
     
        <xsl:template match="dataset">
            <html>
            <body>
            <table>
            <xsl:apply-templates/>
            </table>
            </body>
            </html>
        </xsl:template>
     
        <xsl:template match="data">
            <tr>
               <td><xsl:value-of select="key"/></td>
               <td><xsl:value-of select="value"/></td>
            </tr>
        </xsl:template>
    </xsl:stylesheet>
    Produces something like:
    Code:
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
    <html>
    <body>
    <table>
    <tr>
    <td>CPU temp</td>
    <td>45°</td>
    </tr>
    </table>
    </body>
    </html>
    This is the fastest and likely the easiest way (it took me about 3 minutes to write this) because you don't have to parse the data yourself; you just let XPath do the job. There are also several paths you could take, and you may choose whichever you like:

    1] XML =xslt=> HTML
    2] XML =xslt=> FO =fop=> html, pdf, ..
    3] XML =xslt=> docbook =xslt=> FO =fop=> html, pdf, ..

    To clarify some things:
    xslt = XSL transformation of XML data using an XSL stylesheet
    fop = an FO processor that renders FO (formatting objects) to a target format (Apache FOP is one implementation of this standard)

    1 is the fastest approach that produces your HTML as you want it and I illustrated the process in the example above.

    2 adds an abstraction layer (FO = formatting objects) that can be rendered to various output formats (pdf, xml, text, html, svg, png, ..) with an FO processor.

    3 adds a further abstraction layer that makes generating FO easier. The FO definitions are large and support all kinds of things, but they are complicated. I personally much prefer the DocBook approach, which represents a book, paper or other document in a logical XML structure that can itself be transformed to FO with XSL (the stylesheet is open source and can be used just like any other XSL stylesheet). You can even change its look and feel by including the original stylesheet and changing some parameters, such as colors.

    Now an estimate of the work (including learning the technology):

    1] 2-8 hours
    2] 5-15 hours
    3] 3-10 hours

    The choice is yours and the rest can be looked up with a search engine.
    Last edited by Kloschüssel; 07-01-2010 at 06:41 AM.
