Page 2 of 4
Results 11 to 20 of 33
  1. #11
    Just Joined!
    Join Date
    Oct 2006
    Posts
    29
    Quote Originally Posted by eraker
    Well, to get rid of the end-time number, you could try something like this:
    Code:
    sed -e 's/--[0-9]*:[0-9]*\s/ /g'
    But that doesn't solve the line-spacing problems. In addition, I imagine that someone might have a much better suggestion than that solution, which strikes me as inelegant.
    This actually did everything I needed: it removed the end time, and there is no double spacing in the output file.
    @muha I tried what you wrote, but it didn't seem to have any effect on the file. Maybe I did something wrong, but it doesn't matter. Thank you both a lot for taking the time to help me.

  2. #12
    Just Joined!
    Join Date
    Oct 2006
    Posts
    29
    Sorry to bother you again, but I just ran into something I could use help with.
    Some providers are not available through the xmltv tool, so I would like to build their listings from the HTML pages I have.

    But running this on some of them leaves the file in some disorder:
    Code:
    sed -e :a -e 's/<[^>]*>//g;/</N;//ba'
    In some of them I get something like:
    Code:
    22:00
    
    Show title
    
    
    
    
    show description
    That leaves me with 4 or 5 empty lines, which look really bad in an RSS reader. So I would appreciate a way to delete all empty lines.
    I tried this, suggested on the site posted earlier:
    Code:
     # delete ALL blank lines from a file (same as "grep '.' ")
     sed '/^$/d'                           # method 1
     sed '/./!d'                           # method 2
    but that didn't do anything.
    One more thing: with one file I don't have this empty-lines problem. I get it close to perfect, except that there is no space after the time:
    Code:
    22:00The Godfather
    So it would be good if I could insert a single space after the time.

  3. #13
    Linux User
    Join Date
    Aug 2005
    Posts
    408
    Wait, are you running each of the sed commands so far one at a time? Can you post a big chunk of what the text looks like originally and then say exactly what you want to do with it? For instance, you want to remove all html/xml tags, then remove the end time, then remove extra spaces? I think we can cobble something together from the answers given already that will work.
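
    As a rough illustration of chaining the steps instead of running each one separately, here is a minimal sketch. It assumes the listings look like "22:00--23:30 Title" once the tags are gone, which is only my guess from the end-time pattern used so far. The function name clean_listing and the sample input are made up for the example.

    ```shell
    # Hypothetical helper: strip tags, drop the end time, delete blank lines.
    clean_listing() {
        # Step 1: remove HTML/XML tags, including tags that span several lines.
        sed -e :a -e 's/<[^>]*>//g;/</N;//ba' "$@" |
            # Step 2: replace "--HH:MM" plus one whitespace character with a space.
            # [[:space:]] is used here in place of the GNU-only \s.
            # Step 3: delete lines that are empty or hold only whitespace.
            sed -e 's/--[0-9]*:[0-9]*[[:space:]]/ /g' -e '/^[[:space:]]*$/d'
    }

    # Made-up sample input, not taken from the real pages.
    printf '<p>22:00--23:30 <b>Show title</b></p>\n<br>\n' | clean_listing
    ```

    On that sample line this prints "22:00 Show title". Whether it fits the real pages depends on what the raw HTML looks like.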

    Don't worry about posting back, I like these problems because they increase my understanding of sed. These are like interesting puzzles to me, and I'm only a beginner so I like the opportunity to learn more.

  4. #14
    Just Joined!
    Join Date
    Oct 2006
    Posts
    29
    What I asked about in my last post is not related to the previous posts, in which I wanted to remove the end time and the double spacing. That was for files I can get in txt form, and it is now all covered thanks to your help, except that I have to write long scripts to wrap those txt files in XML tags and put the channel name inside the <title> tags. So I have to write two or three lines for each file (channel), but I don't mind doing that.
    But for the channels I can't get in txt form, I'd like to convert HTML pages to RSS feeds. I am close to the end of that process.
    I run a series of commands in a script, one by one. Like the one you suggested:
    Code:
    sed -e :a -e 's/<[^>]*>//g;/</N;//ba'
    Anyway, I get to the point where I have a valid RSS feed with the channel listings between <description> tags, and that is all good. But the problem now is that, in one case, what is inside the description tags has a lot of empty lines:
    Code:
    06:15
    
    CYBER SEDUCTION : HIS SECRET LIFE
    
    
    MOVIE Drama, Family
    
    
    
    07:45
    
    Ĺ! ENTERTAINMENT
    
    
    MAGAZINO Fashion, Lifestyle, Music
    
    
    
    08:30
    
    THE EXECUTIONER
    
    
    MOVIE Thriller
    And it comes out looking very bad in this rss reader.
    What I would like to get is:
    Code:
    06:15 
    CYBER SEDUCTION : HIS SECRET LIFE
    MOVIE Drama, Family
    07:45
    Ĺ! ENTERTAINMENT
    MAGAZINO Fashion, Lifestyle, Music
    08:30 
    THE EXECUTIONER
    MOVIE Thriller
    That's what I meant by removing all empty lines.

    And in other case I have as a result:

    Code:
    09:30Guinnessův svět rekordů .
    10:00Zpr&#225;vy.
    10:05Lekce života .
    11:40Osudov&#233; okamžiky .
    This looks bad because there is no space after the time. So I need one space after the time, to get:

    Code:
    09:30 Guinnessův svět rekordů .
    10:00 Zpr&#225;vy.
    10:05 Lekce života .
    11:40 Osudov&#233; okamžiky .
    Edit:
    Just remembered there is another one. Because these HTML pages are structured differently, the output of this command:
    Code:
    sed -e :a -e 's/<[^>]*>//g;/</N;//ba'
    varies from file to file. In this last one I get:

    Code:
    07:45Doom09:30Harry Potter ve Ateş Kadehi12:05Harry Potter
    So there I could use something that breaks the line before every time and, if possible, adds a space after the time, to get:
    Code:
    07:45 Doom
    09:30 Harry Potter ve Ateş Kadehi
    12:05 Harry Potter
    I am not sure if I am using the right terms, but I guess you get the point.

  5. #15
    Linux User
    Join Date
    Aug 2005
    Posts
    408
    Well, in those last two examples, this should suffice:
    Code:
    sed -e 's/--[0-9]*:[0-9]*\s/ /g'
    because it gets rid of all extra spaces and inserts only one between the time and the title. As for the others, it's hard to give you a substitution script that will work on all of the original files without seeing what they look like. You say they all look different, but we have to look at the raw originals in order to make something that will work for all varieties. Also, this might get more complicated than sed pretty soon.

  6. #16
    drl
    Linux Engineer
    Join Date
    Apr 2006
    Location
    Saint Paul, MN, USA / CentOS, Debian, Solaris, SuSE
    Posts
    1,116

    Delete blank lines

    Hi.

    Deleting apparently blank lines requires some work. The lines that seem blank might have spaces or TABS. Here is what I use:
    Code:
    #!/bin/sh
    
    # @(#) dbl      Delete blank ( empty ) lines.
    # $Id: dbl,v 1.1 2006/11/22 21:02:07 drl Exp drl $
    
    grep -v '^[[:space:]]*$' "$@"
    Displaying a data file data1, and the result of "dbl data1" using "cat -tvn":
    Code:
         1  non-empty
         2  next line 5 spaces
         3
         4  next line 2 TABS
         5  ^I^I
         6  next line space TAB
         7   ^I
    
         1  non-empty
         2  next line 5 spaces
         3  next line 2 TABS
         4  next line space TAB
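    The same test can be written directly in sed, which may be handy inside an existing sed script. This is only a sketch: the function name delete_blank_lines and the sample input are made up, and the carriage-return line is an assumption about what HTML-derived text may contain. It also hints at why '/^$/d' can appear to do nothing: that pattern only matches lines with no characters at all.

    ```shell
    # Hypothetical helper: delete lines that are empty or contain only
    # whitespace (spaces, TABs, carriage returns).
    delete_blank_lines() {
        sed -e '/^[[:space:]]*$/d' "$@"
    }

    # Made-up sample: an empty line, a spaces-only line, a TABs-only line,
    # and a line holding just a carriage return.
    printf '06:15\n\n   \n\t\t\n\r\nCYBER SEDUCTION\n' | delete_blank_lines
    ```

    On that sample only the "06:15" and "CYBER SEDUCTION" lines remain, whereas '/^$/d' would remove just the one truly empty line.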
    Best wishes ... cheers, drl

  7. #17
    Just Joined!
    Join Date
    Oct 2006
    Posts
    29
    Quote Originally Posted by eraker
    Well, in those last two examples, this should suffice:
    Code:
    sed -e 's/--[0-9]*:[0-9]*\s/ /g'
    because it gets rid of all extra spaces and inserts only one between the time and the title. As for the others, it's hard to give you a substitution script that will work on all of the original files without seeing what they look like. You say they all look different, but we have to look at the raw originals in order to make something that will work for all varieties. Also, this might get more complicated than sed pretty soon.
    Yes, you are right, but I don't really need something that works universally on all files, because there are only 3 of them. And I am going to use a separate script for each of them anyway.
    @drl, thanks a lot, that worked great: it removed all the blank lines and gave me the result I needed.
    That leaves 2 of them:
    Code:
    10:00Zpr&#225;vy.
    10:05Lekce života .
    Adding a space after the time in this one. It is not that important, since it still looks decent, although it would look nicer with the space.
    And breaking the line before each time in this one:
    Code:
    07:45Doom09:30Harry Potter ve Ateş Kadehi12:05Harry Potter
    This:
    Code:
    sed -e 's/--[0-9]*:[0-9]*\s/ /g'
    didn't seem to have any effect on the 2 examples above.

  8. #18
    Linux User
    Join Date
    Aug 2005
    Posts
    408
    Oh. Now I understand. We can figure something out for the last two problems, I think. Are you learning anything about sed?

  9. #19
    Linux User
    Join Date
    Aug 2005
    Posts
    408
    For text files that look like this:
    Code:
    10:00Zprávy.
    10:05Lekce života
    I made a sed one-liner that looks like this:
    Code:
    sed -e 's/\(:[0-9]\{2\}\)\([[:alpha:]]*\)/\1 \2/g' filename
    What I told it to do is "capture the colon and any 2 digits":
    \(:[0-9]\{2\}\)

    Then I told it to capture any letters immediately following those two digits:
    \([[:alpha:]]*\)

    And then I told it to print \1 (what it captured first) followed by a space then \2 (what it captured second). Again, this looks pretty clumsy to me. I'll bet it would look a lot nicer in perl.
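    A variant that might be a bit more forgiving, as a sketch only: instead of capturing letters, capture whatever single non-whitespace character follows the time. That way it should not matter whether the title starts with an accented letter, a digit or an entity like &#225;, and lines that already have a space are left alone. The function name add_space_after_time is made up.

    ```shell
    # Hypothetical helper: insert one space between HH:MM and the title,
    # but only when the title starts immediately after the time.
    add_space_after_time() {
        # \1 = the time (two digits, colon, two digits)
        # \2 = the first character after it, if it is not whitespace
        sed -e 's/\([0-9]\{2\}:[0-9]\{2\}\)\([^[:space:]]\)/\1 \2/g' "$@"
    }

    # The second line already has a space and should come out unchanged.
    printf '10:00Zpr&#225;vy.\n10:05 Lekce\n' | add_space_after_time
    ```

    On the two sample lines this gives "10:00 Zpr&#225;vy." and "10:05 Lekce".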

  10. #20
    Linux User
    Join Date
    Aug 2005
    Posts
    408
    Okay, for your last problem, if you have a file that has entries entirely like this:
    Code:
    07:45Doom09:30Harry Potter ve Ateş Kadehi12:05Harry Potter
    Then (incorporating what I did above), the following one-liner should give you a line break before every time listed, as well as a space between the time and the title:

    Code:
    sed -e 's/\([0-9]\{2\}:[0-9]\{2\}\)\([[:alpha:]]*\)/\n\1 \2/g' filename
    You can then pipe this through drl's line-deleting script above to get rid of any extra blank lines.

    It strikes me that this is probably an excessive use of sed, but I really like it for some reason.
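
    Putting the two steps together might look something like the sketch below. It assumes GNU sed, because \n in the replacement part is a GNU extension that other seds may not understand. The function name split_on_times is made up, and the sample titles are shortened.

    ```shell
    # Hypothetical helper: put each HH:MM entry on its own line with exactly
    # one space after the time, then drop the blank lines this leaves behind.
    split_on_times() {
        # Assumes GNU sed: \n in the replacement inserts a newline.
        sed -e 's/\([0-9]\{2\}:[0-9]\{2\}\)[[:space:]]*/\n\1 /g' "$@" |
            # The line break added before the first time leaves an empty line.
            sed -e '/^[[:space:]]*$/d'
    }

    printf '07:45Doom09:30Harry Potter12:05Harry Potter\n' | split_on_times
    ```

    With GNU sed the sample comes out as three lines: "07:45 Doom", "09:30 Harry Potter" and "12:05 Harry Potter".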
