- 11-22-2006 #11 Just Joined!
- Join Date
- Oct 2006
- Posts
- 29
Originally Posted by eraker
Well, to get rid of the end-time number, you could try something like this:
Code:
sed -e 's/--[0-9]*:[0-9]*\s/ /g'
But that doesn't solve the line-spacing problems. ...
This actually did everything I need. It removed the end time, and there is no double spacing in the output file.
@muha I tried what you wrote, but it didn't seem to have any effect on the file. Maybe I did something wrong, but it doesn't matter. Thank you both a lot for taking the time to help me.
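For anyone following along, here is a minimal sketch of what that substitution does. The sample lines are made up (the original listing format is not shown in this part of the thread), `strip_end_time` is just an illustrative name, and `[[:space:]]` stands in for `\s`, which is a GNU sed extension.

```shell
#!/bin/sh
# Hypothetical input format: "HH:MM--HH:MM Title" (an assumption, not taken from the real files).
strip_end_time() {
    # Replace "--HH:MM" and the whitespace character after it with a single space.
    sed -e 's/--[0-9]*:[0-9]*[[:space:]]/ /g'
}

printf '%s\n' '20:00--21:00 News' '21:00--22:30 The Godfather' | strip_end_time
```

Only the end time and its leading dashes are touched; the start time and the title pass through unchanged.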
- 11-22-2006 #12 Just Joined!
- Join Date
- Oct 2006
- Posts
- 29
Sorry to bother you again, but I just ran into something I could use help with.
Some providers are not available with the xmltv tool, so for those I would like to use the html pages I have.
But using this on some of them leaves the file in some disorder:
Code:
sed -e :a -e 's/<[^>]*>//g;/</N;//ba'
In some of them I get something like:
Code:
22:00 Show title show description
leaving me with 4 or 5 empty lines, and then it turns out really bad in the rss reader. So I would appreciate a way to delete all empty lines.
I tried this, suggested on the site posted earlier:
Code:
# delete ALL blank lines from a file (same as "grep '.' ")
sed '/^$/d'     # method 1
sed '/./!d'     # method 2
but that didn't do anything.
And one more thing: with one file I don't have this empty-lines problem. I get it close to perfect, except that there is no space after the time:
Code:
22:00The Godfather
so it would be good if I could insert one space after the time.
- 11-22-2006 #13 Linux User
- Join Date
- Aug 2005
- Posts
- 408
Wait, are you running each of the sed commands so far one at a time? Can you post a big chunk of what the text looks like originally and then say exactly what you want to do with it? For instance, you want to remove all html/xml tags, then remove the end time, then remove extra spaces? I think we can cobble something together from the answers given already that will work.
Don't worry about posting back; I like these problems because they increase my understanding of sed. They are like interesting puzzles to me, and I'm only a beginner, so I like the opportunity to learn more.
- 11-22-2006 #14 Just Joined!
- Join Date
- Oct 2006
- Posts
- 29
What I asked for in my last post is not related to the previous posts, in which I wanted to remove the end time and the double spacing. That was for files I can get in txt form. That is now all covered thanks to your help, except that I have to write long scripts to wrap those txt files in xml tags and put the channel name inside the <title> tags of each file. So I have to write 2 or 3 lines for each file (channel). But I don't mind doing that.
But for files I don't have available in txt form, I'd like to convert html pages to rss feeds. I've got close to the end of that process.
I run a series of commands in a script, one by one. Like the one you suggested:
Code:
sed -e :a -e 's/<[^>]*>//g;/</N;//ba'
Anyway, I get to the point where I have a valid rss feed with the channel listings in between <description> tags. And that is all good. But the problem now is that what is inside the description tags has a lot of empty lines in one case:
Code:
06:15 CYBER SEDUCTION : HIS SECRET LIFE MOVIE Drama, Family 07:45 Ĺ! ENTERTAINMENT MAGAZINO Fashion, Lifestyle, Music 08:30 THE EXECUTIONER MOVIE Thriller
And it comes out looking very bad in this rss reader.
What I would like to get is:
Code:
06:15 CYBER SEDUCTION : HIS SECRET LIFE MOVIE Drama, Family 07:45 Ĺ! ENTERTAINMENT MAGAZINO Fashion, Lifestyle, Music 08:30 THE EXECUTIONER MOVIE Thriller
That's what I meant by removing all empty lines.
And in the other case I get as a result:
Code:
09:30Guinnessův svět rekordů .
10:00Zprávy.
10:05Lekce života .
11:40Osudové okamžiky .
which looks bad because there is no space after the time. So there I would need one space after the time, to get:
Code:
09:30 Guinnessův svět rekordů .
10:00 Zprávy.
10:05 Lekce života .
11:40 Osudové okamžiky .
Edit:
Just remembered there is another one. Since these html pages are formed differently, the output of this command:
Code:
sed -e :a -e 's/<[^>]*>//g;/</N;//ba'
varies from file to file. And in this last one I get:
Code:
07:45Doom09:30Harry Potter ve AteĹź Kadehi12:05Harry Potter
so there I could use something to break the line before every time and, if possible, to add a space after the time, to get
Code:
07:45 Doom
09:30 Harry Potter ve AteĹź Kadehi
12:05 Harry Potter
I am not sure if I am using the right terms, but I guess you can get the point.
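The tag-stripping one-liner quoted in this post can be written out step by step, which may make it easier to see why its output differs from page to page. This is only a sketch that should behave the same as `sed -e :a -e 's/<[^>]*>//g;/</N;//ba'`; the function name `strip_tags` and the HTML sample are made up for illustration.

```shell
#!/bin/sh
strip_tags() {
    # :a                  label to loop back to
    # s/<[^>]*>//g        delete every complete tag in the pattern space
    # /</{ N; ba; }       if a "<" is left over, the tag continues on the next
    #                     line: append that line and try again from :a
    sed -e ':a' -e 's/<[^>]*>//g' -e '/</{' -e 'N' -e 'ba' -e '}'
}

# Made-up sample: a tag split across two lines, plus lines that hold only tags.
printf '%s\n' '<tr>' '<td>22:00</td><td class="title"' \
    '    id="x">The Godfather</td>' '</tr>' | strip_tags
```

The command only removes the tags themselves. Line breaks that were in the HTML source stay, so lines that held nothing but tags come out empty, and text from neighbouring cells ends up glued together (here `22:00The Godfather`).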
- 11-22-2006 #15 Linux User
- Join Date
- Aug 2005
- Posts
- 408
Well, in those last two examples, this should suffice:
Code:
sed -e 's/--[0-9]*:[0-9]*\s/ /g'
because it gets rid of all extra spaces and inserts only one between the time and the title. As for the others, it's hard to give you a substitution script that will work on all of the original files without seeing what they look like. You say they all look different, but we have to look at the raw originals in order to make something that will work for all varieties. Also, this might get more complicated than sed pretty soon.
- 11-22-2006 #16 Linux Engineer
- Join Date
- Apr 2006
- Location
- Saint Paul, MN, USA / CentOS, Debian, Solaris, SuSE
- Posts
- 1,116
Delete blank lines
Hi.
Deleting apparently blank lines requires some work. The lines that seem blank might have spaces or TABS. Here is what I use:
Code:
#!/bin/sh
# @(#) dbl       Delete blank ( empty ) lines.
# $Id: dbl,v 1.1 2006/11/22 21:02:07 drl Exp drl $
grep -v '^[[:space:]]*$' "$@"
Displaying a data file data1, and the result of "dbl data1" using "cat -tvn":
Code:
     1  non-empty
     2  next line 5 spaces
     3       
     4  next line 2 TABS
     5  ^I^I
     6  next line space TAB
     7   ^I
     1  non-empty
     2  next line 5 spaces
     3  next line 2 TABS
     4  next line space TAB
Best wishes ... cheers, drl
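One plausible reason `sed '/^$/d'` appeared to do nothing earlier in the thread is the point made above: the lines only look empty, but contain spaces or TABs. A small sketch with made-up data showing the difference; `sample` and `drop_blank` are illustrative names, not part of any posted script.

```shell
#!/bin/sh
# Made-up sample: an empty line, a line of five spaces, and a line holding one TAB.
sample() {
    printf 'first\n\n     \n\t\nsecond\n'
}

# Same test as the dbl script: drop lines that are empty or whitespace-only.
drop_blank() {
    grep -v '^[[:space:]]*$' "$@"
}

printf 'lines left by sed /^$/d: '
sample | sed '/^$/d' | wc -l

printf 'lines left by drop_blank: '
sample | drop_blank | wc -l
```

With this sample the `sed` version still leaves four lines, because only the truly empty one is deleted, while `drop_blank` leaves the two lines that carry text.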
- 11-22-2006 #17 Just Joined!
- Join Date
- Oct 2006
- Posts
- 29
Yes, you are right, but I don't really need something that would work universally on all files, because there are only 3 of them. And I am going to use a separate script for each of them anyway.
@drl, thanks a lot, that worked great: it removed all blank lines and gave me the result I needed.
That leaves 2 of them:
Adding a space after the time in this one:
Code:
10:00Zprávy.
10:05Lekce života .
But it is not that important, since it still looks decent, although it would look nicer if it had that space.

And breaking the line before the time in this one:
Code:
07:45Doom09:30Harry Potter ve AteĹź Kadehi12:05Harry Potter
This:
Code:
sed -e 's/--[0-9]*:[0-9]*\s/ /g'
didn't seem to have any effect on the 2 examples above.
- 11-22-2006 #18 Linux User
- Join Date
- Aug 2005
- Posts
- 408
Oh. Now I understand. We can figure something out for the last two problems, I think. Are you learning anything about sed?
- 11-22-2006 #19 Linux User
- Join Date
- Aug 2005
- Posts
- 408
For text files that look like this:
Code:
10:00Zprávy.
10:05Lekce života
I made a sed one-liner that looks like this:
Code:
sed -e 's/\(:[0-9]\{2\}\)\([[:alpha:]]*\)/\1 \2/g' filename
What I told it to do is "capture the colon and any 2 numbers":
\(:[0-9]\{2\}\)
Then I told it to capture any letters immediately following those two numbers:
\([[:alpha:]]*\)
And then I told it to print \1 (what it captured first) followed by a space, then \2 (what it captured second). Again, this looks pretty clumsy to me. I'll bet it would look a lot nicer in perl.
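A slightly stricter variant, as a sketch only: it assumes every line starts with an `HH:MM` time, matches both digit pairs, and also swallows any whitespace that is already there so the result always has exactly one space. It avoids letter ranges altogether, since ranges such as `a-Z` depend on the locale. `add_space_after_time` is an illustrative name.

```shell
#!/bin/sh
add_space_after_time() {
    # \(...\) captures the leading HH:MM; \1 puts it back, followed by one space.
    # Assumption: the time sits at the start of the line, hence the ^ anchor.
    sed -e 's/^\([0-9]\{2\}:[0-9]\{2\}\)[[:space:]]*/\1 /'
}

printf '%s\n' '10:00Zprávy.' '10:05Lekce života .' | add_space_after_time
```

Because the title itself is never matched, accented characters in it pass through untouched.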
- 11-22-2006 #20 Linux User
- Join Date
- Aug 2005
- Posts
- 408
Okay, for your last problem, if you have a file that has entries entirely like this:
Code:
07:45Doom09:30Harry Potter ve AteĹź Kadehi12:05Harry Potter
then (incorporating what I did above) the following one-liner should give you a line break before every time listed, as well as a space before the title:
Code:
sed -e 's/\([0-9]\{2\}:[0-9]\{2\}\)\([[:alpha:]]*\)/\n\1 \2/g' filename
You can then pipe this through drl's line-deleting script above to get rid of any extra blank lines.
It strikes me that this is probably an excessive use of sed, but I really like it for some reason.
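Putting the two steps together, here is a hedged sketch of the whole pipeline. It uses a backslash followed by a real newline in the replacement, which is the POSIX way to insert a line break (`\n` in the replacement is a GNU sed extension), and then drops the empty first line with the same `grep` test as drl's script. The name `split_listings` and the sample line are made up for illustration.

```shell
#!/bin/sh
split_listings() {
    # Step 1: start a new line before every HH:MM and put one space after it.
    # Step 2: the first time on each input line also gets a break in front of
    #         it, so remove the resulting empty lines.
    sed -e 's/\([0-9]\{2\}:[0-9]\{2\}\)[[:space:]]*/\
\1 /g' | grep -v '^[[:space:]]*$'
}

printf '%s\n' '07:45Doom09:30Harry Potter12:05News' | split_listings
```

One caveat with this approach: a title that ends in digits directly followed by the next time would confuse the pattern, so it is worth checking the output against the real listings.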


