- 09-21-2011 #1Just Joined!
- Join Date
- Jun 2011
- Posts
- 16
Problem removing new lines
I am trying to remove all newline characters from a document downloaded with wget, but every time it reads a <br> tag, it is unable to remove the newline character that follows. I have tried removing the <br> tags before removing the newlines, using these methods:
PHP Code:
cat days.txt | tr -d '\n'
cat days.txt | awk '{ printf "%s", $0 }'
cat days.txt | sed ':a;N;$!ba;s/\n//g'
It still didn't work. Please help, sos.
- 09-21-2011 #2Just Joined!
- Join Date
- Aug 2011
- Posts
- 48
I'm not at my computer to test it but try:
awk 'BEGIN { ORS = "" } { print }' days.txt
- 09-21-2011 #3
- 09-21-2011 #4Just Joined!
- Join Date
- Sep 2007
- Posts
- 4
Can you post the result of:
[user@localhost]$ file days.txt
- 09-21-2011 #5Just Joined!
- Join Date
- Aug 2011
- Posts
- 48
You don't have a dos2unix issue? That is, lines ending in 0x0a (LF) vs 0x0d 0x0a (CR LF).
- 09-21-2011 #6Just Joined!
- Join Date
- Jun 2011
- Posts
- 16
It's actually not days.txt, it's yanswer, and "file ~/yanswer" returned this:
yanswer: HTML document text
I don't believe it's a dos2unix issue. I think it's just something invisible that gets added when there is a <br> present in the code of the file.
- 09-21-2011 #7Just Joined!
- Join Date
- Aug 2011
- Posts
- 48
Use od -c -A x -tx1 yanswer to see if there is something there
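To give a rough idea of what that dump looks like, here is a small sketch run on a made-up sample rather than the real yanswer file. The sample text and the variable names are assumptions for illustration only. The -c option prints control characters as escapes such as \r and \n, and -t x1 adds the hex value of each byte.

```shell
# Hypothetical sample standing in for yanswer: a <br> followed by CR LF.
sample=$(mktemp)
printf 'Monday<br>\r\nTuesday\n' > "$sample"

# -A x: hex offsets, -c: printable chars and escapes, -t x1: one hex byte per column.
dump=$(od -c -A x -t x1 "$sample")
printf '%s\n' "$dump"

rm -f "$sample"
```

In the output, any \r (hex 0d) sitting right before a \n (hex 0a) is a carriage return that tr -d '\n' would leave behind.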
- 09-21-2011 #8Just Joined!
- Join Date
- Jun 2011
- Posts
- 16
it returned a very long list of characters, what exactly am i looking for?
- 09-21-2011 #9Just Joined!
- Join Date
- Aug 2011
- Posts
- 48
Look for the <br> and see what values come after it. On Linux the '\n' is a single 0x0a, and on Windows a line ends with 0x0d 0x0a.
So you are just checking that the '\n' is only one char.
You should see something like:
0000: ... 3C 62 72 3E ?? ??   <- where the ?? are the bytes that come after the <br> tag
          <  b  r  >
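If the dump is too long to read by eye, a quicker check is to count the carriage return bytes (0x0d) directly. This is only a sketch on a made-up sample; the file contents and variable names are assumptions, not taken from the real file.

```shell
# Hypothetical sample: two lines, only the first one ends in CR LF.
sample=$(mktemp)
printf 'Monday<br>\r\nTuesday\n' > "$sample"

# Delete everything except CR bytes, then count what is left.
cr_count=$(tr -cd '\r' < "$sample" | wc -c)
echo "carriage returns found: $cr_count"

rm -f "$sample"
```

Any count above zero means the file has bytes that removing '\n' alone will not touch.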
- 09-21-2011 #10Just Joined!
- Join Date
- Jun 2011
- Posts
- 16
ty, it seems there was an extra "\r" character after each <br>, which made it look like the line ended and a new one began even after the "\n" was removed. A simple sed command removed it and all is well, thanks for the help guys
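The exact sed command was not posted, so the following is only a guess at what such a fix could look like, run on a made-up sample. The sample text and variable names are assumptions. The sed expression strips a trailing carriage return from each line, and tr then joins the lines. A single tr call that deletes both characters is shown as an alternative.

```shell
# Hypothetical sample standing in for yanswer, with CR LF line endings.
sample=$(mktemp)
printf 'Monday<br>\r\nTuesday\r\n' > "$sample"

# Hold a literal CR in a variable so the sed expression stays portable.
cr=$(printf '\r')

# Option 1: strip the trailing CR from each line with sed, then drop the newlines.
joined_sed=$(sed "s/${cr}\$//" "$sample" | tr -d '\n')

# Option 2: delete CR and LF bytes in one pass with tr.
joined_tr=$(tr -d '\r\n' < "$sample")

printf '%s\n' "$joined_sed"
printf '%s\n' "$joined_tr"

rm -f "$sample"
```

With GNU sed, writing the expression as 's/\r$//' should also work, since GNU sed understands the \r escape.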