- 11-10-2009 #1
filter special characters
Hi,
i have a text file, produced by pdfinfo, that sometimes has "special" characters and then an editor like gedit or medit complains: "Could not detect file character encoding."
jEdit can open the file, so I'm not totally in the "cold"...
if i use the cat -v command i can see that one of the special characters causing the problem is ^@ (i think it's the NUL character from ASCII, or \0)
but this doesn't solve my problem of translating the file because useful characters like the Ü get translated into M-CM-^\
i also tried:
Code: $ iconv -c file.txt -o out.txt
and:
Code: iconv -c -f ISO8859-1 file.txt -t UTF-8 -o out.txt
and:
Code: dos2unix -bv file.txt
but these also didn't work out.
How do I get rid of, or filter, special characters??
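A note on reading the cat -v output above: cat -v only changes how bytes are displayed, it does not convert the file. With GNU cat, M-CM-^\ is the notation for the two bytes 0xC3 0x9C, which is how UTF-8 encodes Ü, and ^@ is the NUL byte. A minimal sketch, assuming GNU coreutils and a POSIX printf, run on a made-up sample rather than the real pdfinfo output:

```shell
# Made-up sample: the two UTF-8 bytes of Ü (octal 303 234), then a NUL byte
printf '\303\234\000\n' | cat -v

# od prints the same bytes in hex, which is easier to look up in a table
printf '\303\234\000\n' | od -An -tx1
```

If the real file shows the same pattern, the Ü is probably intact and only the NUL bytes need removing.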
- 11-11-2009 #2
suggest you try "tr"
among other things, given a string of characters,
it will delete them from a file
tr -d "tw" <file
deletes each 't' and each 'w' from the file. Special
characters may also be represented in octal, e.g. "\000" for the null character.
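As a small sketch of the octal form: the helper name strip_nul and the sample text are made up for illustration; only tr -d and the \000 escape come from tr itself.

```shell
# strip_nul is a hypothetical helper name, not a standard command
strip_nul() {
    # \000 is the octal escape for the NUL byte (what cat -v shows as ^@)
    tr -d '\000'
}

# Made-up sample line with a NUL byte in the middle, passed through the filter
printf 'Title:\000 Test\n' | strip_nul
```

On a real file this would be used as: strip_nul < file > cleaned_file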
- 11-11-2009 #3
thanks tpl!
that did the trick in my case:
Code: cat bad_file.txt | tr -d "\0" > filtered.txt
i still wish linux had a more general tool capable of handling this problem...
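A somewhat more general filter can be sketched with the same tool, assuming the goal is to drop ASCII control characters while keeping tab, newline, carriage return and every byte above 0x7F (so multi-byte characters such as Ü survive). The name strip_control is hypothetical, and the octal ranges are one possible choice, not a standard recipe.

```shell
# strip_control is a hypothetical helper name, not a standard command
strip_control() {
    # Delete bytes 0-8, 11, 12, 14-31 and 127 (octal ranges below).
    # Tab (\011), newline (\012), carriage return (\015) and all
    # bytes >= 128 are left untouched.
    tr -d '\000-\010\013\014\016-\037\177'
}

# Made-up sample: a control byte, a tab, and a DEL byte around plain text
printf 'a\001b\tc\177\n' | strip_control
```

On a real file: strip_control < bad_file.txt > filtered.txt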

