  1. #1
    Just Joined!
    Join Date
    Oct 2012
    Posts
    24

    How to remove undesired (ASCII) characters from the first column of a file


    How can I remove undesired (ASCII) characters from only the first column of a huge file? Could you please tell me what is wrong with my test commands?

    EXAMPLE FILE
    201208|123456|US|CA
    201208|23457|US|CA
    o201208|258741|US|TX
    201208|123458|US|TX
    201208|2851452|CA|TN

    EXPECTED OUTPUT FILE

    201208|123456|US|CA
    201208|23457|US|CA
    201208|258741|US|TX
    201208|123458|US|TX
    201208|2851452|CA|TN


    I tried the following, but it does not work:

    tr -cd '\11\12\40-\176' < testfile.txt > resultfile.txt

    or

    perl -ane '{ if(m/[[:^ascii:]]/) { print } }'>resultfile.txt

  2. #2
    tpl
    Linux User
    Join Date
    Jan 2007
    Location
    cleveland
    Posts
    477
    Welcome to the forum.

    I don't know what those weird characters are, but I used "od -c" to find their octal codes and then "tr", as you did. This seems to work on your sample:

    tr -d "\302\247\303\202\302\242\o" <example_file
    the sun is new every day (heraclitus)
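The "od -c" step mentioned above can be sketched roughly as follows. This is only an illustration: `find_bad_lines` is a hypothetical helper name, and the sample input is made up, reusing the first two octal codes (\302\247) from the tr command above as the stray bytes.

```shell
# Hypothetical helper: print the numbers of lines whose first byte is not a digit.
# LC_ALL=C makes awk compare raw bytes, so multi-byte characters are matched byte by byte.
find_bad_lines() {
    LC_ALL=C awk '/^[^0-9]/ { print NR }' "$@"
}

# Made-up sample: the second line starts with the bytes \302\247.
printf '201208|123456|US|CA\n\302\247201208|258741|US|TX\n' | find_bad_lines

# Dump a suspicious line byte by byte; od -c shows non-printable bytes as octal codes.
printf '\302\247201208|258741|US|TX\n' | od -c
```

On a real file, pass the file name to `find_bad_lines` and pipe the reported lines into `od -c` to see which codes need to be removed.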

  3. #3
    Trusted Penguin
    Join Date
    May 2011
    Posts
    4,353
    Quote: Originally posted by jimmymj
    How can I remove undesired (ASCII) characters from only the first column of a huge file? Could you please tell me what is wrong with my test commands?
    Hello and welcome!

    As well as perl and tr, you could use sed, too, e.g.:
    Code:
    cat input.txt|sed -e 's|^[^0-9]||' > output.txt
    That sed expression says: "on any line whose first character is NOT a digit, delete that character". Note that it removes only one leading character per line. Not elegant, but it does the job.
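If the unwanted characters should be removed from the first column only, wherever they sit in that field, one possible alternative is awk with "|" as the field separator. This is a sketch under assumptions, not a tested solution for the real data: `clean_first_column` is a hypothetical name, and it assumes the first field should end up containing digits only.

```shell
# Hypothetical helper: keep only the digits 0-9 in the first '|'-separated field.
# LC_ALL=C makes awk work on raw bytes, so stray non-ASCII bytes are removed too.
clean_first_column() {
    LC_ALL=C awk 'BEGIN { FS = OFS = "|" } { gsub(/[^0-9]/, "", $1); print }' "$@"
}

# Try it on two lines from the example file.
printf '%s\n' '201208|123456|US|CA' 'o201208|258741|US|TX' | clean_first_column
```

Unlike tr, which deletes the listed bytes everywhere in the file, this touches only the first field and leaves the other columns as they are. For a real file: clean_first_column input.txt > output.txt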

  4. #4
    Just Joined!
    Join Date
    Oct 2012
    Posts
    24
    Quote: Originally posted by tpl
    Welcome to the forum.

    I don't know what those weird characters are, but I used "od -c" to find their octal codes and then "tr", as you did. This seems to work on your sample:

    tr -d "\302\247\303\202\302\242\o" <example_file
    There is also "t201208" in the first column, and I need to remove the "t" as well.
    The first column should contain only 0-9; everything else should be removed.
    I don't know why there are weird characters and a "t" in the flat file extracted from the database.
    Would you help me?
    Thanks

  5. #5
    Trusted Penguin
    Join Date
    May 2011
    Posts
    4,353
    Quote: Originally posted by jimmymj
    There is also "t201208" in the first column, and I need to remove the "t" as well.
    The first column should contain only 0-9; everything else should be removed.
    That sed command I posted should work on a "t", too, as long as it is the first character in the line and is not a digit from 0 through 9.
    I don't know why there are weird characters and a "t" in the flat file extracted from the database.
    How are you generating your database dump?
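If a line can start with more than one unwanted character, a small variation of that sed expression may help. The sketch below uses `[^0-9]*` instead of `[^0-9]`, so it deletes the whole run of leading non-digits. `strip_leading_junk` is a hypothetical name, and running sed under LC_ALL=C (so that it works byte by byte) is an assumption about the stray bytes, not something confirmed by the original data.

```shell
# Hypothetical helper: delete every leading non-digit byte on each line.
strip_leading_junk() {
    LC_ALL=C sed 's/^[^0-9]*//' "$@"
}

# Made-up test lines with one and with two leading letters.
printf '%s\n' 't201208|123458|US|TX' 'xt201208|2851452|CA|TN' | strip_leading_junk
```

This only handles junk at the very start of the line. Characters in the middle of the first column would still need a field-aware tool.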

  6. #6
    Just Joined!
    Join Date
    Oct 2012
    Posts
    24
    thank you!

  7. #7
    Just Joined!
    Join Date
    Oct 2012
    Posts
    24
    Thank you for your answer.

  8. #8
    mactruck
    Linux Newbie
    Join Date
    Apr 2012
    Location
    City of Salt
    Posts
    185
    Did that work? I think awk would have been easier... Never mind, I misread the post.
