  1. #1
    Just Joined!
    Join Date
    Jul 2011
    Posts
    18

    filtering out some strings


    Hi,
    I want to filter out lines in which one specific character makes up more than 49% of the line.
    For example:
    line 1 : QWERWWWRWT
    line 2 : QWERTYUIPPP

    After filtering:
    only line 2 remains, because in line 1 "W" makes up 5 of the 10 characters (50%).

    Is there an easy way to do this in bash, sed or awk?

    Thanks!
    Pap

  2. #2
    Linux Engineer Kloschüssel
    Join Date
    Oct 2005
    Location
    Italy
    Posts
    773
    awk can definitely do the job. However, I have neither the time nor the knowledge to give you a ready-to-run code snippet. Work your way through some awk tutorials on counting characters. Basically, you'll need to count all the "W" characters in one variable and all the other characters in another, then compare the two counts at the end of each line. A good place to start is this site.
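A minimal sketch of that two-counter approach, assuming the character to count is passed in with awk's -v option and that a line is kept only when the other characters outnumber it (so a 50-50 line like line 1 is dropped). The function name keep_minority and the awk variable ch are made up for this example:

```shell
# keep_minority is a hypothetical helper name; $1 is the character to count.
keep_minority() {
    awk -v ch="$1" '{
        hits = 0; others = 0
        # walk the line one character at a time
        for (i = 1; i <= length($0); i++) {
            if (substr($0, i, 1) == ch) hits++
            else others++
        }
        # at the end of the line, compare the two counters
        if (others > hits) print
    }'
}

printf 'QWERWWWRWT\nQWERTYUIPPP\n' | keep_minority W
```

Comparing the two counts directly means no division is needed. With the sample input only QWERTYUIPPP is printed.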

  3. #3
    Linux Newbie
    Join Date
    Nov 2012
    Posts
    224
    Hi,

    it's probably easier to do it the other way round:
    • keep the line in a variable
    • replace every "W" in the kept copy with nothing
    • get the length of the line and of the modified copy
    • subtract the modified copy's length from the line's length and turn the difference into a percentage
    • if it's 49 or less, print the line

    Use awk, because bash can't do floating-point arithmetic.
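The steps above, sketched in awk. This is one possible version, assuming the threshold is passed in as a percentage and that a line is kept when the character's share is at or below it; filter_by_percent, ch and max are hypothetical names. Note that gsub() treats ch as a regular expression, so a character such as . would need escaping:

```shell
# filter_by_percent is a hypothetical helper name.
# $1: character to count, $2: highest share (in percent) a kept line may have.
filter_by_percent() {
    awk -v ch="$1" -v max="$2" '{
        if (length($0) == 0) { print; next }   # an empty line contains no ch at all, so keep it
        kept = $0
        gsub(ch, "", kept)                      # copy of the line with every ch removed
        removed = length($0) - length(kept)     # how many ch characters the line had
        pct = removed / length($0) * 100        # awk does this in floating point
        if (pct <= max + 0) print               # "+ 0" forces a numeric comparison
    }'
}

printf 'QWERWWWRWT\nQWERTYUIPPP\n' | filter_by_percent W 49
```

With 49 as the threshold, line 1 (50% W) is dropped and line 2 (about 9% W) is kept.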

  4. #4
    Linux Engineer Kloschüssel
    Join Date
    Oct 2005
    Location
    Italy
    Posts
    773
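    gsub can remove every W (gsub("[W]", "")) or every non-W character (gsub("[^W]", "")). gsub() works in place and returns a count, so each call needs its own copy of the line. Comparing the lengths of the two resulting strings should then be enough, no fancy arithmetic needed.

    Something along these lines:

    Code:
    awk '{
            nochar = $0;   gsub(/W/, "", nochar)      # copy with every W removed
            onlychar = $0; gsub(/[^W]/, "", onlychar) # copy with only the W characters left
            if (length(nochar) > length(onlychar)) {
                    print    # matched: fewer W than other characters, so keep the line
            }
    }'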
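gsub() changes its target in place (the third argument, or $0 if it is omitted) and returns the number of substitutions rather than the new string, so each comparison needs its own copy of the line. A quick check on the first example line; gsub_demo is a hypothetical name:

```shell
# gsub_demo is a hypothetical helper: shows what gsub() returns versus what it does to its target.
gsub_demo() {
    awk '{
        copy = $0
        n = gsub(/W/, "", copy)   # n is the count; copy itself is modified
        print n                   # number of W characters removed
        print copy                # the line with the W characters gone
        print $0                  # $0 is untouched because only the copy was edited
    }'
}

echo 'QWERWWWRWT' | gsub_demo
```

It should print 5, then QERRT, then the unchanged QWERWWWRWT.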

  5. #5
    Linux Newbie
    Join Date
    Nov 2012
    Posts
    224
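    Yes, gsub() returns the number of substitutions:
    Code:
    awk '{
       keep = $0
       Ws = gsub("W", "", keep)       # Ws = number of W characters; keep = the line without them
       if (length(keep) > Ws) print   # print only if the other characters outnumber the W characters
    }'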
    «49» makes me think that it may be any "ratio", not only 50-50.

  6. #6
    Just Joined!
    Join Date
    Jul 2011
    Posts
    18
    Quote Originally Posted by watael
    «49» makes me think that it may be any "ratio", not only 50-50.
    Thanks watael!
    Your assumption was right, too!

  7. #7
    Just Joined!
    Join Date
    Jul 2011
    Posts
    18
    Quote Originally Posted by watael
    «49» makes me think that it may be any "ratio", not only 50-50.
    What if I want the ratio to be 60-40,
    i.e. keep a line only if more than 60% of its characters are not W?

    Many thanks!

  8. #8
    Linux Newbie
    Join Date
    Nov 2012
    Posts
    224
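    Replace the if statement with this one (the length check skips empty lines, which would otherwise cause a division by zero):
    Code:
    if (length($0) > 0 && length(keep) / length($0) * 100 > 60) print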
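Since bash arithmetic is integer-only, another option is to avoid the division altogether: for a non-empty line, length(keep)/length($0)*100 > 60 is the same test as 100*length(keep) > 60*length($0). A minimal pure-bash sketch of the 60% rule along those lines, assuming W is the character and using bash's ${var//pattern/} expansion; keep_if_mostly_other is a hypothetical name:

```shell
# keep_if_mostly_other is a hypothetical helper name (requires bash).
# Prints each input line whose non-W characters make up more than 60% of it.
keep_if_mostly_other() {
    local line others
    # "|| [[ -n $line ]]" also handles a last line with no trailing newline
    while IFS= read -r line || [[ -n $line ]]; do
        others=${line//W/}   # the line with every W removed
        # others/len > 60/100 is the same as 100*others > 60*len for a non-empty line
        if (( ${#line} > 0 && 100 * ${#others} > 60 * ${#line} )); then
            printf '%s\n' "$line"
        fi
    done
}

printf 'QWERWWWRWT\nQWERTYUIPPP\n' | keep_if_mostly_other
```

For the sample input it prints only QWERTYUIPPP. A line with exactly 60% non-W characters is not printed, since the requirement is more than 60%.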

  9. #9
    Just Joined!
    Join Date
    Jul 2011
    Posts
    18
    Great!
    Thanks for the quick response!
