Find the answer to your Linux question:
Page 1 of 2 1 2 LastLast
Results 1 to 10 of 16
script for automatic table filtering Hello everyone! i I'm trying to write a script that can filter data in a text file. the source file looks like this upload 500 ...
Enjoy an ad free experience by logging in. Not a member yet? Register.
  1. #1
    o28
    o28 is offline
    Just Joined!
    Join Date
    Jul 2012
    Posts
    9

    file filtering with script


    script for automatic table filtering

    Hello everyone! i I'm trying to write a script that can filter data in a text file.

    the source file looks like this

    upload

    500 7021481333 _219402757693 12.07.2012 8:49:06
    300,20 7014977234 _219402760235 12.07.2012 8:55:47
    200 7756808289 _219402762183 12.07.2012 9:00:51
    250 7756520164 _219402762736 12.07.2012 9:02:13

    remove 13,07,12 -1 remove 13,07,12

    219502312010_ 1300 77024486668 2 500 77013018911 _219582735923 13.07.2012 1:39:06
    219502346959_ 400 77025379729 2 600 77028475253 _219582738600 13.07.2012 10:38:00
    219502496629_ 300 77013786984 2 760 77024417080 _219588544699 13.07.2012 16:55:16
    219502549957_ 440,40 77754584939 2
    219502648140_ 500 77024668698 2 14,07,12
    219502750647_ 350 77025930602 2 100 77029022189 _219682757249 14.07.2012 11:37:39
    219502845399_ 600 77014767138 2
    219502895273_ 340 77016857922 2
    219502312708_ 300 77021293737 2 15,07,12
    219502467265_ 400 77018327544 2 500 77014679513 _219788548713 15.07.2012 1:09:01
    219502695601_ 500 77016116993 2 250 77026221577 _219788548716 15.07.2012 1:17:13


    77014334033 170 2
    77013645949 470 2
    77018123706 970 2
    77022574546 470 2
    77012541665 470 2
    77013255542 370 2
    77025557979 570 2
    77021218700 370 2
    77753808879 200 2
    the result file should look like this



    500 7021481333
    300.20 7014977234
    200 7756808289
    250 7756520164



    1300 77024486668
    400 77025379729
    300 77013786984
    440.40 77754584939
    500 77024668698
    350 77025930602
    600 77014767138
    340 77016857922
    300 77021293737
    400 77018327544
    500 77016116993




    500 77013018915
    600 77028475253
    760 77024417080
    100 77029022189
    500 77014679513
    250 77026221577


    170 77014334033
    470 77013645949
    970 77018123706
    470 77022574546
    470 77012541665
    370 77013255542
    570 77025557979
    370 77021218700
    200 77753808879
    As you can see mostly i need to delete fields like _219402757693 and date - time 12.07.2012 8:49:06

    i know that it's possible to filter text with awk or cut. but the problem is that:

    mobile number and amount are in different columns in each block of tables which are much bigger that this sample.

    number and amount in some tables are amount and number

    amount can be a number from 3 (ex 300) to 7 symbols (ex 3000.00)
    some numbers 7XXXXXXXXXX (11 digit format) some are 7XXXXXXXXX (10 digit format) but result should be 11 digit format
    commas need to be replaced with dots. (found how to do it with sed)

    I was googling for about a week and reading about awk, sed and other text processing/editing commands, but didn't find solution which will work



    My main problem is to filter out fields like _219402757693 and date - time 12.07.2012 8:49:06


    HELP ME PLEEEEEEEEEEEEAAAASE!!!!!!

    Thank you!!!!
    Edit/Delete Message

  2. #2
    Linux Newbie mactruck's Avatar
    Join Date
    Apr 2012
    Location
    City of Salt
    Posts
    187
    is this the whole file or just a sample of it?

  3. #3
    Trusted Penguin
    Join Date
    May 2011
    Posts
    4,353
    try this hastily cobbled together script. i'm sure it will barf on 10 digits numbers, i didn't really test for that. but it is a start. if you have an example of more data that has different format examples, post that.

    the code expects your data to be in a file in the current directory, defined in the variable file at the top of the script.

    Code:
    #!/bin/bash
    file=./cell-data.txt
    while read line; do
      col1=$(echo $line|awk '/^[0-9]/{print $1}')
      echo $col1|grep -q _$
      if [ $? -eq 0 ]; then
        ac=$(echo $line|awk '/^[0-9]/{print $2}')
        cn=$(echo $line|awk '/^[0-9]/{print $3}')
      else
        cnt=$(printf "$col1"|wc -c)
    
        if [ $cnt -eq 0 ]; then
          continue
        elif [ $cnt -eq 11 ]; then
          cn=$col1
          ac=$(echo $line|awk '/^[0-9]/{print $2}')
        else
          ac=$col1
          cn=$(echo $line|awk '/^[0-9]/{print $2}')
        fi
      fi
      ac=$(echo $ac|sed -e 's|,|.|')
      echo $ac $cn
    done < <(cat $file)
    Last edited by atreyu; 07-25-2012 at 03:01 AM. Reason: typo

  4. #4
    o28
    o28 is offline
    Just Joined!
    Join Date
    Jul 2012
    Posts
    9
    Quote Originally Posted by mactruck View Post
    is this the whole file or just a sample of it?
    Hello mactruck, it's just a sample.

  5. #5
    o28
    o28 is offline
    Just Joined!
    Join Date
    Jul 2012
    Posts
    9
    Hi atreyu! thank you for reply! your script works fine and i don't think that i could ever write something like this by myself THANKS A LOT LOT LOT LOT LOT)))

    Mostly data has the same formats as in first sample. here is one more:

    000001002484383_ 470 77029699990 2
    000001002484462_ 470 77013224884 2
    000001002485378_ 190 77755789383 9
    000001002485797_ 167 77026676863 2
    000001002486441_ 170 77751548249 2
    000001002486462_ 470 77027496666 2
    000001002486748_ 470 77011211701 2


    1000 77023051300 220503246496 23/07/2012 13:53:26
    1000 77025391119 220503248811 23/07/2012 14:23:26
    500 77018523333 220503261346 23/07/2012 15:53:26
    200 77753138002 220503263527 23/07/2012 16:08:26
    500 77013452050 220503273661 23/07/2012 17:38:26
    500 77015945002 220503279886 23/07/2012 18:59:52
    350 77014576686 220503280408 23/07/2012 18:59:56
    500 77021647582 46719155400 23/07/2012 16:54:25


    7029108515 370 23.07.2012 12:34:32 10596010869
    7021478521 350 23.07.2012 17:23:48 10596970751
    7023337883 6400 23.07.2012 18:30:34 10597676582


    400 77011298723 23.07.2012 12:00:04 469230
    220 77753702350 23.07.2012 12:44:54 469402

    delete
    1000 77029740828 _220582855288 23.07.2012 10:08:27
    350 77025222923 _220582860807 23.07.2012 12:33:19
    1000 77014599835 _220582870516 23.07.2012 18:50:11
    300 77013748455 _220588578735 23.07.2012 10:10:06
    160 77018185383 _220588580547 23.07.2012 12:16:04
    400 77018460916 _220588581543 23.07.2012 15:22:00
    500 77014144488 _220588581697 23.07.2012 15:47:00
    2000 77024096668 _220588581835 23.07.2012 16:14:04


    delete
    repeat -1
    220502802938_ 500 77028440066 2
    220502842104_ 400 77013727131 2
    220502872072_ 500 77029290612 2
    220502946292_ 400 77026393508 2
    220502003600_ 700 77016483310 2
    220502021598_ 400 77013886066 2
    the output which i get with your script,

    470 77029699990
    470 77013224884
    190 77755789383
    167 77026676863
    170 77751548249
    470 77027496666
    470 77011211701
    1000 77023051300
    1000 77025391119
    500 77018523333
    200 77753138002
    500 77013452050
    500 77015945002
    350 77014576686
    500 77021647582
    7029108515 370
    7021478521 350
    7023337883 6400

    400 77011298723
    220 77753702350
    1000 77029740828
    350 77025222923
    1000 77014599835
    300 77013748455
    160 77018185383
    400 77018460916
    500 77014144488
    2000 77024096668
    500 77028440066
    400 77013727131
    500 77029290612
    400 77026393508
    700 77016483310
    400 77013886066
    the bolded part should be turned as the rest [amount / number] and number should become 7029108515 --> 77029108515 (possible prefixes are 7 (that need to be added (country code)) then 701 or 702 or 775 or 778 and rest of the number XXXXXXX)

    is it possible to do on a number-of-digits based pattern??? because i tried sed 's/701/7701/g' result_file | sed 's/702/7702/g' | sed 's/775/7775/g' | sed 's/778/7778/g'

    and it modifies all numbers so the result is incorrect

    77029108515 370
    77021478521 350
    77023337883 6400
    400 777011298723
    220 7777537702350
    1000 777029740828
    Thank you one more time

  6. #6
    Linux Newbie
    Join Date
    Jun 2012
    Location
    SF Bay area
    Posts
    173
    I'm not sure I understand all the filtering and reformatting you want to do from the posted comments, but here's an AWK script that might be helpful. I think it does the sorts of thing you're talking about at least.

    Code:
    #!/usr/local/bin/gawk -f
    
    BEGIN \
    {
        PREF_MAP[ "701"] = 7;
        PREF_MAP[ "702"] = 7;
        PREF_MAP[ "703"] = 7;
        PREF_MAP[ "704"] = 7;
        PREF_MAP[ "705"] = 7;
        PREF_MAP[ "603"] = 1;
        PREF_MAP[ "502"] = 2;
    
        DEFAULT_PREF = 7;
    }
    
    {
        cell = "";
        amount = "";
    
        head = $1;
        check = head;
        gsub( "[0-9,]", "", check);
    
        if( substr( head, length( head)) == "_")
        {
            amount = $2;
            cell = $3;
        }
        else if( length( head) >= 10)
        {
            cell = $1;
            amount = $2;
        }
        else if( check == "")
        {
            amount = $1;
            cell = $2
        }
    
        if( cell != "" && amount != "")
        {
            gsub( ",", ".", amount);
    
            if( length( cell) < 11)
            {
                pref = substr( cell, 1, 3);
                lead = PREF_MAP[ pref];
                if( lead == "") lead = DEFAULT_PREF;
                cell = lead cell;
            }
    
            printf( "%s %s\n", amount, cell);
        }
    }
    If you don't have gawk or it's someplace else, just change the first line to "#!/usr/bin/awk -f" or whatever makes sense. Then run it by passing the name of your data file as the first argument to the script. For instance if you called the file "cell-filter" and make sure it's executable with "chmod 755 cell-filter", then you would invoke it with "cell-filter name-of-data-file". You can add as many entries to the "PREF_MAP" array as you like to map 3 digit prefixes for cell numbers that aren't 11 digits. I made up the "602" and "503" entries for testing, so delete them.

    The code ignores lines it doesn't recognize as having the fields you want, so there are no blank lines in the output. I don't know if that's a good or a bad thing?

  7. #7
    o28
    o28 is offline
    Just Joined!
    Join Date
    Jul 2012
    Posts
    9
    Thanks cnamejj!

    There is no gawk installed on that system there is /usr/bin/awk.

    i've tried your script. but i get

    awk: syntax error near line 22
    awk: illegal statement near line 22
    awk: syntax error near line 42
    awk: illegal statement near line 42

  8. #8
    Linux Newbie
    Join Date
    Jun 2012
    Location
    SF Bay area
    Posts
    173
    Quote Originally Posted by o28 View Post
    Thanks cnamejj!

    There is no gawk installed on that system there is /usr/bin/awk.

    i've tried your script. but i get

    awk: syntax error near line 22
    awk: illegal statement near line 22
    awk: syntax error near line 42
    awk: illegal statement near line 42
    Strange! I edited mine to have "#!/usr/bin/awk -f" and re-ran it without any problems on Fedora system. So I tried on Ubuntu, it worked. And on MacOS and that worked too. The two lines it's complaining about, 22 and 42, are the "gsub" calls which should be fine on any Linux awk. Is the system you're running it on some really old OS?

  9. #9
    o28
    o28 is offline
    Just Joined!
    Join Date
    Jul 2012
    Posts
    9
    it runs Oracle Solaris 10 8/11 s10s_u10wos_17b



    i tried it under red hat. everything works. the output file

    200 77753138002
    500 77013452050
    500 77015945002
    350 77014576686
    500 77021647582
    370 77029108515
    350 77021478521
    6400 77023337883
    400 77011298723
    220 77753702350
    1000 77029740828
    350 77025222923
    hmmm.... in solaris there should be nawk

    Code:
    host@~/> which nawk
    /usr/bin/nawk
    so i replaced 1st line with #!/usr/bin/nawk

    but got an

    Code:
    host@~/> ./try test
    -bash: ./try: usr/bin/nawk: bad interpreter: No such file or directory
    but can run file like this

    Code:
    host@~/> nawk -f try test
    and it worked.

    thank you guys!!!!

  10. #10
    Trusted Penguin
    Join Date
    May 2011
    Posts
    4,353
    Quote Originally Posted by o28 View Post
    Mostly data has the same formats as in first sample. here is one more:

    the output which i get with your script,

    the bolded part should be turned as the rest [amount / number] and number should become 7029108515 --> 77029108515 (possible prefixes are 7 (that need to be added (country code)) then 701 or 702 or 775 or 778 and rest of the number XXXXXXX)

    is it possible to do on a number-of-digits based pattern??? because i tried sed 's/701/7701/g' result_file | sed 's/702/7702/g' | sed 's/775/7775/g' | sed 's/778/7778/g'

    and it modifies all numbers so the result is incorrect
    that should be easy to fix. replace this portion of the code:
    Code:
        if [ $cnt -eq 0 ]; then
          continue
        elif [ $cnt -eq 11 ]; then
          cn=$col1
          ac=$(echo $line|awk '/^[0-9]/{print $2}')
        else
          ac=$col1
          cn=$(echo $line|awk '/^[0-9]/{print $2}')
        fi
    with this:
    Code:
        if [ $cnt -eq 0 ]; then
          continue
        elif [ $cnt -eq 11 ]; then
          cn=$col1
          ac=$(echo $line|awk '/^[0-9]/{print $2}')
        elif [ $cnt -eq 10 ]; then
          cn="7${col1}"
          ac=$(echo $line|awk '/^[0-9]/{print $2}')
        else
          ac=$col1
          cn=$(echo $line|awk '/^[0-9]/{print $2}')
        fi

Page 1 of 2 1 2 LastLast

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •