  1. #1
    xuo
    Just Joined!
    Join Date
    Oct 2006
    Posts
    35

    sed : how to remove several lines


    Hi,

    My question is not as simple as the title suggests.
    An example will help explain what I want to do.

    I have a file that contains the following lines :

    (CELL
    (CELLTYPE "cell_to_keep")
    (INSTANCE yyy)
    )
    (CELL
    (CELLTYPE "cell_to_remove")
    (INSTANCE xxx)
    )
    (CELL
    (CELLTYPE "cell_to_keep")
    (INSTANCE yyy)
    )
    ...

    I want to parse the file and remove, in this example, lines 5 through 8.
    The file can contain any number of these 4-line blocks, some to keep and some to remove.

    I tried searching for (CELL and, if the next line contains "cell_to_remove", deleting from the (CELL line up to the first line that starts with ')'.
    But I can't make it work.

    Does anybody have an idea how to do it? Using sed is not mandatory, but I would really prefer to use it.

    Regards.

    Xuo.

  2. #2
    Trusted Penguin
    Join Date
    May 2011
    Posts
    4,353
    yeah, that could be done in sed, but i'd prefer to do it in perl. so i have - give it a go.

    i gave it two ways in which to delete a given cell's lines:

    1. user input: the user will be shown the group of lines pertaining to a given cell block, and asked (y/n) whether the lines should be deleted

    2. string match: you define a string at the top of the script, and if any lines in a given cell block match, that entire cell block is kept

    To tell the script which method you prefer, set the variable $method at the top of the script to either 'ask_user' or 'match'. Note: you have to comment out one or the other, as you can't declare the same global variable with "my" twice.

    If you choose string match, then be sure to set the variable $regex to the string you desire. To work with your example cell text, i used "cell_to_keep" as the default.

    the script writes to a temporary file, then when it is done with all cell entries, it will copy the temp file over to the original one - so make sure you have backups of the file, if you expect the worst (and you should...).

    Note: pass the file containing the cell text to the script as a command line arg. for example, if you call the perl script "read-cells.pl" and your text file is "cell_data.txt", then do:

    Code:
    ~/read-cells.pl cell_data.txt
    after you make the script executable, of course:
    Code:
    chmod +x ~/read-cells.pl
    here's the code:

    Code:
    #!/usr/bin/perl
    use strict;
    use warnings;
    use File::Copy;
    
    #my $method = 'ask_user';
    my $method = 'match';
    
    my $regex = 'cell_to_keep';
    
    my %hash;
    
    # will be set to the number of cell entries deleted
    my $delete = 0;
    
    # get the file containing your CELL text from command line arg
    my $file = shift || die "usage: $0 </path/to/cell.txt>\n";
    
    # temporary copy of file containing changes
    my $tmpfile = $file.'.tmp';
    die "Temporary file \`$tmpfile' exists\n" if(-f$tmpfile);
    
    my $i;
    
    # read in the file
    open(FH,'<',$file) or die "can't open '$file': $!\n";
    while(<FH>){
      chomp;
      my $line = $_;
    
      # strip trailing white space (only at the end of the line)
      $line =~ s/[ \t]+$//;
    
      if($line =~ /^\(CELL$/){
        $i = scalar keys %hash;
        push(@{$hash{$i}},$_);
        $i+=1;
      }else{
        push(@{$hash{$i - 1}},$_);
      }
    }
    close(FH);
    
    for my $cell(sort {$a<=>$b} keys %hash){
      print "\ncell $cell:\n";
      my @lines = @{$hash{$cell}};
      print "  $_\n" for (@lines);
    
      # if set, this will keep this cell's lines
      my $keep;
    
      # how we determine to keep depends upon the method
      if($method eq 'ask_user'){
        $keep = &user_choice($cell);
      }elsif($method eq 'match'){
        $keep = 1 if(grep {/$regex/} @lines);
      }
    
      if($keep){
        open(TMP,'>>',$tmpfile) or die "can't write to '$tmpfile': $!\n";
        print TMP $_,"\n" for(@lines);
        close(TMP);
      }else{
        $delete+=1;
      }
    
    }
    
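    # move temp file over the original file, as some cell lines have been deleted
    if($delete){
      print "Removed $delete cell(s), copying new file into place\n";
      move($tmpfile,$file) or die "can't move '$tmpfile' to '$file': $!\n";
    }else{
      # nothing was removed, so discard the temporary copy
      unlink($tmpfile);
    }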
    
    sub user_choice {
      my($cell) = @_;
    
      # prompt user whether or not to keep cell entry
      my $answer;
      until($answer){
        print "Do you wish to keep cell $cell? [y|n] ";
        $answer = <STDIN>;
        chomp($answer) if($answer);
        if($answer){
          unless($answer eq 'y' or $answer eq 'n'){
            print "Invalid response \`$answer'\n";
            undef($answer);
          }
        }
      }
      return(($answer eq 'y') ? 1 : 0);
    }
    Note2: if you opt for the regex method, you can just leave off that whole user_choice subroutine at the end. There are probably better ways than the string regex method i gave, btw, but this is a start anyway.

  3. #3
    xuo
    Just Joined!
    Join Date
    Oct 2006
    Posts
    35
    Hi,

    Thank you for the script, but I think I would have been able to do it in Perl myself.
    Since I have already started writing a tcsh script, I would have to write a separate Perl script and call it from the tcsh one. That is possible (I've already done it several times), but I am sure a sed guru could do it in 3 or 4 lines of sed. And my sed guru is on vacation ...

    Regards.

    Xuo.

  4. #4
    xuo
    Just Joined!
    Join Date
    Oct 2006
    Posts
    35
    Hi,

    I made it work. I didn't remove the lines; I commented them out instead.
    I did the following:

    sed --in-place -e '/ (CELL/ {\ # I search for the (CELL keyword
    N\ # I add the next line in the working buffer
    /(CELLTYPE "cell_to_remove"/ {\ # If this line contains cell_to_remove
    N\ # I add the next line in the working buffer
    /^ )/\!{\ # If the next line does not start with ' )'
    N\ # I add the next line in the working buffer
    }\
    s/^/\/\//g\ # Then I replace beginning of line (the first line of the working buffer) by //
    s/\n/\n\/\//g\ # I replace <new line> with <new line> followed by // in the working buffer
    }\
    }' myFile
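
    A variant that deletes the block instead of commenting it out could be sketched as follows. This is only a sketch under assumptions that are not confirmed by the thread: lines carry no leading spaces (as in the sample from the first post), every block ends with a line containing only ")", and the file names cells.txt and cells.filtered.txt are made up for the example. It is written for a POSIX shell; under tcsh each line break inside the quotes would need a trailing backslash, as in the command above.

```shell
# Hypothetical sample file in the layout from the first post.
cat > cells.txt <<'EOF'
(CELL
(CELLTYPE "cell_to_keep")
(INSTANCE yyy)
)
(CELL
(CELLTYPE "cell_to_remove")
(INSTANCE xxx)
)
(CELL
(CELLTYPE "cell_to_keep")
(INSTANCE zzz)
)
EOF

# On a "(CELL" line: keep appending lines (N) until the buffer ends with a
# line that is just ")". Then delete the whole buffer if it mentions
# cell_to_remove; otherwise sed prints it unchanged.
sed '/^(CELL$/{
  :collect
  N
  /\n)$/!b collect
  /(CELLTYPE "cell_to_remove")/d
}' cells.txt > cells.filtered.txt

cat cells.filtered.txt
```

    Because the loop stops at the first line that is exactly ")", this still assumes flat blocks; nested parentheses need a different approach.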

    Regards.

    Xuo.

  5. #5
    Kloschüssel
    Linux Engineer
    Join Date
    Oct 2005
    Location
    Italy
    Posts
    773
    Another neat idea:

    You could transform the input file into an XML file by replacing all occurrences of "(" with "<section>" and all occurrences of ")" with "</section>". Then apply an XQuery/XSL transformation that strips out whatever you want to remove and, as the last step, restore "(" and ")" by replacing "<section>" and "</section>".

    Cheers

  6. #6
    xuo
    Just Joined!
    Join Date
    Oct 2006
    Posts
    35
    Hi,

    This seems more difficult for me.
    My solution is not a very good one as it does not work if my file looks like :
    (CELL
    (CELLTYPE "cell_to_keep")
    (INSTANCE yyy)
    (something else
    (again something else
    )
    )
    )

    I still need to improve my sed solution.
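
    One possible way to cope with nested parentheses is to count them instead of looking for the first ")". The sketch below swaps sed for awk, which is not what the thread settled on. It assumes that parentheses never appear inside quoted strings and that a block is dropped when any of its lines contains the pattern. The file names and the variable names (pat, depth, block, drop) are made up for the example.

```shell
# Hypothetical sample with a nested block to remove and a flat block to keep.
cat > nested.txt <<'EOF'
(CELL
(CELLTYPE "cell_to_remove")
(INSTANCE xxx)
(something else
(again something else
)
)
)
(CELL
(CELLTYPE "cell_to_keep")
(INSTANCE yyy)
)
EOF

awk -v pat='cell_to_remove' '
{
  line = $0
  opens  = gsub(/\(/, "(", line)      # number of "(" on this line
  closes = gsub(/\)/, ")", line)      # number of ")" on this line
  if (depth == 0) { block = ""; drop = 0 }   # a new top-level block starts
  block = block $0 "\n"
  if (index($0, pat)) drop = 1        # remember that this block must go
  depth += opens - closes
  if (depth <= 0) {                   # block is balanced: emit or discard it
    if (!drop) printf "%s", block
    depth = 0
  }
}
END { if (depth > 0 && !drop) printf "%s", block }   # unterminated last block
' nested.txt > nested.filtered.txt

cat nested.filtered.txt
```

    Lines outside any block are balanced on their own, so they pass through unchanged.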

    Thank you to all for your help.

    Xuo.

  7. #7
    Kloschüssel
    Linux Engineer
    Join Date
    Oct 2005
    Location
    Italy
    Posts
    773
    To elaborate on my idea, I saved your sample to "input":

    Code:
    ~$ cat > input
    (CELL
    (CELLTYPE "cell_to_keep")
    (INSTANCE yyy)
    (something else
    (again something else
    )
    )
    )
    ^D
    Then I replace ^( with <section> and )$ with </section>:
    Code:
    ~$ cat input | sed 's/^(/<section>/g' | sed 's/)$/<\/section>/g'
    <section>CELL
    <section>CELLTYPE "cell_to_keep"</section>
    <section>INSTANCE yyy</section>
    <section>something else
    <section>again something else
    </section>
    </section>
    </section>
    Then I use xmllint from the package libxml2-utils to do a fast xpath:
    Code:
    ~$ cat input | sed 's/^(/<section>/g' | sed 's/)$/<\/section>/g' | xmllint --xpath "/section/section[1]" -
    <section>CELLTYPE "cell_to_keep"</section>
    and I leave the rest to you.
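
    The final step of the idea, restoring the parentheses, might be sketched like this. It only shows the round trip, with no XML transformation in between, and it assumes the text contains no literal "<section>" tags or other XML-special characters such as "&" or "<". The file names input.xml and restored are made up for the example.

```shell
# Same sample as above, saved to "input".
cat > input <<'EOF'
(CELL
(CELLTYPE "cell_to_keep")
(INSTANCE yyy)
(something else
(again something else
)
)
)
EOF

# Forward: leading "(" becomes <section>, trailing ")" becomes </section>.
sed -e 's/^(/<section>/' -e 's/)$/<\/section>/' input > input.xml

# (An XML transformation that removes unwanted sections would run here.)

# Backward: turn the tags into parentheses again.
sed -e 's/<section>/(/g' -e 's/<\/section>/)/g' input.xml > restored

# With nothing removed in between, the result should be identical to the input.
cmp input restored && echo "round trip OK"
```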
