  1. #1
    Just Joined!
    Join Date
    Dec 2009
    Posts
    1

    Grep Regular Expression Puzzle (files containing only \0 characters)

    This one is a puzzle...

    I need to search a filesystem in which some files, scattered at random, have had every byte of their data overwritten with a null (\0). The affected files keep their original size but no longer work, because they contain nothing but null bytes.

    grep seems like a good way to identify which files were affected. Of the dozen or so variations I've tried, \0+[^\0]* was my best shot. Unfortunately, it also matched files that contain valid data after a run of nulls anywhere in the file.

    Could you suggest a grep regular expression that would identify all and only those files in which every byte of data has been nulled (\0)?

    Thanks and Best Wishes,
    Bob
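
A note on why the pattern in post #1 over-matches, offered as a hedged sketch: a regular expression without anchors succeeds if it matches anywhere in the input, and [^\0]* may match nothing at all, so the pattern is satisfied by any input that merely contains a null. The example below uses the letter x as a stand-in for the null byte (an assumption made purely for illustration, since how grep interprets \0 depends on the implementation and options) and compares an unanchored pattern with one anchored at both ends. The helper name count_lines is hypothetical.

```shell
#!/bin/sh
# count_lines PATTERN TEXT: number of lines of TEXT matched by the extended
# regular expression PATTERN (count_lines is a hypothetical helper name).
count_lines() {
  # grep -c exits non-zero when the count is 0, so "|| true" keeps the
  # helper usable under "set -e".
  printf '%s\n' "$2" | grep -cE -- "$1" || true
}

# Unanchored: matches a line that merely contains a run of x.
count_lines 'x+[^x]*' 'xxdata'
# Anchored at both ends: the whole line must consist of x.
count_lines '^x+$' 'xxdata'
count_lines '^x+$' 'xxx'
```

The three calls print 1, 0 and 1. Because grep works line by line, anchoring alone does not turn this into a whole-file test for binary data; it only illustrates why the unanchored pattern also matched partially nulled files.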

  2. #2
    Just Joined!
    Join Date
    Feb 2008
    Posts
    9

    -v option

    grep has a -v option that inverts the match.
    So you could probably use something like this:
    grep -v "[^\0]+"

  3. #3
    drl
    Linux Engineer
    Join Date
    Apr 2006
    Location
    Saint Paul, MN, USA / CentOS, Debian, Solaris, SuSE
    Posts
    1,117
    Hi.

    My approach would be to look for any match of [^\0] and then ignore that file. The -v option of grep will list the lines that do not match, but my impression is that a filename is needed, so the -l or -L option should be used. grep also usually treats non-text files as binary, so one might need the -a option as well.

    However, I could not get grep to work with those options on a mixture of some and all null-filled files. It may be some design in grep that is not apparent to me, or an option that I have misused.

    I was able to get a crude perl script to do what I thought was necessary -- print a file name if the match for [^\0] fails -- i.e. a file that is all nulls. That, preceded by a find piped into xargs seems to handle things except for oddly-named files, such as those with spaces in them. So some tinkering is still needed.

    Perhaps this will give the OP an idea or two on alternatives. If / when I get time, I can clean up the perl and bash scripts and post them ... cheers, drl
    Welcome - get the most out of the forum by reading forum basics and guidelines: click here.
    90% of questions can be answered by using man pages, Quick Search, Advanced Search, Google search, Wikipedia.
    We look forward to helping you with the challenge of the other 10%.
    ( Mn, 2.6.n, AMD-64 3000+, ASUS A8V Deluxe, 1 GB, SATA + IDE, Matrox G400 AGP )
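
The approach described in post #3 (report a file only when no non-null byte is found, and cope with oddly named files) can also be sketched without grep or perl. This is an illustration, not drl's script: the function name list_all_null_files is made up here, and it assumes find, tr, head -c and wc are available. Instead of piping names to xargs, it lets find pass them to sh as arguments, which sidesteps the problem with spaces in names.

```shell
#!/bin/sh
# list_all_null_files [DIR]: print regular, non-empty files under DIR whose
# bytes are all NUL. (Hypothetical helper name; DIR defaults to ".")
list_all_null_files() {
  # -size +0c skips empty files; -exec ... {} + hands names to sh as
  # arguments, so spaces and other odd characters need no special handling.
  find "${1:-.}" -type f -size +0c -exec sh -c '
    for f do
      # An unreadable file would look empty to the pipeline below, so skip it.
      [ -r "$f" ] || continue
      # Delete every NUL and count at most one surviving byte; head -c 1
      # lets the pipeline stop at the first non-null byte.
      n=$(tr -d "\000" < "$f" | head -c 1 | wc -c)
      if [ "$((n))" -eq 0 ]; then printf "%s\n" "$f"; fi
    done
  ' sh {} +
}
```

Calling list_all_null_files with a starting directory prints one path per line. Names that contain a newline would be ambiguous in that output; a null-terminated variant would need printf "%s\0" and a matching reader.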

  4. #4
    Just Joined!
    Join Date
    Feb 2008
    Posts
    9
    You can combine all these options using grep -val

  5. #5
    drl
    Linux Engineer
    Join Date
    Apr 2006
    Location
    Saint Paul, MN, USA / CentOS, Debian, Solaris, SuSE
    Posts
    1,117
    Hi, rohshall.
    Quote, originally posted by rohshall:
        You can combine all these options using grep -val
    Thank you for your comment.

    Do you have a working example of a solution to the problem using grep with these options ?

    If so, please post it with sample input and output ... cheers, drl

  6. #6
    Just Joined!
    Join Date
    Feb 2008
    Posts
    9

    Hi

    Yes, you are right. It does not work. Hmm, I would be interested to know why not.
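
A likely part of the answer, offered as a sketch and not as a diagnosis of the exact commands tried in this thread: POSIX regular expressions give a backslash no special meaning inside a bracket expression, so grep reads [^\0] as "any character other than a backslash or the digit 0", not "any non-null byte". Without -E, the + that follows is also a literal plus sign. On top of that, grep matches line by line and -v selects non-matching lines, so -l combined with -v lists a file as soon as one of its lines fails to match. The snippet below demonstrates only the first point, using ordinary text input; count_matches is a hypothetical helper name.

```shell
#!/bin/sh
# count_matches TEXT: number of lines of TEXT that match [^\0].
# Inside brackets the backslash is literal, so the set excludes '\' and '0'.
count_matches() {
  # grep -c exits non-zero when the count is 0; "|| true" hides that status.
  printf '%s\n' "$1" | grep -c '[^\0]' || true
}

count_matches 'a'    # 'a' is neither '\' nor '0', so the line matches
count_matches '0'    # the digit 0 is excluded by the bracket expression
count_matches '\'    # a lone backslash is excluded as well
```

The calls print 1, 0 and 0, which is not what a "match any non-null byte" pattern would give for the second and third inputs.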

  7. #7
    drl
    Linux Engineer
    Join Date
    Apr 2006
    Location
    Saint Paul, MN, USA / CentOS, Debian, Solaris, SuSE
    Posts
    1,117
    Hi.

    Here is a longish solution to the problem. It is primarily a perl script to seek a non-null match. If that fails then everything in the file was the null character. It is driven by a shell script that creates a local set of data files with which to test:
    Code:
    #!/usr/bin/env bash
    
    # @(#) s3	Demonstrate search for file with all null bytes.
    
    # Requires find to terminate filenames with null to account for
    # spaces and other odd characters in filenames.
    #
    # For local testing, 6 data files are prepared: data1, data2,
    # data4 and "data6 x" each hold a single null byte, data3 has at
    # least one non-null byte, and data5 is empty.
    
    echo
    set +o nounset
    LC_ALL=C 
    LANG=C 
    export LC_ALL LANG
    echo "Environment: LC_ALL = $LC_ALL, LANG = $LANG"
    echo "(Versions displayed with local utility \"version\")"
    version >/dev/null 2>&1 && version "=o" $(_eat $0 $1) find perl
    set -o nounset
    
    START=${1-.}
    
    # Create test files, mostly null-filled.
    
    if [ "$START" = "." ]
    then
      rm -f data? "data6 x"
      for i in data1 data2 data4 "data6 x"
      do
        echo -ne "\0" > "$i"
      done
      echo -ne "\0\0\03\0\0\0" > data3
      touch data5
    
      echo
      echo " Data files:"
      ls -lgG data*
    
      echo
      X_1=data1
      echo " Octal dump of example file $X_1:"
      od -bc $X_1
      echo
      X_2=data3
      echo " Octal dump of example file $X_2:"
      od -bc $X_2
    
      echo
      echo " Results -- expecting data{1,2,4,\"6 x\"}:"
    else
      echo 
      echo " Results:"
    fi
    
    echo
    echo " Starting search at $START"
    
    # Pipe null-delimited filenames from find to the perl script,
    # which reads them from STDIN.
    
    echo
    find "$START" -type f -print0 |
    ./perl-files-from-stdin 
    
    exit 0
    calling the perl script:
    Code:
    #!/usr/bin/perl -0
    
    # @(#) perl-files-from-stdin	Demonstrate read file by block, match non-nulls.
    
    use warnings;
    use strict;
    use feature qw(switch say);
    use 5.010;
    
    my ($debug);
    $debug = 1;
    $debug = 0;
    
    my ( $buffer, $file, $f, $i, $length );
    my ($number_of_files) = 0;
    my ($all_null_files)  = 0;
    $length = 16384;
    
    # exit(0) unless @ARGV;
    
    # Read the list of files from STDIN, each expected to be
    # terminated by a null ("\0") byte from find(-print0).
    #
    # While somewhat unusual, this avoids problems with oddly-named
    # files.
    
    while ( defined( $file = <> ) ) {
      chomp($file);    # -0 sets $/ to "\0"; strip it from the name
      $number_of_files++;
      print " filename from STDIN :$file:\n" if $debug;
      next if ( stat($file) )[7] == 0;    # ignore empty files
      if ( not open( $f, "<", $file ) ) {
        print STDERR " Cannot open file $file, ignoring.\n";
        next;
      }
      $i = 0;
      my $found_non_null = 0;    # set once any non-null byte is seen
    
      # Read by blocks, try to match non-null in block, stop scanning
      # if a match is found.
      while ( read( $f, $buffer, $length ) ) {
        $i++;
    
        # print $buffer;
        if ( $buffer =~ /[^\0]/ ) {
          print " Found non-null byte in block: ", $i, ", file $file\n"
            if $debug;
          $found_non_null = 1;
          last;
        }
      }
      if ( not $found_non_null ) {
        $all_null_files++;
        print " All bytes in file $file are null.\n" if $debug;
        print "$file\n";
      }
      close($f);
    }
    
    print STDERR
      "( Files read: $number_of_files; all null files: $all_null_files)\n";
    exit(0);
    producing:
    Code:
    % ./s3
    
    Environment: LC_ALL = C, LANG = C
    (Versions displayed with local utility "version")
    OS, ker|rel, machine: Linux, 2.6.26-2-amd64, x86_64
    Distribution        : Debian GNU/Linux 5.0 
    GNU bash 3.2.39
    find (GNU findutils) 4.4.0
    perl 5.10.0
    
     Data files:
    -rw-r--r-- 1 1 Dec 23 12:38 data1
    -rw-r--r-- 1 1 Dec 23 12:38 data2
    -rw-r--r-- 1 6 Dec 23 12:38 data3
    -rw-r--r-- 1 1 Dec 23 12:38 data4
    -rw-r--r-- 1 0 Dec 23 12:38 data5
    -rw-r--r-- 1 1 Dec 23 12:38 data6 x
    
     Octal dump of example file data1:
    0000000 000
             \0
    0000001
    
     Octal dump of example file data3:
    0000000 000 000 003 000 000 000
             \0  \0 003  \0  \0  \0
    0000006
    
     Results -- expecting data{1,2,4,"6 x"}:
    
     Starting search at .
    
    ./data4
    ./data2
    ./data6 x
    ./data1
    ( Files read: 18; all null files: 4)
    and for a larger test case, this excerpt from my home directory:
    Code:
    % ./s3 ~
    
    ...
    
    ( Files read: 24720; all null files: 6)
    Best wishes ... cheers, drl
