Thread: Grep Regular Expression Puzzle (files containing only \0 characters)
- 12-18-2009 #1 Just Joined!
- Join Date
- Dec 2009
- Posts
- 1
Grep Regular Expression Puzzle (files containing only \0 characters)
This one is a puzzle...
I need to search a filesystem in which some files (randomly distributed) were damaged: every byte of an affected file's data was overwritten with a null byte (\0). The affected files still have the same file size, but they no longer work because they contain only null bytes.
Grep seems like a good way to identify which files were affected. Among the dozen or so variations I've tried, \0+[^\0]* was my best shot. Unfortunately, it also found files that contain valid data after a run of nulls anywhere in the file.
Could you suggest a grep regular expression that identifies all and only those files in which every byte has been nulled (\0)?
Thanks and Best Wishes,
Bob
- 12-21-2009 #2 Just Joined!
- Join Date
- Feb 2008
- Posts
- 9
-v option
grep has a -v option to negate the match.
So you can probably use it like this:
grep -v "[^\0]+"
- 12-22-2009 #3 Linux Engineer
- Join Date
- Apr 2006
- Location
- Saint Paul, MN, USA / CentOS, Debian, Solaris, SuSE
- Posts
- 1,117
Hi.
My approach would be to look for any occurrence of a match to [^\0], and then ignore that file. Using the -v option of grep will list an appropriate line, but my impression is that a filename is needed, so that the options -l or -L should be used. Also grep usually ignores non-text files, so one might need the -a option as well.
However, I could not get grep to work with those options on a mixture of some and all null-filled files. It may be some design in grep that is not apparent to me, or an option that I have misused.
I was able to get a crude perl script to do what I thought was necessary -- print a file name if the match for [^\0] fails -- i.e. a file that is all nulls. That, preceded by a find piped into xargs seems to handle things except for oddly-named files, such as those with spaces in them. So some tinkering is still needed.
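The oddly-named-file problem can usually be sidestepped by passing null-terminated names from find to xargs. The following is a minimal sketch of that idea, not the perl script described here: it swaps in tr and wc as the matcher, assumes find -print0 and xargs -0 are available (GNU or BSD), and list_null_files is a hypothetical name.

```shell
# Sketch: list non-empty files under a directory whose bytes are all NUL.
# list_null_files is a hypothetical name; tr/wc stand in for the perl matcher.
list_null_files() {
    # -print0 and -0 keep names with spaces or newlines intact.
    find "${1:-.}" -type f -print0 |
    xargs -0 sh -c '
        for f in "$@"; do
            [ -s "$f" ] || continue        # ignore empty files
            # Delete every NUL byte and count what is left over.
            left=$(LC_ALL=C tr -d "\000" < "$f" | wc -c)
            if [ "$left" -eq 0 ]; then
                printf "%s\n" "$f"
            fi
        done
    ' sh
}
```

Calling list_null_files /some/dir prints one candidate per line. Unlike a block-by-block scan, tr reads each file to the end, so this is simpler but slower on large files that contain real data.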
Perhaps this will give the OP an idea or two on alternatives. If / when I get time, I can clean up the perl and bash scripts and post them ... cheers, drl
Welcome - get the most out of the forum by reading forum basics and guidelines: click here.
90% of questions can be answered by using man pages, Quick Search, Advanced Search, Google search, Wikipedia.
We look forward to helping you with the challenge of the other 10%.
( Mn, 2.6.n, AMD-64 3000+, ASUS A8V Deluxe, 1 GB, SATA + IDE, Matrox G400 AGP )
- 12-22-2009 #4 Just Joined!
- Join Date
- Feb 2008
- Posts
- 9
You can combine all these options using grep -val
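For reference, a small sketch of what that combination selects, assuming GNU grep: with -v, the -l option lists any file that has at least one non-matching line, whereas -L lists only files in which no line matches. The file names and contents below are made up for the demo.

```shell
# Demo in a scratch directory; mixed.txt and same.txt are made-up names.
demo_dir=$(mktemp -d)
printf 'x\ny\n' > "$demo_dir/mixed.txt"   # one line matches x, one does not
printf 'x\nx\n' > "$demo_dir/same.txt"    # every line matches x

# -v -l: files with at least one line that does NOT match the pattern.
inverted=$(cd "$demo_dir" && grep -val 'x' mixed.txt same.txt)

# -L: files in which NO line matches the pattern.
no_match=$(cd "$demo_dir" && grep -aL 'y' mixed.txt same.txt)

printf 'grep -val x : %s\n' "$inverted"
printf 'grep -aL y  : %s\n' "$no_match"
rm -rf "$demo_dir"
```

So -v negates the test line by line, not for the file as a whole; "no line contains a match" is what -L expresses.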
- 12-22-2009 #5 Linux Engineer
- Join Date
- Apr 2006
- Location
- Saint Paul, MN, USA / CentOS, Debian, Solaris, SuSE
- Posts
- 1,117
- 12-22-2009 #6 Just Joined!
- Join Date
- Feb 2008
- Posts
- 9
Hi
Yes, you are right. It does not work.. Hmm. I am interested in knowing why it does not.
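One contributing factor, offered as a sketch rather than a full diagnosis: in a POSIX bracket expression the backslash is an ordinary character, so grep reads [^\0] as "any character other than a backslash or the digit 0", not as "any non-null byte". The demo below assumes GNU grep; the sample strings are made up.

```shell
# A line made only of '0' and '\' characters: [^\0] finds nothing to match.
only_excluded=$(printf '0\\0\\\n' | grep -c '[^\0]')

# A line containing an ordinary letter: [^\0] matches it.
has_other=$(printf 'a\n' | grep -c '[^\0]')

printf 'matching lines, input of 0 and backslash only: %s\n' "$only_excluded"
printf 'matching lines, input of the letter a:         %s\n' "$has_other"
```

If the pattern really excluded only the null byte, both inputs would produce a matching line.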
- 12-23-2009 #7 Linux Engineer
- Join Date
- Apr 2006
- Location
- Saint Paul, MN, USA / CentOS, Debian, Solaris, SuSE
- Posts
- 1,117
Hi.
Here is a longish solution to the problem. It is primarily a perl script to seek a non-null match. If that fails then everything in the file was the null character. It is driven by a shell script that creates a local set of data files with which to test:
Code:
#!/usr/bin/env bash

# @(#) s3	Demonstrate search for file with all null bytes.
# Requires find to terminate filenames with null to account for
# spaces and other odd characters in filenames.
#
# For local testing, 6 data files are prepared, "data{1,2,4,6
# x}" all have a single null byte, data3 has at least one
# non-null byte, and data5 is empty.

echo
set +o nounset
LC_ALL=C
LANG=C
export LC_ALL LANG
echo "Environment: LC_ALL = $LC_ALL, LANG = $LANG"
echo "(Versions displayed with local utility \"version\")"
version >/dev/null 2>&1 && version "=o" $(_eat $0 $1) find perl
set -o nounset

START=${1-.}

# Create test files, mostly null-filled.

if [ "$START" = "." ]
then
  rm -f data? "data6 x"
  for i in data1 data2 data4 "data6 x"
  do
    echo -ne "\0" > "$i"
  done
  echo -ne "\0\0\03\0\0\0" > data3
  touch data5

  echo
  echo " Data files:"
  ls -lgG data*

  echo
  X_1=data1
  echo " Octal dump of example file $X_1:"
  od -bc $X_1

  echo
  X_2=data3
  echo " Octal dump of example file $X_2:"
  od -bc $X_2

  echo
  echo " Results -- expecting data{1,2,4,\"6 x\"}:"
else
  echo
  echo " Results:"
fi

echo
echo " Starting search at $START"

# Pipe null-delimited filenames from find into the perl code,
# which reads them from STDIN.

echo
find "$START" -type f -print0 |
./perl-files-from-stdin

exit 0

calling the perl script:

Code:
#!/usr/bin/perl -0

# @(#) perl-files-from-stdin	Demonstrate read file by block, match non-nulls.

use warnings;
use strict;
use feature qw(switch say);
use 5.010;

my ($debug);
$debug = 1;
$debug = 0;

my ( $buffer, $file, $f, $i, $length );
my ($number_of_files) = 0;
my ($all_null_files)  = 0;
$length = 16384;

# exit(0) unless @ARGV;

# Read from list of files in STDIN, each expected to be
# terminated by null ("\0") byte from find(-print0),
#
# While somewhat unusual, this avoids problems with oddly-named
# files.

while ( defined( $file = <> ) ) {
  chomp($file);    # strip the trailing null (-0 sets $/ to "\0")
  $number_of_files++;
  print " filename from STDIN :$file:\n" if $debug;
  next if ( stat($file) )[7] == 0;    # ignore empty files
  if ( not open( $f, "<", $file ) ) {
    print STDERR " Cannot open file $file, ignoring.\n";
    next;
  }
  $i = 0;
  my $non_null = 0;    # set to 1 once a non-null byte is seen

  # Read by blocks, try to match non-null in block, stop scanning
  # if a match is found.
  while ( read( $f, $buffer, $length ) ) {
    $i++;

    # print $buffer;
    if ( $buffer =~ /[^\0]/ ) {
      print " Found non-null byte in block: ", $i, ", file $file\n" if $debug;
      $non_null = 1;
      last;
    }
  }
  if ( not $non_null ) {
    $all_null_files++;
    print " All bytes in file $file are null.\n" if $debug;
    print "$file\n";
  }
  close($f);
}

print STDERR
  "( Files read: $number_of_files; all null files: $all_null_files)\n";

exit(0);

producing:

Code:
% ./s3

Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 2.6.26-2-amd64, x86_64
Distribution        : Debian GNU/Linux 5.0
GNU bash 3.2.39
find (GNU findutils) 4.4.0
perl 5.10.0

 Data files:
-rw-r--r-- 1 1 Dec 23 12:38 data1
-rw-r--r-- 1 1 Dec 23 12:38 data2
-rw-r--r-- 1 6 Dec 23 12:38 data3
-rw-r--r-- 1 1 Dec 23 12:38 data4
-rw-r--r-- 1 0 Dec 23 12:38 data5
-rw-r--r-- 1 1 Dec 23 12:38 data6 x

 Octal dump of example file data1:
0000000 000
         \0
0000001

 Octal dump of example file data3:
0000000 000 000 003 000 000 000
         \0  \0 003  \0  \0  \0
0000006

 Results -- expecting data{1,2,4,"6 x"}:

 Starting search at .

./data4
./data2
./data6 x
./data1
( Files read: 18; all null files: 4)

and for a larger test case, this excerpt from my home directory:

Code:
% ./s3 ~
...
( Files read: 24720; all null files: 6)

Best wishes ... cheers, drl
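As a shorter alternative to the perl scan, here is a sketch that compares each file against /dev/zero with cmp. It is not the script from this post: it assumes GNU cmp (for the -n byte-limit option) and a /dev/zero device, and all_zero is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if the file is non-empty and all NUL bytes.
all_zero() {
    local size
    [ -s "$1" ] || return 1              # skip empty or missing files
    size=$(( $(wc -c < "$1") ))          # file size in bytes
    # Compare the first $size bytes with /dev/zero; cmp stops at the
    # first difference, so files with real data are rejected early.
    cmp -s -n "$size" -- "$1" /dev/zero
}

# Usage sketch: NUL-terminated names from find cope with odd filenames.
# find . -type f -print0 | while IFS= read -r -d '' f; do
#     if all_zero "$f"; then printf '%s\n' "$f"; fi
# done
```

Like the block-by-block read, this stops at the first non-null byte, so it stays fast on large files that contain real data.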