- 09-16-2012 #1 Just Joined!
- Join Date: Sep 2012
- Posts: 1
Program to find the same file multiple times
Hello,
I've been searching the internet for a very long time for an (open source) program that can help me locate duplicate files on my hard drive. I am running Debian, if that helps with any ideas.
Thank you for any assistance!
AbsoluteZ3r0
- 09-16-2012 #2 Trusted Penguin
- Join Date: May 2011
- Posts: 3,746
I seem to recall reading about such a program somewhere recently, but now I can't remember where. That makes for a good opportunity to write some code to do it, though!
Caveat: this will take a while to run and is terribly inefficient.
This is a Perl script that uses MD5 checksums to determine the uniqueness of all non-empty files found under a directory. Copy it to a file named "find-dupes.pl". It takes a single argument: the name of the directory to search. Use "/" to search your entire system. Test it out by running it on /tmp first, for instance:
Code:
perl find-dupes.pl /tmp

Here's the code:
Code:
#!/usr/bin/perl
use strict;
use warnings;

$| = 1;  # flush output immediately

# the directory to search (use "/" to search your entire system)
my $dir = shift || die "Usage: $0 <DIRECTORY>\n";
die "$dir: No such directory\n" unless -d $dir;

# hash of checksum => list of files that have that checksum
my %hash;
# file counter
my $cnt = 0;

# find all non-empty regular files and get their MD5 checksums.
# The list form of open() runs find directly (no shell), so directory
# names containing spaces or shell metacharacters are passed safely.
# "-exec md5sum {} +" hands many files to each md5sum call.
print 'Finding all files...';
open(my $ph, '-|', 'find', $dir, '-type', 'f', '!', '-size', '0',
     '-exec', 'md5sum', '{}', '+')
    or die "can't run 'find': $!\n";
while (my $line = <$ph>) {
    chomp $line;
    # md5sum prints "<32 hex digits> <space or *><filename>"; capture the
    # whole rest of the line so filenames containing spaces stay intact
    next unless $line =~ /^\\?([0-9a-f]{32}) [ *](.*)$/;
    my ($cksum, $filename) = ($1, $2);
    push @{ $hash{$cksum} }, $filename;  # save to the list for this checksum
    $cnt++;
}
close($ph);
print "done.\n";

# tally up the totals
my $unique = scalar keys %hash;
print "Found [$cnt] files\n";
print "Found [$unique] unique file checksums\n";

# see if no duplicates were found
if ($cnt == $unique) {
    print "Not a single duplicate file was found...er, this is unlikely!\n";
    exit 0;
}

# display duplicates
for my $key (sort keys %hash) {
    my @files = @{ $hash{$key} };
    # skip checksums that only have one file associated with them
    next if @files < 2;
    print "\nChecksum $key has ", scalar(@files), " files:\n";
    print "\t$_\n" for @files;
}

Like I said earlier, this will take a while to run. You will probably want to pipe the output to tee and/or redirect it to a log file. You may also want to run it inside a screen session, because it can take so long.
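If you'd rather not use Perl at all, roughly the same MD5 grouping can be done with a shell pipeline. This is a minimal sketch, not part of the script above, assuming GNU findutils and coreutils (the defaults on Debian); find_dupes is just a hypothetical function name. Like the script, it skips empty files. Unlike the script, it does not handle filenames containing newlines or backslashes, because md5sum escapes those lines with a leading backslash, which shifts the checksum column.

```shell
# find_dupes DIR (hypothetical name): print groups of non-empty files under
# DIR that share the same MD5 checksum. Assumes GNU find, md5sum, sort, uniq.
find_dupes() {
    # 1. md5sum every non-empty regular file; "-exec ... {} +" passes many
    #    files to each md5sum call instead of starting one process per file.
    # 2. sort, so identical checksums land on adjacent lines.
    # 3. uniq -w 32 compares only the first 32 characters (the checksum);
    #    --all-repeated=separate prints every line of each repeated group,
    #    with a blank line between groups.
    find "$1" -type f ! -size 0 -exec md5sum {} + | sort | uniq -w 32 --all-repeated=separate
}

# Example: scan /tmp and keep a copy of the results in a log file
#   find_dupes /tmp | tee dupes.log
# For a whole-system scan, hide find's permission errors:
#   find_dupes / 2>/dev/null | tee dupes.log
```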


