- 03-29-2009 #1
Need someone to check Perl script, please.
I'm getting ready to do some penetration testing of my system, and I have a wordlist I'd like to use during part of that process. There is a problem though: it has duplicates. So, I tried to write a Perl script to remove all duplicates, but I need someone to review my script before I try it out, just to be safe.
Here's what I've got:
Code:
#!/usr/bin/perl -w

# Try to get the wordlist file ready for use.

$filepath = "wordlist.txt";

sysopen (wl, +>, "$filepath") or die "$filepath cannot be opened.";

while (<wl>)
{
    $checkforthis="\n"."$_";
    while (<wl>)
    {
        s/"$chekforthis"/"\n";
    }
}

#EOF
The file has one item per line, and I want there to be no empty lines where items are removed. It is okay if there is a blank line at the beginning and/or end of the file.
So, will this work? Or would it crash my system if I tried running it? Please keep in mind this would be my first Perl script that I wrote myself, so I will probably need specifics if there is anything wrong with it (which I am sure that there are many things wrong with it - I just don't know what yet).
- 03-30-2009 #2
Well, I tried it out, and made some changes, but I'm still stuck. Here's what I've got now:
Code:
#!/usr/bin/perl -w

# Try to get the wordlist file ready for use.

$filepath = "wordlist.txt";

sysopen(WL, $filepath1, O_RDRW) or die "$filepath cannot be opened.";

while (<WL>)
{
    $checkforthis="\n"."$_";
    while (<WL>)
    {
        s/"$chekforthis"/"\n"/;
    }
}

sysclose(WL)

#EOF
And here's the output when I try to run it:
Code:
Name "main::filepath1" used only once: possible typo at ./perl-fix.pl line 7.
Name "main::chekforthis" used only once: possible typo at ./perl-fix.pl line 14.
Name "main::checkforthis" used only once: possible typo at ./perl-fix.pl line 11.
Argument "O_RDRW" isn't numeric in sysopen at ./perl-fix.pl line 7.
Use of uninitialized value in sysopen at ./perl-fix.pl line 7.
wordlist.txt cannot be opened. at ./perl-fix.pl line 7.
I looked here, but I still can't figure out what's wrong. It's driving me crazy...
Anyone?
- 03-31-2009 #3
First off, the way to remove all duplicates from a wordlist with one word per line in Bash is:
Code:
sort wordlist.txt | uniq
"sort" does exactly what you expect. "uniq" takes a list of words and turns any run of the same word into exactly one. So:
Code:
one
one
two
becomes
Code:
one
two
but
Code:
one
two
one
stays the same.
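To make the "run of the same word" behaviour concrete, here is a minimal Perl sketch of what "uniq" does. The sub name collapse_runs is made up for illustration; it is not part of any tool or module mentioned in this thread.

```perl
use strict;
use warnings;

# Hypothetical helper: collapse each run of identical adjacent items
# into a single item, which is what "uniq" does with lines.
sub collapse_runs {
    my @items = @_;
    my @result;
    for my $item (@items) {
        # Keep the item only if it differs from the last one we kept.
        push @result, $item
            unless @result && $result[-1] eq $item;
    }
    return @result;
}

my @words    = qw(one two one);
my @unsorted = collapse_runs(@words);        # duplicates are not adjacent
my @sorted   = collapse_runs(sort @words);   # sorting puts duplicates side by side

print "unsorted: @unsorted\n";   # prints "unsorted: one two one"
print "sorted:   @sorted\n";     # prints "sorted:   one two"
```

This is why the pipeline runs "sort" first: only after sorting are all copies of a word guaranteed to sit next to each other.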
Now, to go to your code:
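First off, don't use sysopen. It is a lower-level function that takes numeric flags rather than a mode string, and the flag constants come from the Fcntl module. The one you wanted is spelled O_RDWR, not O_RDRW, which is why Perl complained that the argument isn't numeric. Use regular "open" instead.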
Secondly, you have misspelled the name of your variable inside the second while loop.
Thirdly, this simply won't work. Modifying $_ doesn't change the contents of a file.
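For reference only, since plain "open" is the better choice here: a sketch of a sysopen call that Perl accepts, assuming the goal was read/write access to an existing file. The sub name open_read_write is hypothetical. Note that Perl has no "sysclose" function; a handle from sysopen is closed with the ordinary close().

```perl
use strict;
use warnings;
use Fcntl qw(O_RDWR);    # supplies the numeric flag constant

my $filepath = "wordlist.txt";   # same file name as in the thread

# Hypothetical helper: open an existing file for reading and writing.
# Returns the handle on success, or nothing (after a warning) on failure.
sub open_read_write {
    my ($path) = @_;
    if (sysopen(my $fh, $path, O_RDWR)) {
        return $fh;
    }
    warn "$path cannot be opened: $!\n";
    return;
}

my $fh = open_read_write($filepath);
close($fh) if $fh;
```

Including $! in the message reports why the open failed, which the original "cannot be opened." message did not.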
In Perl, the standard way to do duplicate detection is to use a hash. Therefore, if I was implementing this, I would do it this way:
Code:
#!/usr/bin/perl -w

use strict;

my $filepath = "wordlist.txt";

open(my $file, "<", $filepath) or die "$filepath cannot be opened.";

my %words;
while(<$file>)
{
    $words{$_} = 1;
}

print "$_\n" for(keys %words);
What this does is loop through the file and make an entry in our hash for each word. Therefore, we end up with a hash whose keys are all of the words in the file. However, a hash treats duplicate keys as the same, so when we take the keys of the hash, we find a single copy of every word that was in the file. We write these out, and all duplicates have been removed.
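One thing worth knowing about the hash approach: "keys" returns the words in no particular order. If the original order of the wordlist matters, a common variation (not what the script above does) is to filter the list while recording what has been seen. A minimal sketch, with unique_in_order as a made-up name:

```perl
use strict;
use warnings;

# Hypothetical helper: return the items with duplicates removed,
# keeping the order in which each item first appeared.
sub unique_in_order {
    my %seen;
    # $seen{$_}++ is 0 (false) the first time a value appears, so grep
    # keeps it; on later appearances it is non-zero, so grep drops it.
    return grep { !$seen{$_}++ } @_;
}

my @words = unique_in_order(qw(one two three two two one));
print "$_\n" for @words;    # prints one, two, three, each on its own line
```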
Does this make sense?
DISTRO=Arch
Registered Linux User #388732
- 03-31-2009 #4
Thanks so much for all that information! I still have a little figuring out to do about how this code works, though. I have added comments describing what it looks to me like each line does; could you please verify/correct my comments, just so I can better understand the code? (I'm trying to learn programming - kind of a rough road for me...)
I have put question marks next to the ones that I am more confused about.
Thanks again for your time, effort, and willingness to help.
- 04-01-2009 #5
Okay, I've run into another problem. It just erases the contents of "wordlist.txt". Here's what I've got now:
Code:
#!/usr/bin/perl -w

# Try to get the wordlist file ready for use.

use strict;

my $filepath = "wordlist.txt";

open(my $file, "+>", $filepath) or die "$filepath cannot be opened.";

my %words;
while(<$file>)
{
    $words{$_} = 1;
}

print "$_\n" for(keys %words);

#EOF
The same thing happens if I add:
Code:
close($file)
just above the "#EOF". Any ideas?
- 04-01-2009 #6
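So, how does open() work?
Code:
open($filehandle, r/w, $filename)
$filehandle is the handle that will be connected to the file. r/w is the direction of the handle. "<" means read, ">" means write, and ">>" means append. If you choose ">", you will clear the file! The same goes for "+>": it opens the file for both reading and writing, but it empties the file first. To read and write without clearing, use "+<".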
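If you want to see the modes in action without risking a real wordlist, here is a small sketch that tries each one on a scratch file. The helper names slurp and touch_with_mode are made up for this example, and it assumes the File::Temp module that ships with Perl.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: return the full contents of a file as one string.
sub slurp {
    my ($path) = @_;
    open(my $in, "<", $path) or die "$path cannot be opened: $!";
    local $/;                 # undefined separator: read everything at once
    my $text = <$in>;
    close($in);
    return defined $text ? $text : "";
}

# Hypothetical helper: open $path with the given mode, optionally
# print $text to it, and close it again.
sub touch_with_mode {
    my ($mode, $path, $text) = @_;
    open(my $fh, $mode, $path) or die "$path cannot be opened: $!";
    print {$fh} $text if defined $text;
    close($fh);
}

my ($tmp, $path) = tempfile(UNLINK => 1);   # scratch file, removed at exit
close($tmp);

touch_with_mode(">",  $path, "one\n");      # write: file now holds "one"
touch_with_mode(">>", $path, "two\n");      # append: "one" then "two"
touch_with_mode("+<", $path);               # read/write: contents untouched
print "after +<: ", length(slurp($path)), " characters\n";
touch_with_mode("+>", $path);               # read/write, but truncates first
print "after +>: ", length(slurp($path)), " characters\n";
```

Merely opening with "+>" is enough to empty the file, even if nothing is ever printed to the handle.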
Now, on to your questions about the code I posted:
Code:
my %words; # initialize the hash
while(<$file>) # while we are not at the end of the file
{
    $words{$_} = 1; # for each key in %words, set the value to 1 (???) [1]
}
print "$_\n" for(keys %words); # print each of the keys in %words; don't print the values (?) [2]
[1] What this line does is essentially create a key in the hash for each word. We don't care about the value. We are basically reading every word into memory, but only remembering a certain word once.
Think about the naive way of reading a word into memory. We would have an array, and push each word onto it:
Code:
while(<$file>)
{
    push @words, $_;
}
However, this remembers every word. Remember that no matter how many times you assign a value to a given hash key, the key only exists once (future writes to that key will overwrite the existing value). So let's follow my code execution through the following word list:
Code:
one
two
three
two
two
one
First, we are going to set $words{'one'} = 1. %words now has one key: "one". Then we do the same for "two" and "three". Now %words has three keys: "one", "two", and "three".
Now we set $words{'two'} = 1. Well, %words already has a key "two". So there is no change. And similarly for the next 2 lines.
After running this, if I print out the keys, I will get "one", "two", and "three", despite the fact that some of these appeared multiple times.
Code:
print "$_\n" for(keys %words);
[2] This line may be obvious now, but remember that we don't care about the values in this hash. We only care about the keys. So we're only printing those out.
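The values are ignored above, but they can carry information if you want them to. As a variation on the same walk-through (not something the script needs), the sketch below stores a count instead of a constant 1. The sub name count_words is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: count how many times each word occurs.
# The keys of the returned hash are still the unique words.
sub count_words {
    my %count;
    $count{$_}++ for @_;
    return %count;
}

my %count = count_words(qw(one two three two two one));

# Sort the keys so the output order is predictable.
print "$_ appears $count{$_} time(s)\n" for sort keys %count;
```

With the word list from the walk-through, the hash still ends up with exactly three keys; only the values differ.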
Does this make more sense?
- 04-01-2009 #7
First of all, yes, I think that makes a lot more sense now. Thanks!
Next, yes, I misread the reference tutorial - I should have had "+<" instead of "+>".
So, I replaced the ">" with a "<", and ran the script. This time, it hung for a few seconds (as I would expect with a 37.4 MB file), and then spit out text for about a minute or so. However, the text had blank lines in between the entries (not what I hoped for) and left the original file unmodified.
So, not to be a pain, but now my questions are:
1. How do I get rid of the blank lines? (I think "chomp" might come in handy, but I don't know how...)
2. How do I make it save the new list back to the file? (That is, all the entries once, excluding duplicates.)
Thanks again for your time. I think I'm learning a lot.
- 04-09-2009 #8
No worries about being a pain. I've been doing Perl for a long time, and I've had to learn all of this stuff as well.
1) You are correct. What is happening here is that we are reading each line of the file. Each line ends with a newline (obviously). However, when we print out the keys at the end, we are still appending a newline. So we end up with a newline followed by a newline: a blank line.
chomp() will remove the last character of the given variable if that character is a newline. If it is not a newline, it has no effect. If no variable is given, it works on $_. So we modify our code:
Code:
#!/usr/bin/perl -w

use strict;

my $filepath = "wordlist.txt";

open(my $file, "<", $filepath) or die "$filepath cannot be opened.";

my %words;
while(<$file>)
{
    chomp;
    $words{$_} = 1;
}

print "$_\n" for(keys %words);
That's the only change. It now chomps the line, and we use this new chomped line as our key (which is really what we wanted: now the key is a word, not a word followed by a newline).
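A quick way to convince yourself of what chomp does is to try it on a string with and without a trailing newline. This is only a sketch; without_newline is a made-up helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: return a copy of the line with one trailing
# newline removed, if there is one.
sub without_newline {
    my ($line) = @_;
    chomp(my $copy = $line);   # chomp modifies $copy in place
    return $copy;
}

for my $line ("two\n", "two") {
    my $word = without_newline($line);
    print "[$word]\n";         # prints [two] both times
}
```

This matters for duplicate detection too: if the last line of the file has no newline, then without chomp the keys "two\n" and "two" would count as different words.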
2) There are two approaches to this.
Most utilities in UNIX will read from stdin and write to stdout by default. This way, the person who runs your program gets to choose where the input comes from and where the output goes to. In this example, we have hardcoded the input, but we print the output to stdout. Now, to direct it to a file, you run the program with:
Code:
./remove_duplicates > no_duplicates_wordlist.txt
This is a Bash command which means to direct the program's standard output to the given file. It has nothing at all to do with Perl.
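Taking the stdin/stdout idea one step further, the sketch below puts the logic in a sub that is handed its input and output handles, so the caller decides where the data comes from and goes to. The name dedupe_stream is hypothetical, and the demonstration uses in-memory handles as stand-ins for stdin and stdout so that it can run without any files.

```perl
use strict;
use warnings;

# Hypothetical helper: copy lines from $in to $out, skipping any line
# that has already been seen.
sub dedupe_stream {
    my ($in, $out) = @_;
    my %seen;
    while (my $line = <$in>) {
        chomp $line;
        print {$out} "$line\n" unless $seen{$line}++;
    }
    return;
}

# Demonstration with in-memory handles standing in for stdin and stdout.
my $input  = "one\ntwo\none\n";
my $output = "";
open(my $in,  "<", \$input)  or die "cannot open in-memory input: $!";
open(my $out, ">", \$output) or die "cannot open in-memory output: $!";
dedupe_stream($in, $out);
close($out);
print $output;    # the unique lines, in first-seen order
```

Calling dedupe_stream(\*STDIN, \*STDOUT) instead would turn the script into a filter, so it could be run as ./remove_duplicates < wordlist.txt > no_duplicates_wordlist.txt with both ends chosen on the command line.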
The other approach is to hardcode the output as well (or take it as an argument to the script, etc.). Suppose we want to output to the file "no_duplicates_wordlist.txt". We do the following:
Code:
#!/usr/bin/perl -w

use strict;

my $filepath = "wordlist.txt";

open(my $file, "<", $filepath) or die "$filepath cannot be opened.";

my %words;
while(<$file>)
{
    chomp;
    $words{$_} = 1;
}

open(my $outfile, ">", "no_duplicates_wordlist.txt") or die "no_duplicates_wordlist.txt cannot be opened for output.\n";

print {$outfile} "$_\n" for(keys %words);
We open the file for output just like we open a file for input, just with the output indicator (">"). The print() command then takes a hidden argument: if the first argument is a filehandle, then print to that. In fact, if you don't specify a filehandle, "STDOUT" is given implicitly.
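If the goal is to end up with the cleaned list under the original file name, one cautious option is to write to a temporary file and rename it over the original only once everything has been written. This is a sketch under assumptions, not the method from this thread: the sub name dedupe_file_in_place and the ".tmp" suffix are made up, and it assumes the temporary file can live next to the original.

```perl
use strict;
use warnings;

# Hypothetical helper: rewrite $path so that each line appears once.
# The new contents go to a temporary file first, so the original
# survives if anything fails part-way through.
sub dedupe_file_in_place {
    my ($path) = @_;
    my $tmp_path = "$path.tmp";    # assumed scratch name, not from the thread

    open(my $in,  "<", $path)     or die "$path cannot be opened: $!";
    open(my $out, ">", $tmp_path) or die "$tmp_path cannot be opened: $!";

    my %seen;
    while (my $line = <$in>) {
        chomp $line;
        print {$out} "$line\n" unless $seen{$line}++;
    }

    close($in);
    close($out) or die "$tmp_path could not be written: $!";
    rename($tmp_path, $path) or die "cannot replace $path: $!";
    return;
}
```

It would be called as dedupe_file_in_place("wordlist.txt"). Never opening the original with ">" or "+>" avoids the emptied-file problem from earlier in the thread.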
Does this make sense?
- 04-09-2009 #9
- 04-13-2009 #10
Yeah, I was wondering about the chomp thing, or if I should not chomp, but just remove the "\n" part.
Anyhow, here is my final (and working) script:
Code:
#!/usr/bin/perl -w

# Try to get the wordlist file ready for use.

use strict;

my $filepath1 = "wordlist1.txt";
my $filepath2 = "wordlist2.txt";

open(my $filehandle1, "+<", $filepath1) or die "$filepath1 cannot be opened.";

my %words;
while(<$filehandle1>)
{
    chomp;
    $words{$_} = 1;
}

close($filehandle1);

open(my $filehandle2, ">", $filepath2) or die "$filepath2 cannot be opened.";

print {$filehandle2} "$_\n" for(keys %words);

close($filehandle2);

#EOF
Thanks a bunch! Oh, and I'm sure this won't be the last you see of me.


