Sunday, December 27, 2015

BASH merging wordlists

I'm taking a break from C++ because I've reached the part of my code folder where I keep some batch and Python scripts, and I thought this one might be relevant. I wrote it when I was working with big wordlists and kept running out of RAM. I had to split the wordlists into more manageable chunks, sort each chunk and merge them back together.
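The splitting step is not part of the script in this post. Here is a minimal sketch of one way it could be done, assuming GNU split is available. The function name split_wordlist, the chunk_ prefix and the default chunk size are made up for illustration.

```bash
#!/bin/bash
# Split one big wordlist into numbered chunks with a fixed number of lines each.
# split_wordlist, the "chunk_" prefix and the default size are illustrative only.
split_wordlist() {
    local input=$1
    local lines_per_chunk=${2:-1000000}
    # -l: lines per output file; -d: numeric suffixes (00, 01, ...);
    # --additional-suffix: make the chunks end in .txt (GNU split only)
    split -l "$lines_per_chunk" -d --additional-suffix=.txt "$input" chunk_
}
```

Give the big input file an extension other than .txt, or keep it outside the working directory, so that it is not treated as one more chunk when the pieces are merged.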

#!/bin/bash
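count=0
# Collect the *.txt wordlists with a glob: unlike $( ls ), it copes with spaces in
# names, and unlike ls -l | wc -l it does not count the "total" line or directories
files=( *.txt )
[ -e "${files[0]}" ] || { echo "no .txt files found"; exit 1; }
nums=${#files[@]}
echo "Processing $nums original files"

for i in "${files[@]}"; do

 echo "Processing $i"

 # this keeps only strings of 8 to 63 characters in $i, then sorts the file and removes duplicates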

((count++))
echo "deleting strings shorter than 8 or longer than 63 characters"
# One awk pass replaces the two nawk passes; length($0) is the character count of the line
awk 'length($0) >= 8 && length($0) <= 63' "$i" > temp.txt
echo "sorting file and deleting duplicates"
# -i ignores nonprinting characters when comparing, -u keeps one copy of each line
sort -i -u -o "$i" temp.txt
echo "deleting temp file"
rm temp.txt
echo "$count / $nums complete"
done
echo "merging files and deleting duplicates"
ls
echo "This will take some time"
mkdir -p newdir
# Every file is already sorted with the same options, so -m only has to merge them.
# Writing straight into newdir keeps the output out of the *.txt glob.
sort -m -i -u -- *.txt > newdir/newmergedfile.txt
echo "removing original text files"
rm -- *.txt


echo "the new file contains $(wc -l < newdir/newmergedfile.txt) 8-63 character strings and is $(wc -c < newdir/newmergedfile.txt) bytes"
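
After a long run it can be worth checking the result before trusting it. The sketch below is one possible sanity check, not part of the original script. check_wordlist is a hypothetical helper name, and it assumes the same sort options (-i -u) and the same locale as the run that produced the file.

```bash
#!/bin/bash
# check_wordlist (hypothetical name) succeeds only if every line of the file is
# 8-63 characters long and the file is sorted with no duplicate lines.
check_wordlist() {
    local file=$1
    # awk stops at the first line outside the 8-63 range and exits non-zero
    awk 'length($0) < 8 || length($0) > 63 { bad = 1; exit } END { exit bad + 0 }' "$file" || return 1
    # -c checks the order without sorting; combined with -u it also rejects equal neighbours
    sort -c -i -u "$file" 2>/dev/null
}
```

For example, check_wordlist newdir/newmergedfile.txt && echo "looks fine" prints the message only when both checks pass.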
