I recently had to merge an old backup onto a new drive and, of course, ended up with a large number of duplicate files scattered all over it. After some digging, I discovered the program "fdupes", which runs through a directory tree and finds all the duplicates by calculating MD5 checksums and comparing them. This is great, because it finds duplicates even when the file names differ. It even has a -d switch, which asks you which files to keep and which to delete.
However, with roughly 12,000 duplicates in my directory tree, I was not going to sit there and answer the same question 12,000 times. So I wrote a little Python program to do the dirty work.
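For the curious, here is a minimal sketch of the checksum idea, not fdupes' actual code. As far as I know, fdupes also compares file sizes first and does a byte-by-byte check afterwards, which this sketch skips. The function names md5_of_file and find_duplicates are made up for illustration.

```python
import hashlib
import os
from collections import defaultdict


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(top_dir):
    """Group regular files under top_dir by checksum; return groups of 2 or more."""
    by_checksum = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(top_dir):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks so the same file is not counted twice
            if os.path.isfile(path) and not os.path.islink(path):
                by_checksum[md5_of_file(path)].append(path)
    return [sorted(paths) for paths in by_checksum.values() if len(paths) > 1]
```

Calling find_duplicates("/home/top_dir") would return a list of lists, one per set of identical files, regardless of what the files are named.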
Here's what I did, after installing fdupes via Synaptic:
Code:
fdupes -r /home/top_dir > xdupes.txt
This will run through the directory tree starting at top_dir and write its results to the file xdupes.txt: one file name per line, with a blank line after each set of duplicates.
Next, I ran this little Python program:
Code:
import shlex

# Read the fdupes output: one path per line, blank line after each duplicate set
with open("xdupes.txt", "r") as text_file:
    lines = text_file.readlines()

with open("xdupes_rm.sh", "w") as script:
    # Stop one short of the end so lines[i + 1] always exists
    for i in range(len(lines) - 1):
        name = lines[i].rstrip("\n")  # remove trailing \n
        next_is_blank = len(lines[i + 1]) <= 1
        # Skip the blank separators, and keep the last file of each set
        if name and not next_is_blank:
            # shlex.quote protects spaces, quotes, $ and other shell characters
            script.write("rm -- " + shlex.quote(name) + "\n")
This creates an output file named "xdupes_rm.sh" that contains an rm command for every duplicate file name read in from "xdupes.txt", except the LAST name in each set. For example, if fdupes has found files A, B, and C to be the same, then the file names A and B will be written to "xdupes_rm.sh" and C will be kept.
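The "keep the last name in each set" rule can also be sketched as a small function that works on whole sets instead of peeking at the next line. This is just another way to look at the same logic. The name files_to_remove and the sample paths D and E are made up for illustration.

```python
def files_to_remove(fdupes_output):
    """Return every path except the last one from each blank-line-separated set."""
    doomed = []
    group = []
    # The extra "" acts as a final separator in case the output lacks one
    for line in fdupes_output.splitlines() + [""]:
        if line:
            group.append(line)
        else:
            doomed.extend(group[:-1])  # keep the last file of the set
            group = []
    return doomed


sample = (
    "/home/top_dir/A\n/home/top_dir/B\n/home/top_dir/C\n\n"
    "/home/top_dir/D\n/home/top_dir/E\n\n"
)
print(files_to_remove(sample))
# prints ['/home/top_dir/A', '/home/top_dir/B', '/home/top_dir/D']
```

C and E survive because each is the last entry of its set.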
Next, make it executable:
Code:
chmod u+x xdupes_rm.sh
If you are unsure, open xdupes_rm.sh and xdupes.txt in a text editor and confirm that at least one file from every set of duplicates is left out of the removal script.
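With thousands of entries, checking by eye gets tedious, so here is a rough sketch of an automated check. It assumes each line of the script is a single rm command whose last argument is the path; the function name sets_fully_removed is made up for illustration.

```python
import shlex


def sets_fully_removed(fdupes_output, rm_script):
    """Return the duplicate sets in which every file is scheduled for removal."""
    # Take the last token of each rm line as the path being removed
    doomed = set()
    for line in rm_script.splitlines():
        tokens = shlex.split(line)
        if tokens and tokens[0] == "rm":
            doomed.add(tokens[-1])

    # Sets are separated by blank lines in the fdupes output
    problems = []
    for block in fdupes_output.split("\n\n"):
        group = [line for line in block.splitlines() if line]
        if group and all(path in doomed for path in group):
            problems.append(group)
    return problems
```

Feed it the contents of xdupes.txt and xdupes_rm.sh as strings. An empty list back means every set keeps at least one survivor; anything else lists the sets that would be wiped out completely.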
Now run the removal script:
Code:
./xdupes_rm.sh
Note: With a large number of files, the fdupes scan can take a LONG time. Fortunately, fdupes gives you visual feedback, so you know it's working.