Sounds like a mess.
One copy isn't a backup.
For each file, which storage location is the "storage of record"? That is what you should back up. If there are only two copies, it still isn't a backup in my mind. I want three:
* Original, production copy.
* Versioned backup on local storage (rdiff-backup, to get versioned data plus permissions).
* Versioned backup on off-site storage (rsync from the local backup; see the sketch after this list).
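Something like this is the idea, as a minimal sketch. The paths and remote host are placeholders for whatever your setup uses:
Code:
#!/bin/sh
# Hypothetical paths/host -- adjust for your own layout.
SRC=/home/me/data               # production copy (storage of record)
LOCAL=/mnt/backup/data          # versioned, local backup
REMOTE=user@offsite:/backups/data

# Versioned local backup; rdiff-backup keeps reverse increments
# plus ownership/permission metadata alongside the mirror.
rdiff-backup "$SRC" "$LOCAL"

# Push the local backup (mirror plus increments) off-site.
rsync -aH --delete "$LOCAL/" "$REMOTE/"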
Any offsite backups must be encrypted. Never trust the encryption layer provided by the backup vendor. Always do it yourself with LUKS or gpg or openssl.
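For example, with gpg in symmetric mode (passphrase-based; the filenames here are just placeholders):
Code:
# Encrypt a tarball of the local backup before it leaves the machine.
tar -cf - /mnt/backup/data | gpg --symmetric --cipher-algo AES256 -o data.tar.gpg

# Restore later:
gpg --decrypt data.tar.gpg | tar -xf -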
To clean up the production source copies, use a file-deduplication tool like fdupes. There are about ten of these tools. I wrote one a few years ago in Perl after seeing a question on /. Mine is a hack: it just builds a list of duplicate files after running a few different types of comparisons to ensure they really are duplicates, regardless of the filenames. It turned out that list was still a huge hassle to clean up, at least for me.
The manpage for fdupes:
Code:
NAME
       fdupes - finds duplicate files in a given set of directories

SYNOPSIS
       fdupes [ options ] DIRECTORY ...

DESCRIPTION
       Searches the given path for duplicate files. Such files are found by
       comparing file sizes and MD5 signatures, followed by a byte-by-byte
       comparison.
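A couple of invocations to start with (flags are from the fdupes builds I've used; check your version's manpage):
Code:
# Recurse and just summarize how much space the duplicates waste.
fdupes -r -m /data/photos

# Recurse and list each duplicate set, with file sizes.
fdupes -r -S /data/photos > dupes.txt

# Deletion mode prompts for which copy to keep -- use with care.
fdupes -r -d /data/photos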
Expect the resulting list to be long; 2,000 files with duplicates is not unusual.