
Thread: Twins v0.4 released.

  1. #21
    Join Date
    Jul 2005
    Beans
    78

    Re: Twins v0.4 released.

    Well, as has already been noted, a ton of apps already exist to do this, but if you want to keep going, hell, why not. It's good experience either way.

    A big suggestion would be not to hash the whole file unless you need to. Hashes take a while to run, especially across larger files, and on the first pass it's more advantageous to scan more files than to scan whole files. I had to write a similar program in C a long time ago, and I just opened each file, hashed the first 8K and stuck that in the hash table. If there are any collisions, at that point you do the whole-file hash check just to be sure. Most files (images, songs, etc.) will show a delta in the first two blocks. The only ones that won't are files such as source code, and the full-file hash check will catch those. It should shave quite a bit of time off the current run-time of the script. You could go with an adjustable block size too, to tune your hash performance.
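    For what it's worth, the two-pass idea above could look roughly like this in Python. This is only a sketch, not a patch against Twins: the function names (hash_prefix, hash_full, find_duplicates) and the BLOCK_SIZE constant are made up for illustration, and MD5 is assumed only because it comes up elsewhere in the thread.

```python
import hashlib
from collections import defaultdict

BLOCK_SIZE = 8192  # hypothetical default: the 8K prefix mentioned above


def hash_prefix(path, block_size=BLOCK_SIZE):
    """MD5 of only the first block_size bytes of the file."""
    with open(path, "rb") as f:
        return hashlib.md5(f.read(block_size)).hexdigest()


def hash_full(path, chunk_size=1 << 20):
    """MD5 of the whole file, read in chunks to keep memory use flat."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(paths, block_size=BLOCK_SIZE):
    """Return lists of paths whose full contents hash identically."""
    # First pass: bucket every file by the cheap prefix hash.
    by_prefix = defaultdict(list)
    for path in paths:
        by_prefix[hash_prefix(path, block_size)].append(path)

    # Second pass: run the full hash only inside buckets that collided.
    groups = []
    for candidates in by_prefix.values():
        if len(candidates) < 2:
            continue
        by_full = defaultdict(list)
        for path in candidates:
            by_full[hash_full(path)].append(path)
        groups.extend(g for g in by_full.values() if len(g) > 1)
    return groups
```

    Files whose prefix hash is unique never get read past the first block, which is where the time saving comes from. The block_size argument is the adjustable knob.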

    I'd write the patch for you but Python is Greek to me.

  2. #22
    Join Date
    May 2007
    Beans
    77

    Re: Twins v0.4 released.

    Quote Originally Posted by Georges View Post
    for syncing 2 directories use unison
    Code:
    sudo apt-get install unison unison-gui
    unison-gui
    by default unison does not sync owner, mode and times. You need to tell it to do so.
    I am not trying to sync directories. I want to know whether two directories are already identical, and if they are, I can delete one of the two. Syncing means that you are intentionally trying to make them identical. It appears I have downloaded the same folders multiple times, and if Twins could find out that I have the same folder in different locations on my computer, that'd be great.
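    To make the distinction concrete, here is a minimal sketch of an "are these two folders already identical?" check using Python's standard filecmp module. It is not how Twins does it; dirs_identical is a hypothetical helper name, and the check only looks at names and file contents, not owner, mode or times.

```python
import filecmp
import os


def dirs_identical(left, right):
    """True if both trees hold the same names with byte-identical files."""
    # ignore=[] so that nothing is skipped (the default skips names like CVS).
    cmp = filecmp.dircmp(left, right, ignore=[])
    # Names present on only one side, or entries whose types do not match.
    if cmp.left_only or cmp.right_only or cmp.common_funny:
        return False
    # dircmp's own file check is stat-based, so recompare by content.
    _, mismatch, errors = filecmp.cmpfiles(
        left, right, cmp.common_files, shallow=False
    )
    if mismatch or errors:
        return False
    # Recurse into subdirectories that exist on both sides.
    return all(
        dirs_identical(os.path.join(left, name), os.path.join(right, name))
        for name in cmp.common_dirs
    )
```

    Nothing is modified on either side. The function only reports whether one of the two copies would be safe to delete.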

    for finding duplicates... there are already many utilities.
    e.g:
    http://www.pixelbeat.org/fslint/
    The methods used to identify duplicate files are probably not as good as using MD5 to see whether they are truly identical. More importantly, when moving files, it is good to check the MD5 sum to confirm that the moved file is identical to the original (since all sorts of things can happen during the move).

    The idea is not to find duplicates, but to ensure file integrity.
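    As a rough illustration of that integrity check, the sketch below copies a file and compares MD5 sums before the original is touched. The helper names md5sum and copy_verified are made up for this example, and copy-then-verify is just one assumed way to do a safe move.

```python
import hashlib
import shutil


def md5sum(path, chunk_size=1 << 20):
    """MD5 hex digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_verified(src, dst):
    """Copy src to dst and raise OSError if the MD5 sums differ."""
    expected = md5sum(src)
    copied = shutil.copy2(src, dst)  # returns the path of the new file
    if md5sum(copied) != expected:
        raise OSError("MD5 mismatch after copying %s to %s" % (src, copied))
    return expected
```

    Only once copy_verified returns without raising would the original be deleted to complete the move.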

    Also, FSlint does not have an ideal graphical user interface.
