wbloos
December 23rd, 2015, 11:47 AM
Hi,
I've made this little script that gets size and md5 hash for all files in one snapshot of our backup.
The reason is that the backup service uses some kind of hardlinks on a ZFS file system. That's great because it saves a lot of space, but it makes it hard to explain the volume of backup that we're using. Since we have 10 snapshots in rotation, files that change every day take up 10 times more space than files that don't change at all.
du does not detect the hardlinks: every file appears to take up its full size. That's why a simple du command won't help.
#!/bin/bash
while IFS= read -r -d '' file; do
    size=$(stat --printf="%s" "$file")
    hash=$(md5sum "$file" | cut -f1 -d ' ')
    printf "$size"\\t"$hash"\\t"$file"\\n
done < <(find /mnt/backup/.snapshots/`date +\%F`-*/ -type f -print0)
So I made the above bash script. I found some handy code snippets on the internet that made my script work pretty much the way I want, but there are a few things that I don't understand. Also, there are two problems:
Files that have a percent sign in the file name cause problems: the file name is truncated and no newline is printed.
The script takes very long to complete. I might be causing more IO than necessary. AFAIK, "read" will read the whole file. But neither md5sum nor stat will use that information, will they?
Any help would be very much appreciated!
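A possible rework of the loop, offered as a sketch rather than a drop-in replacement. Two points may explain the problems described above. First, "read" in this loop only reads the NUL-terminated file names that find prints, not the file contents; stat reads metadata only, and md5sum is the one command that has to read every byte, so that is where the IO goes. Second, printf treats its first argument as a format string, which is why a % in a file name breaks the output; using a fixed format and passing the values as arguments avoids that. The function name list_snapshot and the inode cache are assumptions added for illustration. The cache needs bash 4+ and GNU stat, and it only saves work if the same device and inode show up more than once in the tree being scanned.

```bash
#!/bin/bash
# Sketch only: list_snapshot is a hypothetical helper, not part of the original script.
# Prints "size<TAB>md5<TAB>path" for every regular file below the given directories.
list_snapshot() {
    local file size inode hash
    local -A seen=()   # "device:inode" -> md5 (associative array, bash 4+)

    while IFS= read -r -d '' file; do
        # One stat call returns size and device:inode; it reads metadata only.
        read -r size inode < <(stat --printf='%s %d:%i\n' "$file")

        # Hard-linked paths share an inode, so hash each inode only once.
        if [[ -z ${seen[$inode]+x} ]]; then
            hash=$(md5sum < "$file")      # output looks like "<md5>  -"
            seen[$inode]=${hash%% *}      # keep only the hash itself
        fi

        # The file name is an argument, never the format string,
        # so a literal % in the name is printed unchanged.
        printf '%s\t%s\t%s\n' "$size" "${seen[$inode]}" "$file"
    done < <(find "$@" -type f -print0)
}
```

It could be called in the same way as the original loop, for example: list_snapshot /mnt/backup/.snapshots/"$(date +%F)"-*/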