Results 1 to 10 of 23

Thread: Check if all data it was well copy

  1. #1
    Join Date
    Apr 2016
    Beans
    356

    Check if all data it was well copy

    Hi guys,

    Is there another reliable way to copy data between disks on an Ubuntu server besides the command "cp -arvp /home/user/disk1/ /home/user/disk2/"?

    After copying, I checked whether everything was copied with the command "du -sh folder/" and with right-click > Properties in the file manager (Nautilus in this case), and sometimes the results don't match between source and destination (both the number of files and the disk space used).

    Thanks

  2. #2
    Join Date
    Mar 2010
    Location
    Squidbilly-Land
    Beans
    Hidden!
    Distro
    Ubuntu

    Re: Check if all data it was well copy

    rsync is one. In the list of great human inventions, rsync is up there with "Unix", ssh, vaccines, and the printing press. It is THAT good.

    But most file systems don't actually verify that the data sent to be written got onto the disk correctly. I know of only one file system that does verify this: ZFS. It wants ECC RAM to do this validation.

  3. #3
    Join Date
    Apr 2016
    Beans
    356

    Re: Check if all data it was well copy

    Quote Originally Posted by TheFu View Post
    rsync is one. In the list of great human inventions, rsync is up there with "Unix", ssh, vaccines, and the printing press. It is THAT good.

    But most file systems don't actually verify that the data sent to be written got onto the disk correctly. I know of only one file system that does verify this: ZFS. It wants ECC RAM to do this validation.
    But the same thing happens with rsync. There is always a difference in the number of files.

  4. #4
    Join Date
    Nov 2011
    Location
    /dev/root
    Beans
    Hidden!

    Re: Check if all data it was well copy

    I would run the copy command with elevated permissions, so that it can read and write files and directories the current user has no access to, and can preserve ownership and permissions. I prefer rsync, but cp should work too when copying locally.

    For example, you can try

    Code:
    sudo rsync -Havn source/ target
    which is a 'dry run'. When things look OK, remove the n and do the real work.

    You will find a lot of options in man rsync

    If necessary you can make sure that the source and target are identical with the rsync option -c, but it will make things much slower.

    Code:
            -c, --checksum
                  This changes the way rsync checks if the files have been changed
                  and  are in need of a transfer.  Without this option, rsync uses
                  a "quick check" that (by default) checks if each file’s size and
                  time of last modification match between the sender and receiver.
                  This option changes this to compare a 128-bit checksum for  each
                  file  that  has a matching size.  Generating the checksums means
                  that both sides will expend a lot of disk I/O  reading  all  the
                  data  in  the  files  in  the transfer (and this is prior to any
                  reading that will be done to transfer changed  files),  so  this
                  can slow things down significantly.
    
                  The  sending  side generates its checksums while it is doing the
                  file-system scan that builds the list of  the  available  files.
                  The  receiver  generates  its  checksums when it is scanning for
                  changed files, and will checksum any file that has the same size
                  as the corresponding sender’s file:  files with either a changed
                  size or a changed checksum are selected for transfer.
    
                  Note that rsync always verifies that each transferred  file  was
                  correctly  reconstructed  on  the  receiving  side by checking a
                  whole-file checksum that is generated  as  the  file  is  trans‐
                  ferred,  but  that automatic after-the-transfer verification has
                  nothing to do with this option’s before-the-transfer "Does  this
                  file need to be updated?" check.
    
                  For  protocol  30  and  beyond  (first  supported in 3.0.0), the
                  checksum used is MD5.  For older protocols, the checksum used is
                  MD4.
    
           -a, --archive
                  This  is equivalent to -rlptgoD. It is a quick way of saying you
                  want recursion and want to preserve almost everything  (with  -H
                  being  a  notable  omission).   The  only exception to the above
                  equivalence is when --files-from is specified, in which case  -r
                  is not implied.
    
                  Note that -a does not preserve hardlinks, because finding multi‐
                  ply-linked files is expensive.  You must separately specify -H.

  5. #5
    Join Date
    Mar 2010
    Location
    Squidbilly-Land
    Beans
    Hidden!
    Distro
    Ubuntu

    Re: Check if all data it was well copy

    Quote Originally Posted by sed_faster View Post
    But the same thing happens with rsync. There is always a difference in the number of files.
    4 possible things.
    * different block sizes on the disks
    * permissions don't allow the userid running the command to copy all the data
    * symbolic links
    * using sparse files

    If one disk has 512-byte blocks and the other has 4 KiB blocks, then the sizes will be different between the source and target. The smallest file will use 4 KiB, not 512 bytes; that's 8x larger.
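
As a rough illustration of the block-size point, here is a sketch that assumes every file occupies a whole number of blocks and ignores metadata, inline storage and compression. `on_disk_size` is a hypothetical helper, not a standard command.

```shell
#!/bin/sh
# on_disk_size BYTES BLOCK: space a file of BYTES bytes takes when the
# filesystem allocates whole blocks of BLOCK bytes (simplified model).
on_disk_size() {
    echo $(( ($1 + $2 - 1) / $2 * $2 ))
}

on_disk_size 100 512     # prints 512
on_disk_size 100 4096    # prints 4096
on_disk_size 5000 512    # prints 5120
on_disk_size 5000 4096   # prints 8192
```

With many small files the rounding adds up, so "du -sh" can report clearly different totals for two trees whose contents are identical.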

    Permissions issues - too large a topic to cover, but if there is any confusion about permissions, that could be the problem.

    Sparse files can be used by some programs to create relatively small files on disk, that can grow to huge sizes. Many commands have a sparse file option to ensure proper handling during copies.
    The cp manpage:
    Code:
           By  default, sparse SOURCE files are detected by a crude heuristic and the corresponding DEST
           file is made sparse as well.  That  is  the  behavior  selected  by  --sparse=auto.   Specify
           --sparse=always  to create a sparse DEST file whenever the SOURCE file contains a long enough
           sequence of zero bytes.  Use --sparse=never to inhibit creation of sparse files.
    The rsync manpage:
    Code:
           -S, --sparse
                  Try  to handle sparse files efficiently so they take up less space on the destination.
                  If combined with --inplace the file created might not end up with sparse  blocks  with
                  some combinations of kernel version and/or filesystem type.  If --whole-file is in ef‐
                  fect (e.g. for a local copy) then it will always work because rsync truncates the file
                  prior to writing out the updated version.
    
                  Note  that  versions of rsync older than 3.1.3 will reject the combination of --sparse
                  and --inplace.
    There are other references to sparse file handling in both manpages.
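
To see how far a sparse file's length and its real disk usage can drift apart, something like the following may help. It is only a sketch: it assumes GNU stat and truncate (as shipped with Ubuntu) and a filesystem that supports sparse files, and the helper names are made up for this example.

```shell
#!/bin/sh
# apparent_bytes FILE: logical length of the file, as "ls -l" reports it.
apparent_bytes() { stat -c %s "$1"; }

# disk_bytes FILE: space actually allocated (%b blocks of %B bytes each).
disk_bytes() {
    set -- $(stat -c '%b %B' "$1")
    echo $(( $1 * $2 ))
}

demo=$(mktemp -d)
truncate -s 10M "$demo/sparse.img"    # 10 MiB long, but no data written
echo "apparent: $(apparent_bytes "$demo/sparse.img")"
echo "on disk:  $(disk_bytes "$demo/sparse.img")"
```

If a copy tool fills the holes with real zero bytes, the target's "on disk" figure grows towards the apparent size even though the file contents are the same.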

    The way that symbolic links are handled will matter too. They can be retained as links or they can be "followed", which could pull much more data into the backup - possibly creating a circular link.
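
Because "du" and the file manager both depend on block sizes and on how links are counted, a comparison that ignores allocation can be more telling. A minimal sketch, assuming GNU find (for -printf); `tree_summary` is a hypothetical helper name:

```shell
#!/bin/sh
# tree_summary DIR: print "<regular-file count> <total bytes of content>".
# Only regular files are counted, and sizes are apparent sizes, so the
# result does not depend on the block size of the filesystem.
tree_summary() {
    find "$1" -type f -printf '%s\n' |
        awk '{ n++; bytes += $1 } END { printf "%d %.0f\n", n, bytes }'
}

# Usage, with the directories from the first post (run with sudo so that
# unreadable directories are not silently skipped):
#   tree_summary /home/user/disk1
#   tree_summary /home/user/disk2
```

Matching numbers do not prove that the contents are identical, but a mismatch here points to missing files rather than to block-size effects.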

  6. #6
    Join Date
    Apr 2016
    Beans
    356

    Re: Check if all data it was well copy

    UPDATE:
    I will try the command "sudo rsync -HaS --progress source target". Which command can I use to check the differences between the folders, so that I can verify the results of the two commands?

    Maybe the problem is the different block sizes on the disks.
    Last edited by sed_faster; June 28th, 2021 at 02:31 PM.

  7. #7
    Join Date
    Nov 2011
    Location
    /dev/root
    Beans
    Hidden!

    Re: Check if all data it was well copy

    You can check with

    Code:
    sudo rsync -Havcn source/ target
    c forces checksum test of each file, n dry run (that is only check, no copying). The list of files will show which files are not matching (and should be copied).

    But it does not show files on the target side, which are not matched on the source side. You can check that with a second test,

    Code:
    sudo rsync -Havcn target/ source
    but if synchronizing is what you want, there are better tools, for example unison (and unison-gtk).
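
For a check that does not rely on rsync at all, a checksum manifest is another option. The following is a sketch only, assuming GNU coreutils and findutils; `verify_copy` is a hypothetical function name, and SHA-256 is simply one reasonable choice of hash.

```shell
#!/bin/sh
# verify_copy SRC DST: hash every regular file under SRC, then check that
# DST holds a file with the same relative path and the same SHA-256 digest.
# Returns 0 when every file from SRC is present and identical in DST.
verify_copy() {
    manifest=$(mktemp) || return 1
    ( cd "$1" && find . -type f -print0 | sort -z | xargs -0 -r sha256sum ) > "$manifest"
    status=0
    if [ -s "$manifest" ]; then
        # --quiet prints only the files that are missing or differ.
        ( cd "$2" && sha256sum --quiet -c "$manifest" ) || status=$?
    fi
    rm -f "$manifest"
    return "$status"
}

# Usage (prefix with sudo if some files are not readable by your user):
#   verify_copy source target && echo "all files match"
```

Like the first rsync test above, this only reports files from the source that are missing or different on the target; swap the arguments to check the other direction.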
    Last edited by sudodus; June 28th, 2021 at 02:36 PM.

  8. #8
    Join Date
    Mar 2010
    Location
    Squidbilly-Land
    Beans
    Hidden!
    Distro
    Ubuntu

    Re: Check if all data it was well copy

    -H follows symlinks. That might not be desirable.
    Be 100% certain you understand every option.

    To see what got out of date, I'd re-run the same rsync, just with --dry-run. That will show what it thinks is out of date. Be 100% certain the systems have the correct date/time. That's important to rsync and for security in general. Also, if the target keeps growing in size, understand that rsync doesn't delete files that are no longer in the source but are still in the target directories. There are multiple delete options, if the goal is to have a data mirror.

    Without a delete option, new files will be added, but files not in the source, will never be deleted in the target. cp has the same consideration.
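
Before using any delete option for real, it may be worth previewing what would be removed. A minimal sketch, assuming a reasonably recent rsync; `preview_mirror` is a hypothetical wrapper name:

```shell
#!/bin/sh
# preview_mirror SRC DST: list what rsync would copy or delete to make DST
# a mirror of SRC. Nothing is changed, because -n is --dry-run.
preview_mirror() {
    rsync -aHn --delete --itemize-changes "$1"/ "$2"/
}

# Usage:
#   sudo sh -c '. ./preview.sh; preview_mirror source target'
# (preview.sh is an assumed file name for this snippet.)
```

Lines starting with "*deleting" are files that exist only on the target. Only when that list looks right would one drop the n and run the real command.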

  9. #9
    Join Date
    Nov 2011
    Location
    /dev/root
    Beans
    Hidden!

    Re: Check if all data it was well copy

    I thought -H [in rsync] is only checking for hard-links (not symbolic links).

    Code:
           -H, --hard-links
                  This  tells rsync to look for hard-linked files in the source and link together
                  the corresponding files on the destination.  Without this option, hard-linked
                  files in the source are treated  as  though  they were separate files.
    
                  ...
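
To find out whether -H matters for a given tree, one could list the files that have more than one name. A sketch assuming GNU find; `list_hardlinked` is a hypothetical helper:

```shell
#!/bin/sh
# list_hardlinked DIR: print "inode link-count path" for every regular file
# under DIR that has more than one hard link, sorted by inode number so
# that names sharing the same data appear next to each other.
list_hardlinked() {
    find "$1" -type f -links +1 -printf '%i %n %p\n' | sort -n
}

# Usage:
#   sudo sh -c '. ./hardlinks.sh; list_hardlinked source'
# (hardlinks.sh is an assumed file name for this snippet.)
```

If this prints nothing, -H makes no difference. If it prints entries, a copy made without -H stores each name as a separate file, so the target uses more disk space than the source.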
    Last edited by sudodus; June 28th, 2021 at 02:46 PM.

  10. #10
    Join Date
    Mar 2010
    Location
    Squidbilly-Land
    Beans
    Hidden!
    Distro
    Ubuntu

    Re: Check if all data it was well copy

    -H ... my mistake. -H (or is that -h) in cp means follow symlinks. From the cp manpage:
    Code:
           -H     follow command-line symbolic links in SOURCE
    I'm so confused! I need to fully understand all the options for the command, right?
