Results 1 to 10 of 21

Thread: SCP transfer of many small files over LAN and internet

  1. #1
    Join Date
    Dec 2009
    Beans
    554

    SCP transfer of many small files over LAN and internet

    Hi all,

    I have to continually transfer about 10-20 files (each from about 10 kB up to a maximum of 1 MB) between two servers, over a LAN or over the internet. The scenario is the following: server A produces a set of files and server B reads them. The production and reading phases are independent, since each file has its own sequence number. Each time server B has finished reading a group of files, it moves on to the next group. I need authentication and encryption. At the moment I'm using scp combined with different options for encryption and compression:

    Code:
    # -c selects the cipher: arcfour here, blowfish is the other one I tried
    scp -c arcfour -C your_username@rh1.edu:/some/remote/directory/foobar.txt your_username@rh2.edu:/some/remote/directory/
    My questions are: 1) Can I speed up the transfer by combining the 10-20 scp commands into a single command? 2) I read that sftp is not a good alternative for small files. 3) This project http://www.psc.edu/index.php/hpn-ssh looks attractive. Have you tried it?

    Any suggestions are welcome.
    Thank you
    Last edited by erotavlas; November 18th, 2013 at 10:25 AM.

  2. #2
    Join Date
    Sep 2006
    Beans
    8,627
    Distro
    Ubuntu 14.04 Trusty Tahr

    batch mode

    Where did you read that sftp was not good for small files? I would think that especially in batch mode (-b) using a key, it would be quite efficient.

    Otherwise, if you are going to use scp instead and are calling it 10-20 times, you might look into multiplexing. The master connection is set up at normal speed, but all subsequent connections that reuse it start much faster because they skip the connection setup and authentication.

    However, short of new information or multiplexing, I would figure sftp the better one.
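
    As a rough illustration of the multiplexing idea, here is a minimal sketch using OpenSSH's ControlMaster, ControlPath and ControlPersist options. The host name, the socket location and the function names (mux_start, mux_copy, mux_stop) are assumptions made up for this example, not something from the thread.

    ```shell
    # Assumed values: replace the host and socket location with your own.
    MUX_HOST="server.example.org"
    MUX_SOCKET="$HOME/.ssh/mux-%r@%h:%p"   # ssh expands %r, %h and %p itself

    # Start a background master connection that later commands can reuse.
    mux_start() {
        ssh -fN -o ControlMaster=yes -o ControlPersist=10m \
            -o ControlPath="$MUX_SOCKET" "$MUX_HOST"
    }

    # Copy one local file ($1) to a remote path ($2) through the master connection.
    mux_copy() {
        scp -o ControlPath="$MUX_SOCKET" "$1" "$MUX_HOST:$2"
    }

    # Close the master connection when all transfers are done.
    mux_stop() {
        ssh -O exit -o ControlPath="$MUX_SOCKET" "$MUX_HOST"
    }
    ```

    Typical use would be to call mux_start once, then mux_copy for each file (for example mux_copy foobar.txt /some/path/), and mux_stop at the end. Only the first call pays for the handshake and authentication.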

  3. #3
    Join Date
    Dec 2009
    Beans
    554

    Re: batch mode

    Here I read that scp is faster than sftp: http://superuser.com/questions/13490...n-scp-and-sftp. But here http://laurentschneider.com/wordpres...cp-tuning.html the author shows the opposite, although he is transferring a huge file. I have read your link about multiplexing. In the case of scp, which is better: to run one scp command that transfers the whole set of files, or to use multiplexing and open one master connection plus several sub-connections, one for each file?
    In the case of sftp, what is the answer to the same question?

    What do you suggest as the fastest solution?

    Thank you

  4. #4
    Join Date
    Sep 2006
    Beans
    8,627
    Distro
    Ubuntu 14.04 Trusty Tahr

    Re: SCP transfer of many small files over LAN and internet

    Thanks for the links. I'm going to have to look further into that and see what else is written about performance.

    About one or many SCP connections, it looks like the fastest option may be to just use one scp connection.

    Code:
    scp ./* server.example.org:/some/path/.
    I wish I knew more about how the connections work, then it would be easy to predict the fastest method. Specifically, if scp uses one TCP connection for all the files, then there is no real advantage to multiplexing. I've done some time trials but not enough to deduce what is going on behind the scenes. It did look like SFTP was slower by a small amount, in proportion to the amount transferred. You can do some time trials with your actual data set and see how long each method actually takes.

    Code:
    time scp ./* server.example.org:/some/path/.
    Then do the same for sftp in batch mode.

    Code:
    time sftp -b upload.batch server.example.org:/some/path/
    For batch mode, you'll need to use a key.
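
    The batch file is just a list of sftp commands, one per line. A minimal sketch of generating one is below; the function name make_batch and the directory name in the usage note are assumptions for illustration only.

    ```shell
    # make_batch DIR OUT: write one sftp "put" line per regular file in DIR to OUT.
    # Keep OUT outside DIR, otherwise the batch file would list itself.
    make_batch() {
        batch_dir=$1
        batch_out=$2
        : > "$batch_out"                      # create or truncate the batch file
        for f in "$batch_dir"/*; do
            [ -f "$f" ] || continue           # skip directories and unmatched globs
            printf 'put "%s"\n' "$f" >> "$batch_out"
        done
    }
    ```

    For example, make_batch ./outgoing upload.batch followed by the sftp -b upload.batch command above. The quotes around each path let sftp handle file names that contain spaces.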

  5. #5
    Join Date
    Sep 2006
    Beans
    8,627
    Distro
    Ubuntu 14.04 Trusty Tahr

    rsync

    If the files don't change completely, then I would use rsync instead. That only transfers the changes so if the files are mostly unchanged, it will be much, much faster than scp or sftp. It works over ssh and can be configured to use keys.
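
    A minimal sketch of the rsync variant, assuming the files sit in one local directory. The function name push_with_rsync, the DRY_RUN switch and the paths in the usage note are made up for this example; -a, -z and -e are standard rsync options.

    ```shell
    # push_with_rsync SRC_DIR DEST: sync the contents of SRC_DIR to DEST over ssh.
    # DEST looks like user@host:/path/ (placeholder). With DRY_RUN=1 the command
    # is only printed, which is handy for checking it before running it.
    push_with_rsync() {
        src=$1
        dest=$2
        # -a keeps permissions and timestamps, -z compresses, -e ssh sets the transport
        set -- rsync -az -e ssh "$src"/ "$dest"
        if [ "${DRY_RUN:-0}" = 1 ]; then
            printf '%s\n' "$*"
        else
            "$@"
        fi
    }
    ```

    For example, push_with_rsync ./outgoing server.example.org:/some/path/. The trailing slash added to the source makes rsync copy the directory's contents rather than the directory itself.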

  6. #6
    Join Date
    Sep 2006
    Beans
    8,627
    Distro
    Ubuntu 14.04 Trusty Tahr

    Re: SCP transfer of many small files over LAN and internet

    Whichever method you use, it will be faster if you switch to the blowfish cipher. Here is an example copying 32 x 1MB files:

    Code:
    run             real    user     sys    (seconds)
    sftp batch    16.332   0.442   0.333
                  16.495   0.457   0.354
                  16.926   0.434   0.338
            avg   16.584   0.444   0.342
    sftp blowfish 11.541   1.409   0.313
                  11.090   1.436   0.324
                  10.824   1.443   0.326
            avg   11.152   1.429   0.321
    scp blowfish   9.214   1.232   0.211
                   9.962   1.007   0.157
                  11.084   1.283   0.209
            avg   10.087   1.174   0.192
    scp plain     15.865   0.386   0.234
                  15.765   0.370   0.227
                  15.277   0.369   0.224
            avg   15.636   0.375   0.228
    As far as I can tell, ssh does not use blowfish by default, so you have to specify it manually or else add it to ~/.ssh/config.
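
    To repeat this kind of measurement, a small helper that runs a command several times and averages the wall-clock time can be enough. This is only a sketch: the name avg_ms is made up here, and it relies on GNU date supporting %N (nanoseconds), which is an assumption about your platform.

    ```shell
    # avg_ms RUNS CMD...: run CMD RUNS times and print the mean wall-clock time in ms.
    # Relies on GNU date's %N (nanoseconds); other systems may need a different clock.
    avg_ms() {
        runs=$1
        shift
        total=0
        i=0
        while [ "$i" -lt "$runs" ]; do
            start=$(date +%s%N)
            "$@" > /dev/null                  # discard stdout, keep error messages
            end=$(date +%s%N)
            total=$(( total + (end - start) / 1000000 ))
            i=$(( i + 1 ))
        done
        echo $(( total / runs ))
    }
    ```

    For example, avg_ms 3 scp -c blowfish ./* server.example.org:/some/path/. Note that OpenSSH 7.6 and later no longer include the arcfour and blowfish ciphers; ssh -Q cipher lists what your client supports.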
    Last edited by Lars Noodén; November 13th, 2013 at 04:52 PM.

  7. #7
    Join Date
    Dec 2009
    Beans
    554

    Re: SCP transfer of many small files over LAN and internet

    Hi,

    Have you read my first post? I have tried both blowfish and arcfour, but it seems to make no difference. Now I'm trying one single scp with all the files instead of one scp per file with multiplexing. I don't know where the problem is; actually I'm programming in Java and using the JSch library http://www.jcraft.com/jsch/examples/.
    Thank you for your example.

  8. #8
    Join Date
    Nov 2008
    Location
    Boston MetroWest
    Beans
    16,326

    Re: rsync

    Quote Originally Posted by Lars Noodén View Post
    If the files don't change completely, then I would use rsync instead. That only transfers the changes so if the files are mostly unchanged, it will be much, much faster than scp or sftp. It works over ssh and can be configured to use keys.
    Another vote for rsync, even if the files do change completely. Have you given it a try yet, erotavlas?

    Another option is to run a script that creates a compressed tarball of the files, ships the tarball via scp, then runs a command on the remote via ssh to unpack the files.
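
    A minimal sketch of that tarball route is below. The function names pack_dir and ship_and_unpack, the host, and the use of /tmp as a staging directory on the remote are all assumptions for illustration.

    ```shell
    # pack_dir SRC_DIR TARBALL: create a gzip-compressed tarball of SRC_DIR's contents.
    # Keep TARBALL outside SRC_DIR so the archive does not try to include itself.
    pack_dir() {
        tar -czf "$2" -C "$1" .
    }

    # ship_and_unpack TARBALL HOST REMOTE_DIR: copy the tarball over, unpack it
    # on the remote side, then delete the remote copy. /tmp is an assumed staging area.
    ship_and_unpack() {
        tb_name=$(basename "$1")
        scp "$1" "$2:/tmp/$tb_name" &&
        ssh "$2" "tar -xzf '/tmp/$tb_name' -C '$3' && rm -f '/tmp/$tb_name'"
    }
    ```

    For example, pack_dir ./outgoing /tmp/batch.tgz and then ship_and_unpack /tmp/batch.tgz server.example.org /some/path. This sends one file over one connection, so the per-file overhead is paid only once.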

    Yet another route is to use OpenVPN to set up an encrypted tunnel between the two machines and use NFS over the tunnel to mount a directory on the source machine on the remote.
    Last edited by SeijiSensei; November 13th, 2013 at 04:41 PM.

  9. #9
    Join Date
    Sep 2006
    Beans
    8,627
    Distro
    Ubuntu 14.04 Trusty Tahr

    java

    Quote Originally Posted by erotavlas View Post
    ...actually I'm programming in Java and I'm using jsch library http://www.jcraft.com/jsch/examples/.
    Then this might not be an ssh/scp/sftp question at all and might be a JSch question. From the web page, it appears to be an independent implementation of SSH2 that does not interact with the regular OpenSSH tools at all. So, you might post this over in the programming section to get input from Java users and help with that library.

  10. #10
    Join Date
    Dec 2009
    Beans
    554

    Re: java

    Quote Originally Posted by Lars Noodén View Post
    Then this might not be an ssh/scp/sftp question at all and might be a JSch question. From the web page, it appears to be an independent implementation of SSH2 that does not interact with the regular OpenSSH tools at all. So, you might post this over in the programming section to get input from Java users and help with that library.
    You are right, but before proceeding with the programming I need to know what the best solution is. I wrote this bash script and found that there is no difference with scp: in particular, sending 10 files using 10 scp commands in parallel and sending them with 1 single command take the same time. Now I'm trying the same thing with sftp.

    [Updates]
    In my case it seems that there is no difference between scp and sftp. You can try the script.

    Thank you

    Code:
    #!/bin/bash
    
    REMOTE_USER="foo"      # renamed from USER so the shell's own $USER is not overwritten
    REMOTE_PATH="/home/foo/"
    REMOTE_IP="10.1.2.78"
    PORT="22"
    LOCAL_PATH="/home/foo/"
    LOCAL_DATA="${LOCAL_PATH}Desktop/provaSCP/"
    CIPHER=arcfour         # or blowfish
    BATCH_FILE="batch_file"
    CONTROL_PATH="${LOCAL_PATH}.ssh/${REMOTE_USER}@${REMOTE_IP}:${PORT}"
    TARGET="${REMOTE_USER}@${REMOTE_IP}"
    
    # Print the time between two nanosecond timestamps in milliseconds.
    # Whole seconds are too coarse to show any difference for small files.
    elapsed_ms() { echo $(( ($2 - $1) / 1000000 )); }
    
    # Open the master connection (do this only once).
    # Cipher and compression belong to the master connection; the -c and -C flags
    # on the multiplexed commands below only matter if no master is available.
    ssh -fN -p "$PORT" -c "$CIPHER" -C -o ControlMaster=yes -o ControlPersist=yes -o ControlPath="$CONTROL_PATH" "$TARGET"
    
    ####################### one scp per file, run in parallel
    STARTTIME="$(date +%s%N)"
    for file in "$LOCAL_DATA"*
    do
        scp -P "$PORT" -o ControlPath="$CONTROL_PATH" -c "$CIPHER" -C "$file" "$TARGET:$REMOTE_PATH" &
    done
    wait
    ENDTIME="$(date +%s%N)"
    printf 'One scp for each file takes %s ms\n\n' "$(elapsed_ms "$STARTTIME" "$ENDTIME")"
    
    ############################ one scp for all files
    STARTTIME="$(date +%s%N)"
    scp -P "$PORT" -o ControlPath="$CONTROL_PATH" -c "$CIPHER" -C "$LOCAL_DATA"* "$TARGET:$REMOTE_PATH"
    ENDTIME="$(date +%s%N)"
    printf 'One scp for all files takes %s ms\n\n' "$(elapsed_ms "$STARTTIME" "$ENDTIME")"
    
    ############################ one sftp for all files
    # sftp expands the glob itself, so the batch file holds a single put command
    echo "put ${LOCAL_DATA}*" > "$BATCH_FILE"
    
    STARTTIME="$(date +%s%N)"
    sftp -o ControlPath="$CONTROL_PATH" -c "$CIPHER" -C -b "$BATCH_FILE" -P "$PORT" "$TARGET:$REMOTE_PATH"
    ENDTIME="$(date +%s%N)"
    printf 'One sftp for all files takes %s ms\n' "$(elapsed_ms "$STARTTIME" "$ENDTIME")"
    Last edited by erotavlas; November 14th, 2013 at 04:25 PM.
