  1. #1
    Just Joined!
    Join Date
    Mar 2009
    Posts
    1

    Where is the bottleneck in copying files

    I am currently in the middle of a project where I need to move about 3 million files in the 32KB to 10MB range to a new server.

    I was planning to just copy these files to a USB attached hard drive or an EIDE drive installed to the IDE adapter on the existing server's motherboard.

    I am able to copy a 1GB file from the RAID5 array to either the USB drive or the EIDE drive at an acceptable 30MB/sec. However, when I start copying the smaller files, my throughput drops to about 3.3MB/sec.

    I have tried tar, star, rsync, cpio, and cp, and none of them gets above that 3.3MB/sec barrier to either the USB drive or the EIDE drive.

    I wrote a program yesterday to copy everything from one directory to another, without attempting to preserve access times, user rights, or anything else. Just readdir, open, read/write, and close. No stat call. Even so, I only got a throughput of 3.5MB/sec.

    Can anyone tell me where the overhead comes from when opening each file and then reading its data? I know from my 1GB test that there is definitely enough bandwidth to reach 30MB/sec, but I have to get past the per-file overhead.

    Would it be of any benefit to have the main program loop read the directory and then create threads to open and copy the files?

    Thanks for any direction.

    Darrell
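
For reference, the threaded approach asked about above could be sketched roughly as follows in Python. This is only an illustration, not the program described in the post: the function name `copy_tree_flat` and the `workers` parameter are hypothetical, it copies a single flat directory, and it preserves no metadata. Whether threads help at all is an open question here; if the limit is seek latency on a single source or destination disk, extra threads may make no difference or even slow things down.

```python
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def copy_tree_flat(src_dir, dst_dir, workers=4):
    """Copy every regular file in src_dir to dst_dir using a thread pool.

    Hypothetical helper for illustration: one flat directory only,
    file contents only (no ownership, permissions, or timestamps).
    Returns the number of files copied.
    """
    src = Path(src_dir)
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)

    # The main thread reads the directory; the workers open and copy files.
    files = [p for p in src.iterdir() if p.is_file()]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() waits for all copies and re-raises any worker exception.
        list(pool.map(lambda p: shutil.copyfile(p, dst / p.name), files))

    return len(files)
```

Timing this with `workers=1` and then with a few more workers on a sample directory would show fairly quickly whether parallelism changes the 3.3MB/sec figure on this hardware.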

  2. #2
    Linux Enthusiast
    Join Date
    Aug 2006
    Location
    Portsmouth, UK
    Posts
    539
    I've run into this problem in the past.

    I/O performance drops as the number of files in a directory grows. And copying a single 1GB file will always be quicker than copying 1,024 1MB files (far fewer operations are required for one big file).
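
To put a rough number on that per-file cost, here is a small back-of-the-envelope model in Python. It assumes each file costs a fixed overhead (open, metadata lookup, seeks) plus its size divided by the sequential bandwidth. The 30MB/sec and 3.3MB/sec figures come from the first post; the 100KB average file size is purely an assumption for illustration, since the thread only gives a 32KB to 10MB range, and the function names are made up.

```python
def effective_throughput(file_size_mb, bandwidth_mb_s, per_file_overhead_s):
    """MB/sec achieved when every file pays a fixed overhead plus transfer time."""
    transfer_time_s = file_size_mb / bandwidth_mb_s
    return file_size_mb / (per_file_overhead_s + transfer_time_s)


def implied_overhead(file_size_mb, bandwidth_mb_s, observed_mb_s):
    """Solve the same model for the per-file overhead, in seconds."""
    return file_size_mb / observed_mb_s - file_size_mb / bandwidth_mb_s


# Figures from the thread: 30 MB/sec sequential, 3.3 MB/sec on small files.
# ASSUMPTION: an average file size of 0.1 MB (about 100KB).
overhead_s = implied_overhead(0.1, 30.0, 3.3)
print(f"implied per-file overhead: {overhead_s * 1000:.1f} ms")
```

Under that assumed file size the model implies roughly 27 ms of overhead per file, which is in the range of a few disk seeks. A larger real average file size would imply a proportionally larger overhead.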

    I didn't find a quick method to copy the files. One option may be to "dd" the drive or partition from one server to the other, as this won't require the OS to query the file tables.
    RHCE #100-015-395
    Please don't PM me with questions as no reply may offend, that's what the forums are for.

  3. #3
    Linux Newbie Ziplock's Avatar
    Join Date
    Jan 2009
    Location
    Adelaide
    Posts
    169
    It doesn't matter which way you do it (other than dd), because every method has to access every file. Also, are you tar'ing directly to the USB device? That won't help. The quickest way to do it, if you have the space on the local drive, is to create the tarball on a local partition, then just copy the one file across the USB link. No easy way out of it, unfortunately...
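
The "tarball locally, then copy one file" suggestion could be sketched like this in Python. It is only a sketch: the function name `archive_then_copy` and all paths are hypothetical, and it uses the standard `tarfile` module rather than the `tar` command. Note that building the archive still has to read every small file from the source; the idea in the post is that the slow USB link then sees a single large sequential write.

```python
import shutil
import tarfile
from pathlib import Path


def archive_then_copy(src_dir, local_tar, dest_dir):
    """Pack src_dir into an uncompressed tar on local disk, then copy it.

    Hypothetical helper for illustration. Returns the path of the copied
    archive inside dest_dir (for example a mounted USB drive).
    """
    src = Path(src_dir)
    local_tar = Path(local_tar)

    # Step 1: build the archive on a fast local partition.
    with tarfile.open(local_tar, "w") as tar:  # "w" means no compression
        tar.add(src, arcname=src.name)

    # Step 2: move one big file across the slow link.
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(local_tar, dest / local_tar.name))
```

On the new server, the archive can then be unpacked with `tarfile.open(path).extractall(target)` or with the ordinary `tar` command.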
