  1. #1
    Just Joined!
    Join Date
    Jul 2010
    Posts
    4

    backing up 300,000 files


    Hello all:
    Long time linux user here.
    I started a new job and they have a Linux (CentOS 5.3 32bit) box with one disk (looks like it's a RAID 1) for everything that has over 300,000 files in one directory. I am working on backups (backup to local disk using dump and copy dump files to remote server) and there is no other disk on another controller available. If I try to count the number of files (ls */*/*/*|wc -l) it chokes and errors out. Any suggestions?
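    A minimal sketch of the dump-then-copy workflow described above; the device name, local dump path, and remote host here are hypothetical:

    Code:
    # level 0 dump of the filesystem holding the directory, written to a local file
    dump -0u -f /backup/data-level0.dump /dev/sda1
    # copy the finished dump file to the remote server
    scp /backup/data-level0.dump backupuser@backuphost:/srv/dumps/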

    Alex DeWolf
    San Diego, California

  2. #2
    Trusted Penguin Irithori's Avatar
    Join Date
    May 2009
    Location
    Munich
    Posts
    3,356
    Code:
    find /DIRECTORY -type f | wc -l
    will count the files.

    On backup:
    It depends on how big you plan to go.
    Is rsync enough?
    For a network wide backup I usually recommend bacula.
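    A minimal rsync sketch, with a hypothetical source directory and backup host (the trailing slashes matter to rsync, so they are shown explicitly):

    Code:
    # mirror the directory tree to a remote backup host over ssh
    rsync -a --delete /data/bigdir/ backupuser@backuphost:/backups/bigdir/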
    You must always face the curtain with a bow.

  3. #3
    Just Joined!
    Join Date
    Jul 2010
    Posts
    4
    Got that. I am not sure of the best way to back this up with the least I/O impact on the server. This is a production box.

  4. #4
    Trusted Penguin Irithori's Avatar
    Join Date
    May 2009
    Location
    Munich
    Posts
    3,356
    300,000 doesn't sound too scary.
    My biggest number of files per server is 40 million (although spread over a directory structure, not all in one dir).
    You must always face the curtain with a bow.

  5. #5
    Just Joined!
    Join Date
    Jul 2010
    Posts
    4
    How did you back it up? Was it a production server where I/O was an issue? Did you use an external (USB) device (disk drive or tape drive)?

  6. #6
    Linux Guru
    Join Date
    Nov 2007
    Posts
    1,754
    When you get into the millions-of-files volume, snapshots and "raw disk" (imaging) backups during off hours will get the best throughput.

    The storage location is irrelevant as long as it can write at the speeds you require.
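    A rough sketch of the snapshot-plus-imaging approach, assuming the data sits on an LVM logical volume (the volume group and LV names are hypothetical):

    Code:
    # create a read-only snapshot so the image stays consistent while the box keeps running
    lvcreate -s -L 5G -n data_snap /dev/VolGroup00/data
    # stream a raw image of the snapshot to the backup area, then drop the snapshot
    dd if=/dev/VolGroup00/data_snap bs=1M | gzip > /backup/data.img.gz
    lvremove -f /dev/VolGroup00/data_snap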

  7. #7
    Trusted Penguin Irithori's Avatar
    Join Date
    May 2009
    Location
    Munich
    Posts
    3,356
    Quote Originally Posted by HROAdmin26 View Post
    When you get into the millions-of-files volume, snapshots and "raw disk" (imaging) backups during off hours will get the best throughput.
    That is of course correct.

    But depending on your hardware's capabilities you might be able to implement a "regular" backup as well.
    A RAID 10 over 24x 146 GB 15k hard disks with a decent controller is surely more performant than a RAID 5 with 3x 750 GB 7.2k disks and a consumer-grade controller.

    So, in my case, we opted for the "kill the problem with hardware" option,
    just to have a consistent way of backing up our network.
    Yes, this is a production server (one of two redundant ones, actually),
    and yes, it is backed up during operation.


    Bottom line:
    If your hardware can take it, rsync or, better, bacula are viable options for backing up these 300,000 files.
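    For a production box where the backup must not starve application I/O, one low-impact variant is to run rsync in the idle I/O class and cap its bandwidth; this is only a sketch, and ionice requires a kernel/scheduler that honours I/O classes:

    Code:
    # idle I/O priority plus a transfer cap (KB/s), so foreground I/O keeps winning
    ionice -c3 nice -n 19 rsync -a --bwlimit=10000 /data/bigdir/ backupuser@backuphost:/backups/bigdir/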
    You must always face the curtain with a bow.

  8. #8
    Just Joined!
    Join Date
    Jul 2010
    Posts
    4
    These machines are in a colocation facility and have only one disk controller each, so the one-pipe analogy applies here. If I ran the hardware I would make sure there were two RAID controllers and one fibre controller in each.
    I have been checking into rsync; I will check into bacula as well.

    Thanks

  9. #9
    Linux Guru Rubberman's Avatar
    Join Date
    Apr 2009
    Location
    I can be found either 40 miles west of Chicago, in Chicago, or in a galaxy far, far away.
    Posts
    11,452
    Quote Originally Posted by HROAdmin26 View Post
    When you get into the millions-of-files volume, snapshots and "raw disk" (imaging) backups during off hours will get the best throughput.

    The storage location is irrelevant as long as it can write at the speeds you require.
    File system types are also important here. Some choke with too many files. Others are OK with large numbers of files in a single directory.
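    A quick way to see what you are dealing with before picking a strategy (the path is hypothetical):

    Code:
    # show the filesystem type backing the directory, and how many inodes are in use
    df -T /data/bigdir
    df -i /data/bigdir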
    Sometimes, real fast is almost as good as real time.
    Just remember, Semper Gumbi - always be flexible!
