  1. #1
    Just Joined!
    Join Date
    Mar 2013
    Location
    Belgium
    Posts
    20

    Question: backing up changing data


    Hello

    I would like to implement a backup system at our company. I have already experimented with BackupPC. It works fine, but I have some questions.

    We have a lot of live data, meaning the files can change frequently. These are big files (>2 TB). Rsync computes an MD5 hash and sees that a file has changed, but then it needs to transfer the whole file, I guess. Is there something that transfers only the changed parts of a file? I found something that does this by calculating an MD5 for each part of the file. That is a good idea, but it could lead to an inconsistent backup: what if the file changes at the exact moment rsync is calculating the hash, or something is copied from one part of the file to another?

    Is there a solution where I could take a snapshot of the file at that moment and calculate the MD5 of the snapshot?

    Thanks

  2. #2
    Linux Enthusiast scathefire's Avatar
    Join Date
    Jan 2010
    Location
    Western Kentucky
    Posts
    626
    You are referring to a process called asynchronous backups. Theoretically, if you use the --inplace option, it's supposed to handle block-level copying.

    Otherwise, there are enterprise-level solutions sold for this; one company I know of is EMC.
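    For example, a minimal local sketch (the paths are placeholders; in practice the destination would be a remote host such as backuphost:/backups/):

    ```shell
    # Stage a dummy large file locally (placeholder paths).
    mkdir -p /tmp/demo-src /tmp/demo-dst
    dd if=/dev/zero of=/tmp/demo-src/bigfile.img bs=1M count=4 2>/dev/null

    # --inplace updates the destination file in place, so the delta-transfer
    # algorithm writes only the blocks that changed; --partial keeps an
    # interrupted transfer around so the next run can resume from it.
    rsync -av --inplace --partial /tmp/demo-src/bigfile.img /tmp/demo-dst/
    ```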
    linux user # 503963

  3. #3
    Linux Engineer
    Join Date
    Apr 2012
    Location
    Virginia, USA
    Posts
    883
    Quote Originally Posted by JackieJarvis View Post
    Hello
    This is a good idea but what if the file change just on the moment that rsync calculate the md5hash or copy paste some thing from one part of the file to another part.

    Is there a solution that i could just take a snapshot of something from the file on that moment and calculate the md5 of the snapshot?

    Thanks
    If the file part changes just after the hash, then it missed the cutoff and would be backed up during the next cycle. NBD.
    If you want to take a 'snapshot' of your data you can use LVM2, which allows for snapshots.
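    A rough sketch of that workflow (VG/LV names and sizes are placeholders; it needs root and free extents in the volume group):

    ```shell
    # Create a copy-on-write snapshot of the "data" LV in VG "vg0"; the
    # snapshot only needs enough space for blocks that change while it exists.
    lvcreate --size 20G --snapshot --name data_snap /dev/vg0/data

    # Mount the frozen view read-only and back it up at leisure; checksums
    # computed here stay consistent because the snapshot never changes.
    mount -o ro /dev/vg0/data_snap /mnt/snap
    rsync -a /mnt/snap/ /backups/data/

    # Drop the snapshot when done so it stops consuming copy-on-write space.
    umount /mnt/snap
    lvremove -f /dev/vg0/data_snap
    ```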

    Anyway, if you have individual files greater than 2 TB in size, you need to rethink how your application does storage.

  4. #4
    Linux Guru Rubberman's Avatar
    Join Date
    Apr 2009
    Location
    I can be found either 40 miles west of Chicago, in Chicago, or in a galaxy far, far away.
    Posts
    11,452
    When dealing with files this big, you generally do NOT want to do backups, especially if they may change frequently. You want to use online replication with at least a 3x image factor (3 copies) and some kind of quorum manager that will decide if one of them is bad, and re-replicate it from one of the good copies automatically and in the background. An example (read-only) of this is the Hadoop Distributed File System (HDFS). Read about it at http://hadoop.apache.org/.
    Sometimes, real fast is almost as good as real time.
    Just remember, Semper Gumbi - always be flexible!

  5. #5
    Linux Guru
    Join Date
    Nov 2007
    Posts
    1,754
    I think the OP is confusing two distinct operations: snapshots and backups. Each is a specific process for a specific task, and they aren't necessarily used together.

    When copying any data files while "in use", getting a "crash-consistent" picture of the data is important. That is what a snapshot provides. It may be done by the application, by the filesystem, or at the block level, but each of these has to be understood and supported by the layers above it (app > filesystem > block level). A block-level snapshot, if not done correctly or not supported by the application, will result in an application that doesn't work after the data is replaced with the snapshot.

    There are far too many ways to get crash-consistent data to list them all here; they depend on your application, filesystem, hardware platform, etc. (When no other options exist, the safest method is to stop the application, flush data buffers, copy/snapshot the data, and then start the application again.)

    When doing a "backup," ensuring the data is in a crash-consistent state is important (see above). The backup is nothing more than a copy of the data stored in a safe place, such as a copy of the snapshot. Sometimes multiple snapshots can serve as backups themselves, but only if done in a time-lagged manner. Continuous replication is not a backup. (A problem occurs in the production data, is then replicated to the 2nd copy, and now both copies are bad and you're looking for a backup. This is why RAID is not a backup method.)

    * And from the rsync manual, it does a delta-level copy of changed files, not a complete copy of the file when only part of it has changed:

    It is famous for its delta-transfer algorithm, which reduces the amount of data sent over the network by sending only the differences between the source files and the existing files in the destination.
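    For the curious, that delta-transfer algorithm rests on a weak rolling checksum, paired with a strong hash (MD5) to confirm block matches. A rough Python sketch of just the rolling part (function names are mine, not rsync's):

    ```python
    # Adler-32-style weak rolling checksum, as used in delta-transfer schemes.
    # The receiver checksums fixed-size blocks of its old file; the sender
    # slides a window over the new file one byte at a time, cheaply updating
    # the checksum instead of rescanning the whole block.

    M = 65536  # modulus for both checksum halves

    def weak_checksum(block: bytes) -> int:
        """Full checksum of one block: low half a, high half b."""
        a = sum(block) % M
        b = sum((len(block) - i) * byte for i, byte in enumerate(block)) % M
        return (b << 16) | a

    def roll(a: int, b: int, out_byte: int, in_byte: int, blocklen: int):
        """Slide the window one byte: drop out_byte, append in_byte."""
        a = (a - out_byte + in_byte) % M
        b = (b - blocklen * out_byte + a) % M
        return a, b
    ```

    When the weak checksum of a window matches a known block, the strong hash is compared; only windows with no match get sent as literal data.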
    Rubberman likes this.

  6. #6
    Linux Guru Rubberman's Avatar
    Join Date
    Apr 2009
    Location
    I can be found either 40 miles west of Chicago, in Chicago, or in a galaxy far, far away.
    Posts
    11,452
    Quote Originally Posted by HROAdmin26 View Post
    ...from the rsync manual, it does a delta-level copy of changed files, not a complete copy of the file when only part of it has changed
    Well put, HRO. I wasn't aware of the fact that rsync only copies the deltas. Useful, for sure!
    Sometimes, real fast is almost as good as real time.
    Just remember, Semper Gumbi - always be flexible!

  7. #7
    Just Joined!
    Join Date
    Mar 2013
    Location
    Belgium
    Posts
    20
    Thanks all for the input.
    The problem in our infrastructure is that it is impossible to shut down or pause the applications.
    I was thinking of taking a snapshot and making a backup of that snapshot, because the files could change during the backup.
    But we don't have LVM installed; our data is on a VNX and we want to back up to disks.
    What do you suggest for doing all this?

    Thanks
