  1. #1
    jippie (Just Joined!, Eindhoven, the Netherlands)

    backup logical volume


    I run my virtual machines from an LV and, besides a regular file backup, I want to create regular images of the whole LV. The problem with this is that if I create several images of the same LV, maybe 80 or 90% of the contents are identical to the previous image, so the backup consumes far more space than strictly necessary.

    Does anyone know about a tool that can do something smart with such large files, only storing identical blocks once?

    The idea is pretty much like the deduplication in lessfs, but with the deduplicating filesystems I am aware of I need to specify a fixed-size container up front, and these software packages are not very mature yet.

  2. #2
    Rubberman (Linux Guru)
    Hmmm. It's doable, but I'm not familiar with software that can do a block/sector-level delta backup (only saving the changed bits). However, VirtualBox does this with its snapshots, where only the changed blocks of the virtual file system are saved in the current image file. There is an open-source edition of VirtualBox available on the Oracle web site that you could check out.
    Sometimes, real fast is almost as good as real time.
    Just remember, Semper Gumbi - always be flexible!

  3. #3
    jippie
    Thanks for your reply.

    I am actually running VirtualBox, but I don't like its file-based(!) snapshots when I am running my VM from a logical volume. They make my VM run slowly and they typically end up on the wrong drive (not the backup drive, but the 'production' drives). Also, according to various articles, fully restoring a VM from only its snapshots is extremely difficult.
    I want to be able to do a full disaster recovery, not just the point-in-time recovery these snapshots were designed for.

  4. #4
    Rubberman
    Understood, and I agree about point-in-time recovery, although I disagree about the difficulty of rolling back snapshots, and performance depends largely upon your I/O bandwidth and utilization. In my case it works well and I've had no problems with it. However, all that aside, my point was that the VirtualBox source code could either be used as a model or be adapted into a block-delta backup system. Just a thought. If you are a software engineer it may be illuminating even just to review the code.
    Sometimes, real fast is almost as good as real time.
    Just remember, Semper Gumbi - always be flexible!

  5. #5
    jippie
    I'm not much of a programmer, though I do know my way around bash scripting and Google&Paste-style Perl.

    I've created a proof of concept in bash, although it deserves the title "concept" more than it deserves "proof" as I haven't tested recovery.

  6. #6
    jippie
    Did a quick code cleanup and moved it to Perl. The problem with this little script is that it has little or no sanity checking. I have also never tried a full restore, and a proper command-line interface and input validation are simply non-existent. I have no clue whether the md5+sha512 combination is strong enough to identify identical blocks; maybe a more detailed collision check should be included. Proper error checking is another missing feature. Using /dev/shm (or proper Perl libraries instead of shelling out) for intermediate results may speed things up.

    I love long descriptive variable names, the kind that drive other people insane.

    Code:
    #!/usr/bin/perl
    
    use warnings;
    use strict;
    
    # Logical volume to back up.
    my $lv = '/dev/vg_diablo/vm_localserver';
    
    # Parse the colon-separated output of 'lvdisplay -c' to learn how many logical extents the LV consists of.
    my ( $logical_volume_name , $volume_group_name , $logical_volume_access , $logical_volume_status , $internal_logical_volume_number , $open_count_of_logical_volume , $logical_volume_size_in_sectors , $current_logical_extents_associated_to_logical_volume , $allocated_logical_extents_of_logical_volume , $allocation_policy_of_logical_volume , $read_ahead_sectors_of_logical_volume , $major_device_number_of_logical_volume , $minor_device_number_of_logical_volume ) = split( /:/ , `lvdisplay -c $lv` );
    
    print "current_logical_extents_associated_to_logical_volume $current_logical_extents_associated_to_logical_volume\n";
    
    # Parse 'vgdisplay -c' for the containing volume group to learn the physical extent size (in KiB).
    my ( $volume_group_name_too , $volume_group_access , $volume_group_status , $internal_volume_group_number , $maximum_number_of_logical_volumes , $current_number_of_logical_volumes , $open_count_of_all_logical_volumes_in_this_volume_group , $maximum_logical_volume_size , $maximum_number_of_physical_volumes , $current_number_of_physical_volumes , $actual_number_of_physical_volumes , $size_of_volume_group_in_kilobytes , $physical_extent_size , $total_number_of_physical_extents_for_this_volume_group , $allocated_number_of_physical_extents_for_this_volume_group , $free_number_of_physical_extents_for_this_volume_group , $uuid_of_volume_group ) = split( /:/ , `vgdisplay -c $volume_group_name` );
    
    print "physical extent size                                 $physical_extent_size\n";
    
    # rebuild.info maps every extent of the LV to the checksum-named file that stores its contents.
    open REBUILDINFO , "> rebuild.info" or die "Cannot create file: $!\n";
    
    # Dump the LV one extent at a time; identical extents are stored only once, under their combined md5/sha512 checksum.
    for ( my $count = 0; $count < $current_logical_extents_associated_to_logical_volume; $count++) {
            my $part = sprintf "%08i" , $count;
            system( "dd if=$lv of=part.$part bs=${physical_extent_size}k skip=$count count=1\n" );
            my $md5sum = `md5sum part.$part -b`;
            chomp $md5sum;
            $md5sum =~ s/^([0-9a-f]{32}).*$/$1/;
            my $sha512sum = `sha512sum part.$part -b`;
            chomp $sha512sum;
            $sha512sum =~ s/^([0-9a-f]{128}).*$/$1/;
            my $filename = "$md5sum-$sha512sum";
            print REBUILDINFO "part.$part $filename\n";
            if ( -e "${filename}.bz2" ) {
                    # An extent with these checksums has already been stored; drop the duplicate copy.
                    print "Duplicate!\n";
                    unlink( "part.$part" );
            } else {
                    # First occurrence of this content: keep it under its checksum name and compress it.
                    rename( "part.$part" , $filename );
                    system( "bzip2 --best $filename" );
            }
    }
    
    close REBUILDINFO;
    
    exit;
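
    A restore would basically be the reverse: walk rebuild.info in the original order and write every stored extent back to its position. Below is an untested sketch of that idea; the target LV and the extent size are hard-coded assumptions and would have to match the volume the backup was taken from.

    Code:
    #!/usr/bin/perl
    
    use warnings;
    use strict;
    
    # Untested restore sketch: replay rebuild.info and write every extent back to its place.
    # Both values below are assumptions and must match the backed-up volume.
    my $lv = '/dev/vg_diablo/vm_localserver';
    my $physical_extent_size = 4096;    # in KiB, as reported by 'vgdisplay -c'
    
    open REBUILDINFO , "< rebuild.info" or die "Cannot open file: $!\n";
    
    while ( my $line = <REBUILDINFO> ) {
            chomp $line;
            # Every line looks like: part.00000042 <md5sum>-<sha512sum>
            my ( $part , $filename ) = split( / / , $line );
            $part =~ /^part\.(\d+)$/ or die "Unexpected line: $line\n";
            my $extent_number = $1 + 0;    # strip leading zeros before handing it to dd's seek=
            # Decompress the stored extent and write it back to its original position.
            system( "bzcat ${filename}.bz2 | dd of=$lv bs=${physical_extent_size}k seek=$extent_number conv=notrunc" ) == 0
                    or die "Restore of extent $extent_number failed\n";
    }
    
    close REBUILDINFO;
    
    exit;

    Because the extents are written in place, the target LV has to exist already and be at least as large as the original one.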

  7. #7
    jippie
    Other thoughts and results from this script are written up at PoC Deduplication of Logical Volume Image - Wirespeed.

    Given the number of issues and improvements I can easily come up with, I am still searching for an off-the-shelf solution.

  8. #8
    Linux Guru (joined Nov 2007, 1,759 posts)
    You may want to look at Opendedup.

    In your scenario, I think you'd snapshot your LV, mount the snapshot, and then just "cp" your snapped data into the Opendedup volume, roughly like the untested sketch below. Keep in mind the recommendation about using 4K dedup block sizes for virtual machine data.
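
    The volume group, LV and snapshot names, the snapshot size, and the mount points (including the already-mounted SDFS/Opendedup volume) are all placeholders you would adjust to your own setup:

    Code:
    #!/usr/bin/perl
    
    use warnings;
    use strict;
    
    # Untested sketch: snapshot the LV, mount it read-only, copy the files into an
    # already mounted Opendedup (SDFS) volume, then clean the snapshot up again.
    # All names and sizes below are placeholders.
    my $origin_lv  = '/dev/vg_diablo/vm_localserver';
    my $snapshot   = 'vm_localserver_snap';
    my $mountpoint = '/mnt/vm_snap';
    my $dedup_dir  = '/media/opendedup/vm_localserver';    # mounted SDFS volume
    
    sub run {
            my ( $command ) = @_;
            print "$command\n";
            system( $command ) == 0 or die "Command failed: $command\n";
    }
    
    # 1 GiB of copy-on-write space for changes made while the backup runs.
    run( "lvcreate --snapshot --size 1G --name $snapshot $origin_lv" );
    run( "mkdir -p $mountpoint" );
    run( "mount -o ro /dev/vg_diablo/$snapshot $mountpoint" );
    run( "cp -a $mountpoint/. $dedup_dir/" );
    run( "umount $mountpoint" );
    run( "lvremove -f /dev/vg_diablo/$snapshot" );
    
    exit;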

  9. #9
    jippie
    I checked Opendedup, but I don't consider it mature enough because I ran into several problems (lack of documentation, and it wrote lots of data to a location where the FS was not even mounted).
    A major disadvantage of both Opendedup and lessfs is that you need to create a fixed-size container beforehand; growing or shrinking that container is not possible.

  10. #10
    Linux Guru (joined Nov 2007)
    So you're looking for a cutting-edge, and yet mature, filesystem?

    And you're thinking about putting together a homemade Perl script, but you don't want to use something that's been in development for a year now? While I would not rely on a new script/method/filesystem alone, a periodic full copy can go to a "safe" location (tape, disk, etc.), and the more frequent "deduped" copies can go to a (possibly) more volatile location that's much more space-efficient.

    I have tested the dedupe in ZFS and found it does not achieve high dedupe rates for the typical data I was archiving. ZFS compression did a much better job than dedupe.
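
    If you want to check that for your own data, a rough (untested) way to compare the two is to copy the same sample data into one dataset with dedup enabled and one with compression enabled, then look at the ratios ZFS reports. The pool name, dataset names, and sample path below are made up:

    Code:
    #!/usr/bin/perl
    
    use warnings;
    use strict;
    
    # Untested sketch: compare ZFS dedup against ZFS compression on a sample of
    # your own data. Pool name, dataset names, and sample path are placeholders.
    my $pool   = 'tank';
    my $sample = '/var/lib/libvirt/images';    # some representative data
    
    sub run {
            my ( $command ) = @_;
            print "$command\n";
            system( $command ) == 0 or die "Command failed: $command\n";
    }
    
    run( "zfs create -o dedup=on $pool/dedup_test" );
    run( "zfs create -o compression=gzip $pool/compress_test" );
    run( "cp -a $sample/. /$pool/dedup_test/" );
    run( "cp -a $sample/. /$pool/compress_test/" );
    
    # 'zpool list' shows the pool-wide dedup ratio, 'zfs get compressratio'
    # shows the per-dataset compression ratio.
    run( "zpool list $pool" );
    run( "zfs get compressratio $pool/compress_test" );
    
    exit;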

    Several commercial backup products offer block-level dedupe; storage appliances such as Data Domain and NetApp do as well.
