- 03-01-2011 #1
backup logical volume
I run my virtual machines from an LV and, besides a regular file backup, I want to create regular images of the whole LV. The problem is that if I create several images of the same LV, maybe 80 or 90% of each image is identical to the previous one, so the images consume far more space on my backup than strictly necessary.
Does anyone know about a tool that can do something smart with such large files, only storing identical blocks once?
The idea is much like the deduplication in lessfs, but with the deduplicating filesystems that I am aware of, I need to specify a fixed-size container, and these software packages are not very mature yet.
- 03-01-2011 #2 Linux Guru
- Join Date
- Apr 2009
- Location
- I can be found either 40 miles west of Chicago, or in a galaxy far, far away.
- Posts
- 10,156
Hmmm. It's doable, but I'm not familiar with software that can do a block/sector-level delta backup (only saving the changed bits). However, VirtualBox does this with its snapshots, where only the changed blocks of the virtual file system are saved in the current-image file. There is an open-source version of VirtualBox available on the Oracle web site that you could check out.
Sometimes, real fast is almost as good as real time.
Just remember, Semper Gumbi - always be flexible!
- 03-02-2011 #3
Thanks for your reply.
I am actually running VirtualBox, but I don't like the file-based(!) snapshots when I am running my VM from a logical volume. They make my VM run slowly and they are typically on the wrong drive (not the backup, but on 'production' drives). Also, according to various articles, fully restoring a VM from only its snapshots is extremely difficult.
I want to be able to do a full disaster recovery, not just the point-in-time recovery these snapshots were designed for.
- 03-02-2011 #4 Linux Guru
Understood, and I agree about the point-in-time recovery, although I disagree about the difficulty of rolling back snapshots, and performance depends largely on your I/O bandwidth and utilization. In my case it works well and I've had no problems with it. However, all that aside, my point was that the source code could be used as a model for, or adapted into, a block-delta backup system. Just a thought. If you are a software engineer, it may be illuminating even just to review the code.
- 03-02-2011 #5
I'm not much of a programmer, though I do know my way around bash scripting and Google-and-paste-style Perl.
I've created a proof of concept in bash, although it deserves the title "concept" more than it deserves "proof" as I haven't tested recovery.
- 03-02-2011 #6
Did a quick code cleanup and moved it to Perl. The problem with this little script is that it does little or no sanity checking. I have also never tried a full restore, and a proper command-line interface and input validation are simply nonexistent. I have no clue whether the md5-sha512 combination is strong enough to determine identical blocks. Maybe a more detailed collision check should be included. Proper error checking is another missing feature. Using /dev/shm (or proper Perl libraries instead of system commands) for intermediate results may speed things up.
I love long descriptive variable names, the kind that drive other people insane.
Code:
#!/usr/bin/perl
use warnings;
use strict;

my $lv = '/dev/vg_diablo/vm_localserver';

# lvdisplay -c prints one colon-separated record for the logical volume.
my (
    $logical_volume_name,
    $volume_group_name,
    $logical_volume_access,
    $logical_volume_status,
    $internal_logical_volume_number,
    $open_count_of_logical_volume,
    $logical_volume_size_in_sectors,
    $current_logical_extents_associated_to_logical_volume,
    $allocated_logical_extents_of_logical_volume,
    $allocation_policy_of_logical_volume,
    $read_ahead_sectors_of_logical_volume,
    $major_device_number_of_logical_volume,
    $minor_device_number_of_logical_volume
) = split( /:/ , `lvdisplay -c $lv` );
print "current_logical_extents_associated_to_logical_volume $current_logical_extents_associated_to_logical_volume\n";

# vgdisplay -c does the same for the volume group; only the extent size is used below.
my (
    $volume_group_name_too,
    $volume_group_access,
    $volume_group_status,
    $internal_volume_group_number,
    $maximum_number_of_logical_volumes,
    $current_number_of_logical_volumes,
    $open_count_of_all_logical_volumes_in_this_volume_group,
    $maximum_logical_volume_size,
    $maximum_number_of_physical_volumes,
    $current_number_of_physical_volumes,
    $actual_number_of_physical_volumes,
    $size_of_volume_group_in_kilobytes,
    $physical_extent_size,
    $total_number_of_physical_extents_for_this_volume_group,
    $allocated_number_of_physical_extents_for_this_volume_group,
    $free_number_of_physical_extents_for_this_volume_group,
    $uuid_of_volume_group
) = split( /:/ , `vgdisplay -c $volume_group_name` );
print "physical extent size $physical_extent_size\n";

# rebuild.info records, in extent order, which stored block belongs to which part.
open( my $rebuild_info , '>' , 'rebuild.info' ) or die "Cannot create file: $!\n";

# Copy the LV one extent at a time; an extent whose digests were seen before is not stored again.
for ( my $count = 0; $count < $current_logical_extents_associated_to_logical_volume; $count++ ) {
    my $part = sprintf "%08i" , $count;
    system( "dd if=$lv of=part.$part bs=${physical_extent_size}k skip=$count count=1" );

    # The stored file name is the MD5 digest and the SHA-512 digest joined by a dash.
    my $md5sum = `md5sum part.$part -b`;
    chomp $md5sum;
    $md5sum =~ s/^([0-9a-f]{32}).*$/$1/;
    my $sha512sum = `sha512sum part.$part -b`;
    chomp $sha512sum;
    $sha512sum =~ s/^([0-9a-f]{128}).*$/$1/;
    my $filename = "$md5sum-$sha512sum";
    print {$rebuild_info} "part.$part $filename\n";

    if ( -e "${filename}.bz2" ) {
        # Same digests as an extent already stored: drop this copy.
        print "Duplicate!\n";
        unlink( "part.$part" );
    }
    else {
        rename( "part.$part" , $filename );
        system( "bzip2 --best $filename" );
    }
}

close $rebuild_info;
exit;
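Since the post says a full restore was never tried, here is a minimal, untested-against-a-real-LV sketch of what one might look like. It assumes the layout the script above produces: a rebuild.info file with one "part.NNNNNNNN digestname" line per extent, in order, and one "digestname.bz2" file per unique extent in the same directory. The sub name restore_image and its arguments are made up for illustration, and each extent is held in memory while it is written.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Uncompress::Bunzip2 qw(bunzip2 $Bunzip2Error);

# Hypothetical helper: rebuild an image from rebuild.info plus the
# "<md5>-<sha512>.bz2" block files found in $store_dir.
# Returns the number of extents written.
sub restore_image {
    my ( $store_dir, $image_path ) = @_;

    open my $info, '<', "$store_dir/rebuild.info"
        or die "Cannot read rebuild.info: $!\n";
    open my $image, '>', $image_path
        or die "Cannot create $image_path: $!\n";
    binmode $image;

    my $blocks = 0;
    while ( my $line = <$info> ) {
        chomp $line;
        next if $line eq '';

        # Each line reads "part.00000042 <md5>-<sha512>".
        my ( $part, $digest_name ) = split ' ', $line;

        # Decompress the stored extent into memory and append it to the image.
        my $block;
        bunzip2( "$store_dir/$digest_name.bz2" => \$block )
            or die "Cannot decompress block for $part: $Bunzip2Error\n";
        print {$image} $block or die "Write failed: $!\n";
        $blocks++;
    }

    close $image or die "Cannot close $image_path: $!\n";
    close $info;
    return $blocks;
}

# Usage (placeholder paths): restore_image( '.', 'restored.img' );
```

Comparing a checksum of the restored image with one taken from the source LV at backup time would be the obvious next step before trusting this for disaster recovery.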
- 03-02-2011 #7
Other thoughts and results of this script are on PoC Deduplication of Logical Volume Image - Wirespeed
Given the number of issues and improvements I can easily come up with, I am still searching for an off-the-shelf solution.
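One of the improvements raised earlier in the thread, a more detailed collision check, could be sketched roughly as follows. This is only an illustration under assumptions: the sub name store_block is invented, and the digests are computed in-process with the core Digest::MD5 and Digest::SHA modules rather than by calling md5sum and sha512sum. When a file with the same digest name already exists, the stored block is decompressed and compared byte for byte before the new copy is discarded. MD5 collisions can be constructed deliberately, while none are known for SHA-512, so in practice the comparison mostly guards against corruption and bugs.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);
use Digest::SHA qw(sha512_hex);
use IO::Compress::Bzip2 qw(bzip2 $Bzip2Error);
use IO::Uncompress::Bunzip2 qw(bunzip2 $Bunzip2Error);

# Hypothetical helper: store one block under its "<md5>-<sha512>" name.
# Returns the name and a flag that is 1 if the block was already present.
sub store_block {
    my ( $store_dir, $data ) = @_;
    my $name = md5_hex($data) . '-' . sha512_hex($data);
    my $path = "$store_dir/$name.bz2";

    if ( -e $path ) {
        # Digest match: confirm byte for byte before dropping the new block.
        my $existing;
        bunzip2( $path => \$existing )
            or die "Cannot read $path: $Bunzip2Error\n";
        die "Digest collision on $name, refusing to deduplicate\n"
            if $existing ne $data;
        return ( $name, 1 );
    }

    # New block: compress it with the largest bzip2 block size (like --best).
    bzip2( \$data => $path, BlockSize100K => 9 )
        or die "Cannot write $path: $Bzip2Error\n";
    return ( $name, 0 );
}
```

The price of the extra check is one decompression per duplicate block, which may or may not matter compared with the cost of reading the LV itself.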
- 03-02-2011 #8 Linux Guru
- Join Date
- Nov 2007
- Posts
- 1,722
You may want to look at Opendedup.
In your scenario, I think you'd snapshot your LV, mount the snapshot, and then just "cp" your snapped data into the Opendedup volume. Keep in mind the recommendation about deduping using 4K block sizes for virtual machine data.
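To get a feel for why the block size matters, a rough estimate can be made before committing to any tool. The sketch below is an assumption-laden illustration, not part of Opendedup: the sub name unique_block_fraction is invented, and it simply cuts a file into fixed-size blocks and reports what fraction of them is unique. Running it on the same image with 4096 bytes and with the LVM extent size shows how much a smaller block size would save, at the cost of a larger index.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Digest::SHA qw(sha256_hex);

# Hypothetical helper: fraction of blocks that are unique when the file
# at $path is cut into fixed-size blocks of $block_size bytes.
sub unique_block_fraction {
    my ( $path, $block_size ) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!\n";
    binmode $fh;

    my %seen;
    my $total = 0;
    # read() returns 0 at end of file, which ends the loop.
    while ( my $read = read( $fh, my $block, $block_size ) ) {
        $total++;
        $seen{ sha256_hex($block) } = 1;
    }
    close $fh;

    return $total ? scalar( keys %seen ) / $total : 0;
}

# Usage (placeholder path):
#   printf "%.3f\n", unique_block_fraction( 'vm.img', 4096 );
```

A lower fraction means more duplicate blocks and therefore more to gain from deduplication at that block size.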
- 03-03-2011 #9
I checked Opendedup, but I consider it not mature enough because I ran into several problems (lack of documentation, writing lots of data to a location where the FS is not mounted).
A major disadvantage of Opendedup and lessfs is that you need to create a fixed-size container beforehand. Growing or shrinking that container is not possible.
- 03-03-2011 #10 Linux Guru
So you're looking for a cutting-edge, and yet mature, filesystem?

And you're thinking about putting together a homemade Perl script, but don't want to use something that's been in development for a year now? While I would not rely on a new script/method/filesystem alone, a periodic copy can go to a "safe" location (tape, disk, etc.) while more frequent "deduped" copies can go to a (possibly) more volatile location that's much more space-efficient.
I have tested the dedupe in ZFS and found it does not achieve high dedupe rates for the typical data I was archiving. ZFS compression did a much better job than dedupe.
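Whether compression or dedupe saves more depends on the data, and it can be checked on a sample before choosing. The following sketch is not how ZFS measures anything; it is a hypothetical helper (dedup_vs_compression is an invented name) that takes data already in memory and reports how many bytes remain after fixed-block dedupe alone and after bzip2 compression alone.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Digest::SHA qw(sha256_hex);
use IO::Compress::Bzip2 qw(bzip2 $Bzip2Error);

# Hypothetical helper: for in-memory $data, return the bytes left after
# (a) block-level dedupe alone and (b) bzip2 compression alone.
sub dedup_vs_compression {
    my ( $data, $block_size ) = @_;

    # Keep one entry per distinct block, remembering its length.
    my %unique;
    for ( my $offset = 0; $offset < length($data); $offset += $block_size ) {
        my $block = substr( $data, $offset, $block_size );
        $unique{ sha256_hex($block) } = length($block);
    }
    my $deduped_bytes = 0;
    $deduped_bytes += $_ for values %unique;

    # Compress the whole sample in one go for comparison.
    my $compressed;
    bzip2( \$data => \$compressed )
        or die "bzip2 failed: $Bzip2Error\n";

    return ( $deduped_bytes, length($compressed) );
}
```

Data made of many distinct but internally repetitive blocks compresses well and dedupes poorly, which would be consistent with the experience described above; many identical blocks give the opposite result.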
Several commercial backup products offer block-level dedupe. Storage appliances such as Data Domain and NetApp do as well.


