- 09-18-2009 #1 Just Joined!
- Join Date
- Jan 2009
- Posts
- 15
Draft data loss mitigation method for spanned LVM (would like suggestions)
I have an LVM volume that is made up of two 500GB HDDs because I like the ability to add to a volume as it expands. Unfortunately, since there is zero redundancy, catastrophic data loss will occur if a drive fails. This stems from the fact that the LVM and LUKS metadata are stored on the original drive in the volume group, and that FS journals such as ext4's move around as space is used. After having some questions answered on reddit.com/r/linux and some googling, I've drafted a data-loss mitigation method for LVM/ext4 (and LUKS).
You'll need at least 4 separate block storage devices. I'm using a 256MB SD (lvmsd), a 4GB SD (jrnlsd), and two 500GB HDDs (sda & sdb).
First create your Physical Volumes:
Code:
# pvcreate /dev/lvmsd1
# pvcreate /dev/sda1
# pvcreate /dev/sdb1

Then create your volume group:
Code:
# vgcreate array /dev/lvmsd1 -s 4M
# vgextend array /dev/sda1
# vgextend array /dev/sdb1
# lvcreate array /dev/lvmsd1 -l 100%FREE -n storage
# lvextend /dev/array/storage /dev/sda1 -l 100%FREE
# lvextend /dev/array/storage /dev/sdb1 -l 100%FREE

(Optional) Create a LUKS volume on top of LVM:
Code:
# vgchange -ay array
# cryptsetup --verify-passphrase --key-size 256 luksFormat /dev/array/storage
# cryptsetup luksOpen /dev/array/storage luks-storage
\\ If you do create a LUKS volume, use /dev/mapper/luks-storage instead of /dev/array/storage for the remaining steps

Next, create an external journal and filesystem:
Code:
# vgchange -ay array
# mkfs.ext4 -O journal_dev /dev/jrnlsd1
# ls -l /dev/disk/by-uuid/
\\ Find the symbolic link that points to your external journal device
# mkfs.ext4 /dev/array/storage -L storage -J device=UUID=h3xuu1dt-0y0u-rd3v1c3

Finally, close everything and back up your lvmsd:
Code:
(Optional) # cryptsetup luksClose luks-storage
# vgchange -an array
# dd if=/dev/lvmsd of=/root/lvmsd.bak bs=512
\\ You should do this every time you change the volume group
_____________
I don't make any guarantees as to the reliability of this method, and as I said, it's only a draft method and still doesn't replace regular backups (you do that, right?).
The advantage of this method is that if one of the secondary physical drives fails, the journal is external, so you can mount the volume read-only and back up what you need (possibly still use the volume until the failed drive is recovered, but I wouldn't recommend it). It also protects against the metadata being lost or corrupted, because you should have a regular backup of the primary physical drive (and maybe even the journal drive) and can quickly make a clone and drop in a replacement.
____________
I still have some questions that would make this method more effective:
What is a decent size for a journal drive? What would be a safe minimum for say a 4TB ext4 volume?
What would be the best way to set ext4 to not store anything in the first 256MB? I know the reserved file space can be changed from 5%, but is that 5% of the originally created file system, or does it increase if the filesystem is extended?
If the first 256MB can be blocked off, what negative effects would be a problem during normal use if the "lock" switch on the SD was turned on? I can tell that the lock would have to be disabled to add LVM PVs, but other than that, what would need to write to it?
- 09-18-2009 #2 Linux Guru
- Join Date
- Nov 2007
- Posts
- 1,695
A simple method is to use the md driver to create a mirror (only requiring 2 drives), and then feed that device (such as /dev/md0) to LVM.
Using LVM, a volume group/logical volume is created and then mounted. Whatever you do to that mounted LVM volume is then mirrored at a level below LVM. The loss of one drive will not affect the data.
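A minimal sketch of the mirror-under-LVM layout described above, written as a dry run so it only prints the commands rather than touching any disk. The device names (/dev/sda1, /dev/sdb1, /dev/md0) and the names "array" and "storage" are assumptions borrowed from the first post, and the run() helper is hypothetical.

```shell
#!/bin/sh
# Dry-run sketch: run() only prints each command, nothing is executed.
# Device names and the "array"/"storage" names are assumptions taken
# from the first post; adjust them before running anything for real.
run() { echo "+ $*"; }

mirror_under_lvm() {
    # Build a two-disk RAID 1 mirror with the md driver.
    run mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
    # Hand the mirror, not the raw disks, to LVM.
    run pvcreate /dev/md0
    run vgcreate array /dev/md0
    run lvcreate -l 100%FREE -n storage array
    # Put a filesystem on the logical volume.
    run mkfs.ext4 -L storage /dev/array/storage
}

mirror_under_lvm
```

To execute the commands for real, run() would have to be changed to call "$@" as root, which destroys whatever is on the named partitions.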
- 09-18-2009 #3 Just Joined!
- Join Date
- Jan 2009
- Posts
- 15
Thanks for the reply. I've looked into that, but the USB drives I use are $100 each. For an increase of 1TB I'd have to spend $400 instead of half that. I know that this is very dangerous, but I formed this method to lessen the impending disaster.
In my opinion, this is cost-effective and scalable media server storage for a budget.
But under a more flexible budget, I see that what you said could be another layer of protection under my method, so I'll be sure to include it in my final paper to my instructor.
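The cost comparison above works out as follows. This is only a sketch of the arithmetic: the $100 price is from the post, while the 500GB drive size and the one-mirror-per-drive RAID 1 layout are assumptions.

```shell
#!/bin/sh
drive_cost=100       # dollars per drive, from the post
drive_size_gb=500    # assumed: same size as the HDDs in the first post
target_gb=1000       # the 1TB increase being priced

drives_needed=$(( target_gb / drive_size_gb ))
spanned_cost=$(( drives_needed * drive_cost ))
# RAID 1 assumed: every data drive needs one mirror drive.
mirrored_cost=$(( 2 * spanned_cost ))

echo "spanned: \$${spanned_cost}  mirrored: \$${mirrored_cost}"
```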
- 09-18-2009 #4 Linux Guru
- Join Date
- Nov 2007
- Posts
- 1,695
A) How much is hours of your time worth - all while pulling your hair out? Been there/done that. A few more $$$ spent for a known/tested data protection method far outweighs some "it might work" process that's tied to a particular filesystem.
B) As a mental exercise/project, good luck with it.
Have you tested a failure with this method?
Were you able to recover all data if the first drive failed?
- 09-18-2009 #5 Just Joined!
- Join Date
- Jan 2009
- Posts
- 15
In my small scale test, I had a 256MB SD, an 8GB flash drive, and then another 256MB SD, plus the external journal. I wrote a small text file, which went onto the first SD card. Then I dd'd 8GB of urandom to a file, which wrote through the rest of the first SD, through the flash drive, and then a little bit onto the third device (the second SD). Then I wrote another text file, which went onto the third device. I can confirm this behavior because the status lights flash depending on which device is being written to.
Then I unmounted everything, unplugged the flash drive, and remounted. I could read both the first text file and the second text file. I know this would become more difficult if data starts getting sprayed around, but if you lose a middle drive, files that were not written on it are still recoverable.
And if the first drive fails, since it's just a 256MB SD, I have a 1:1 dd image of it, so I can just write that to a new SD and continue from there.
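The 1:1 dd backup and restore of the small first device can be rehearsed safely on ordinary files. The sketch below uses temporary files as stand-ins for /dev/lvmsd and the replacement card (the file names are hypothetical), then checks that the restored copy is byte-identical.

```shell
#!/bin/sh
# Stand-in files instead of real devices, so this is safe to run.
workdir=$(mktemp -d)
old_card="$workdir/lvmsd.img"        # stand-in for the original SD card
backup="$workdir/lvmsd.bak"          # stand-in for /root/lvmsd.bak
new_card="$workdir/replacement.img"  # stand-in for the replacement SD card

# Fill a 1 MiB fake "card" with random data.
dd if=/dev/urandom of="$old_card" bs=512 count=2048 2>/dev/null

# Back up the card, then write the image to the replacement.
dd if="$old_card" of="$backup" bs=512 2>/dev/null
dd if="$backup" of="$new_card" bs=512 2>/dev/null

# Verify the clone byte for byte.
if cmp -s "$old_card" "$new_card"; then
    clone_status="identical"
else
    clone_status="different"
fi
echo "clone is $clone_status"

rm -rf "$workdir"
```

On real hardware the same cmp check could be run against the backup image and the new card before the volume group is reactivated.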
- 09-18-2009 #6 Linux Guru
- Join Date
- Nov 2007
- Posts
- 1,695
"I have an LVM volume that is made up of two 500GB HDDs because I like the ability to add to a volume as it expands."
The filesystem "journal" does not hold all of the data in the filesystem. If you have (2) 500GB HDDs and a ??? size device for the journal, and one of the 500GB HDDs fails, are you suggesting that 500GB of data can be recovered simply because you have the FS journal intact?
- 09-18-2009 #7 Just Joined!
- Join Date
- Jan 2009
- Posts
- 15
Not at all. The journal is only 128MB for my current system, so there's no problem with even a 512MB SD holding it. What I'm saying is that files that are on the good 500GB drive are still accessible, and the rest are lost until physical data recovery can be done.
Basically, a standard spanned LVM deployment is very dangerous, because if any drive, especially the originating LVM physical volume, is lost, 100% of the data is lost. But after following my method, if a drive is lost, only 50% is lost. That number goes down as the Volume Group expands, to 33%, 25%, 20%, 16% etc... It's still a catastrophic loss, which can be avoided by setting up each 500GB addition in an MD, but it's also an effective way to mitigate losing all data.
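The percentages above are simply 100 divided by the number of drives. A small sketch of that arithmetic follows; it assumes equal-sized drives and that no file straddles two drives, so the figures are a best case rather than a guarantee. The function name is hypothetical.

```shell
#!/bin/sh
# Best-case share of data lost when one of n equal-sized drives fails.
# Assumes no file is split across drives (not guaranteed in practice).
best_case_loss_percent() {
    echo $(( 100 / $1 ))   # integer division, so 3 drives gives 33
}

for n in 2 3 4 5 6; do
    echo "$n drives: $(best_case_loss_percent "$n")% lost at best"
done
```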
- 09-18-2009 #8 Linux Guru
- Join Date
- Nov 2007
- Posts
- 1,695
Ahh, now I understand what you are attempting...
But, how can you guarantee that if a 200GB file is written, parts of it are not on both drives? The file could have been written months/years ago and no part of it remains in the "journal." In that case, you will not be able to recover that file if even a few MB happens to be on the drive that failed...? And the result would be "some data can be restored and some won't...."
Following the reasoning above, you can see why "first drive failure" is not addressed by LVM - because there is no "guaranteed" solution. A spanned LVM volume is really no better than a RAID0 stripe - one failed drive could mean some or all of your data is lost. If that is a concern, then this LVM config should not be used.
* Your "percentages" of 50%, etc. are misleading. This *may* or *may not* be true. As a BEST case, only 50% is lost. But it could be worse, up to nearly all data being lost. The 50% is not/cannot be guaranteed. If "preventative" measures are being taken, why take these steps (which make recovery a crapshoot) vs. an actual, reliable solution?
Last edited by HROAdmin26; 09-18-2009 at 07:51 PM.
- 09-18-2009 #9 Just Joined!
- Join Date
- Jan 2009
- Posts
- 15
Well there's the rub. I took into consideration that a very large file would be written on different drives, because in my small scale, all data was being written linearly so there wasn't a need to fit new data into deleted space. But that's the cost/benefit analysis I took. I'm not going to have any 200GB files, maybe at most 30GB, and being able to recover at least some files, is the trade off I took to make this as inexpensive as possible.
The reliability becomes less of an issue because data is read more than written, and when SSDs become affordable at 512GB, failure rates will become even less of an issue, because MLC/SLC wear really is only a concern for write/rewrite cycles.
Maybe I don't fully understand Raid 0 stripe, but I thought ALL data was split into 128K (or something) chunks to be written to both drives in the array. As far as I can tell, Spanned LVM doesn't write in chunks, but linearly. Data is only split across drives if the file system puts it there.
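To make the difference concrete, here is a rough sketch of which drive a given logical offset lands on under a linear (spanned) layout versus a two-drive stripe. It is a simplification: real LVM allocates in extents and need not lay them out contiguously, and the 1 GiB drive size, the 128K stripe, and both function names are assumptions for illustration.

```shell
#!/bin/sh
# Offsets and sizes are in KiB. Drives are numbered from 0.

# linear_drive OFFSET DRIVE_SIZE: fill drive 0 completely, then drive 1, ...
linear_drive() {
    echo $(( $1 / $2 ))
}

# striped_drive OFFSET STRIPE_SIZE NUM_DRIVES: alternate drives every stripe
striped_drive() {
    echo $(( ($1 / $2) % $3 ))
}

drive_size=1048576   # hypothetical 1 GiB drives
stripe=128           # hypothetical 128K stripe

for offset in 0 128 256 1048576; do
    echo "offset ${offset}K: linear -> drive $(linear_drive "$offset" "$drive_size"), striped -> drive $(striped_drive "$offset" "$stripe" 2)"
done
```

Under the linear mapping, only data past the first drive's capacity touches the second drive, whereas the striped mapping alternates drives every 128K, so almost any file larger than one stripe lives on both.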
___
For your edit, I'm not proposing this as a reliable method by any means, unless a Raid 1 MD was underneath. I'm aware that 50% is only an "at best" estimate, but if this array is used to store say personal DVD rips and home videos, there won't be very much deleting, if any. This ends up with a reduction in the amount of data that is written across disks, to about one or two files per span.
I think that this method is effective as an alternative to a proven system such as Raid 1 when the files stored on it have a backup (e.g., original files on a laptop, or they are rips of BD movies). Meaning this is just a way to save the user a day or two of re-ripping his/her collection of BDs if only one section is lost.
- 09-18-2009 #10 Linux Guru
- Join Date
- Nov 2007
- Posts
- 1,695
Again, you have many "caveats" about when your steps will work and how much data can be restored - these are all specifics based on YOUR configuration and usage. Someone else could follow your steps and get almost no benefit.
"Maybe I don't fully understand Raid 0 stripe, but I thought ALL data was split into 128K (or something) chunks to be written to both drives in the array. As far as I can tell, Spanned LVM doesn't write in chunks, but linearly. Data is only split across drives if the file system puts it there."
While the stripe size can vary (doesn't have to be 128K), this is correct. And much smaller files can be spread across the drives (based on the filesystem's needs). Again, your steps can't guarantee any recovery - only make it "maybe" possible. If this works for you and you know/understand all of the caveats, great - carry on.
"Draft data loss mitigation method for spanned LVM"
No - these steps might and might not "mitigate" data loss.


