- 10-24-2012 #1
EXT4 Data Corruption Bug Hits Stable Linux Kernels
Read this before panicking, though.
It looks like fscking everything will fix it (it'll replay the buggered
journal, mangling the metadata, but then fix up the scrambled metadata
and fix the journal's starting block). So I probably don't need to worry
about latent corruption hiding waiting to pounce. Phew.
"So it will only show up if the system has been rebooted twice in fairly quick succession. A full conventional distro install probably wouldn't have triggered a bug... although someone who habitually reboots their laptop instead of using suspend/resume or hibernate, or someone who is trying to bisect the kernel looking for some other bug could easily trip over this --- which I guess is how you got hit by it."
The bugfix should come quickly. My AntiX 64-bit test install (3.6 kernel) uses an ext4 file system, with /home sitting in /.
I am not worried though.
- 10-25-2012 #2
I have data partitions with several distros mounting them read/write at the moment, don't suspend to disc ever, and would struggle to archive all data on a regular basis ... the root partition I don't care about ... data partitions I do care about !
I'll have to have a proper think about this one ... thanks for the heads-up roky
Ed: I've decided to fsck the data partition every time I start the system
tune2fs -c 1 /dev/sda_data_partition_number
Last edited by Jonathan183; 10-25-2012 at 12:09 PM.
- 10-26-2012 #3
Update 25-10-12:
Theodore Ts'o has continued his investigation of the bug and has found that the problem was more esoteric than was first thought. The user who reported the problem was using umount -l, which immediately unmounts the filesystem without waiting for it to stop being busy. The bug is now thought to be caused when the machine is being shut down while it is in the process of unmounting the filesystem with an already compromised journal.
The developers are still working to pinpoint the exact problem and it might actually involve more kernel components than just the ext4 drivers. In any case, it has become clear that the bug needs a very specific configuration to surface and is unlikely to affect most users.
Just more info for you..
- 10-26-2012 #4
It's still good to know about potential problems like this just in case things should go weird one day, so thanks for the alert, roky!
oz
- 10-26-2012 #5
Luckily I have a backup every hour... on to a disk using ext4
What do we want?
When do we want 'em?
Doesn't really matter does it!?
The Fifth Continent
- 10-26-2012 #6
My favourite take on this is, the kernel developers don't hide nothing from me. They admit a mistake the day it happens. They jump right on it and try to troubleshoot and rectify the problem. All of this I can see in real time. First time I ever monitored a bug that might affect me. It blows me away, the speed and honesty this bug is being approached with.
Not like some other operating systems we are familiar with.
- 10-27-2012 #7
Over 50 rounds in this fight and still counting. I lost track of counting after 50. Almost ready for the knockout.
Date: 2012-10-27 03:11:47 GMT (1 hour and 51 minutes ago)
Theodore Ts'o wrote:
The problem is this code isn't done yet, and journal_checksum is
really not ready for prime time. When it is ready, my plan is to wire
it up so it is enabled by default; at the moment, it was intended for
developer experimentation only. As I said, it's my fault for not
clearly labelling it "Not for you!", or putting it under an #ifdef to
prevent unwary civilians from coming across the feature and saying,
"oooh, shiny!" and turning it on.
Perhaps a word or two in the mount man page would be appropriate?
- 11-02-2012 #8