OpenMosix cluster crashes
Here's the setup. The main computer runs Gentoo with kernel 2.4.22-openmosix. I also have a Windows box that I only use for games and a couple of other applications, and I would like to cluster it with the Linux box while it's not in use. On that machine I am using PlumpOS, also with kernel version 2.4.22.
The PlumpOS system works fine and has no stability problems. The main Gentoo system has a lot of problems, though. I compiled the kernel with GCC 2.95, as was suggested. Everything is rock solid until I start openmosix. Soon after it joins the cluster, the entire system crashes, and I get a kernel panic with something like "openmosix divide error".
Additionally, running mosmon shows all nodes at first, but then the other one disappears after a couple of updates. It also sometimes says the other nodes are down. I think the problem occurs when it tries to migrate a process: the last time it crashed was the instant I gave the migrate command. MFS works perfectly, and so do most of the other parts, as far as I can tell. I think this is only an issue with the actual migration of processes.
When the crash happens, sometimes the Caps Lock and Scroll Lock lights come on, and sometimes they don't.
Does anyone know what the problem could be, and also why it says the other node is down? This is my first HPC setup aside from a distcc Beowulf cluster, and I'm a newbie to openmosix. I read the HOWTO, but it didn't seem to cover this kind of problem. There's probably some simple configuration problem that I'm missing, but I don't know what to check yet.
On the other hand, I DID manage to get ClusterKnoppix to work perfectly with these machines, so I know it can be done.
And sorry for the bad writing. I'm really tired right now, and my writing ability seems to drop to a 4th-grade level when I'm this tired.