Hi,

I'm new to mcelog as of this morning, but I'd like to help out a friend on a project and am hoping someone can give me a push in the right direction.

I've tried Googling but I can't find anything that explains what the entries mean in the mcelog. If anyone could point me to a good resource, even just salient points in code, that would be great. An example entry from my test box is as follows:

Code:
MCE 0
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 2 4 northbridge TSC 109db45d6c65 
ADDR 2b1bc4190 
  Northbridge Chipkill ECC error
  Chipkill ECC syndrome = be21
       bit40 = error found by scrub
       bit46 = corrected ecc error
       bit62 = error overflow (multiple errors)
  bus error 'local node response, request didn't time out
      generic read mem transaction
      memory access, level generic'
STATUS d410c100be080a13 MCGSTATUS 0

I'm mostly interested in the first line (does MCE 0 indicate a status level, or just the first MCE error found?), the fourth line (what does CPU 2 4 mean?), and how to distinguish between the different types of MCE/CMCI events. If anyone even just knows how to annotate the above entry with a short description of what each part means, that would be incredibly awesome.
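
In case it helps anyone answer: as far as I can tell, the bitNN lines in the entry line up with bits that are set in the STATUS value. Below is a minimal Python sketch of how I checked that, assuming STATUS is a 64-bit value printed in hex. The three bit descriptions are copied from the entry above and are nowhere near a full decode; the name `describe` is my own and not anything from mcelog.

```python
# Bit descriptions copied verbatim from the decoded entry in this post.
# This is only a sanity check, not a complete STATUS decoder.
STATUS_BITS = {
    40: "error found by scrub",
    46: "corrected ecc error",
    62: "error overflow (multiple errors)",
}


def describe(status_hex):
    """Return {bit: description} for the known bits set in a hex STATUS value."""
    status = int(status_hex, 16)
    return {
        bit: text
        for bit, text in STATUS_BITS.items()
        if (status >> bit) & 1
    }


if __name__ == "__main__":
    for bit, text in sorted(describe("d410c100be080a13").items()):
        print(f"bit{bit} = {text}")
```

All three bits come back as set for the STATUS above, which matches what mcelog printed, but I still don't know what the remaining bits mean.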

Additionally, I am looking at the 1.0-pre2 tarball and am confused about diskdb.c and memdb.c. My project is supposed to comb these logs and turn them into a simple database. My understanding is that earlier attempts with the diskdb had problems; however, I can't quite work out how to trace memdb: what it is, where it gets created, how I might look at its contents, etc.
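
To make the goal concrete, here is a rough Python sketch of the kind of thing I'm trying to build. It is my own guess at an approach and has nothing to do with how memdb.c or diskdb.c work. It assumes every entry starts with a line of the form "MCE <n>" and that the CPU, ADDR and STATUS lines look like the ones in my example; the table name `mce` and the column names are made up. I only keep the first number on the CPU line because I don't know yet what the second one means.

```python
import re
import sqlite3

# Patterns based only on the single example entry in this post (an assumption).
FIELDS = {
    "cpu": re.compile(r"^CPU (\d+)"),
    "addr": re.compile(r"^ADDR ([0-9a-f]+)"),
    "status": re.compile(r"^STATUS ([0-9a-f]+)"),
    "mcgstatus": re.compile(r"\bMCGSTATUS ([0-9a-f]+)"),
}


def parse_entries(text):
    """Split mcelog-style text into one dict per 'MCE <n>' entry."""
    entries, current = [], None
    for line in text.splitlines():
        if re.match(r"MCE \d+\s*$", line):
            current = {name: None for name in FIELDS}
            entries.append(current)
            continue
        if current is None:
            continue  # ignore anything before the first entry
        for name, pattern in FIELDS.items():
            match = pattern.search(line)
            if match:
                current[name] = match.group(1)
    for entry in entries:
        if entry["cpu"] is not None:
            entry["cpu"] = int(entry["cpu"])
    return entries


def store(entries, conn):
    """Insert parsed entries into a hypothetical 'mce' table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS mce "
        "(cpu INTEGER, addr TEXT, status TEXT, mcgstatus TEXT)"
    )
    conn.executemany(
        "INSERT INTO mce VALUES (:cpu, :addr, :status, :mcgstatus)", entries
    )
    conn.commit()


SAMPLE = """MCE 0
CPU 2 4 northbridge TSC 109db45d6c65
ADDR 2b1bc4190
STATUS d410c100be080a13 MCGSTATUS 0
"""

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    store(parse_entries(SAMPLE), conn)
    print(conn.execute("SELECT * FROM mce").fetchall())
```

If memdb already does something like this internally, I'd much rather reuse it than maintain my own parser.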

Any help at all (pointers to good websites, other forums, or email lists) would be appreciated. Thanks so much!