Wednesday, December 5, 2007

Sun Fire X4600 M2 and paired DIMM mismatch

I was working on the configuration of a Sun Fire X4600 M2 server and was surprised when the message "Paired DIMM Mismatch" appeared during the server's initial boot. I understood the warning but wasn't sure of its cause. The server arrived with 4 CPU boards, each with 4 GB of memory installed. In addition, I received 24 more 2 GB memory modules (12 4 GB memory kits) and 4 more CPU boards. I installed the additional memory modules onto the CPU boards, so that every board ended up with 8 GB of memory in total, populated the server with the new boards and powered it on.
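
For the record, here is the memory arithmetic behind that configuration as a small Python check. It assumes (the post does not say so) that each original 4 GB board carried one kit of two 2 GB modules. Under that assumption the 24 extra modules are exactly enough to bring all eight boards to 8 GB each.

```python
# Memory arithmetic for the X4600 M2 configuration described above.
# Assumption (not stated in the post): each original 4 GB board held
# one 4 GB kit, i.e. two 2 GB modules.
MODULE_GB = 2

original_boards = 4
new_boards = 4
original_modules_per_board = 2   # assumed: one 4 GB kit = 2 x 2 GB
extra_modules = 24               # 24 x 2 GB = 12 x 4 GB kits
target_gb_per_board = 8

modules_per_board = target_gb_per_board // MODULE_GB   # 4 modules per board

# Modules still missing on the original boards and on the new boards.
needed_on_original = original_boards * (modules_per_board - original_modules_per_board)
needed_on_new = new_boards * modules_per_board

total_boards = original_boards + new_boards
total_gb = (original_boards * original_modules_per_board + extra_modules) * MODULE_GB

print(f"modules needed: {needed_on_original + needed_on_new} (have {extra_modules})")
print(f"total memory: {total_gb} GB across {total_boards} boards "
      f"= {total_gb // total_boards} GB per board")
```

Running it prints that 24 modules are needed (matching the 24 received) and that 64 GB spread over 8 boards gives 8 GB per board.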

System boards installed in modern servers are often equipped with a BMC, or "Baseboard Management Controller". This circuit is responsible for managing the interface between system management software and the hardware platform. The BMC relies on many hardware sensors that report parameters such as temperature, fan speeds, power mode and so on. The BMC is the intelligence in the IPMI ("Intelligent Platform Management Interface") architecture. In my opinion, the BMC is not the same thing as the SC ("System Controller" in Sun terminology, which is ILOM); it is a separate controller.
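
To make the BMC's sensor-watching role a bit more concrete, here is a toy Python model of threshold checking on sensor readings. It is not IPMI code and not how any Sun firmware is written: the sensor names, thresholds and message format are hypothetical, and a real BMC records such events in its System Event Log rather than printing them.

```python
from dataclasses import dataclass


@dataclass
class Sensor:
    name: str
    lower: float   # lower critical threshold (hypothetical value)
    upper: float   # upper critical threshold (hypothetical value)


def check_readings(sensors, readings):
    """Return an event message for every missing or out-of-range reading."""
    events = []
    for sensor in sensors:
        value = readings.get(sensor.name)
        if value is None:
            events.append(f"{sensor.name}: no reading")
        elif not (sensor.lower <= value <= sensor.upper):
            events.append(f"{sensor.name}: {value} outside [{sensor.lower}, {sensor.upper}]")
    return events


# Hypothetical sensors and readings, for illustration only.
sensors = [
    Sensor("cpu0.temp", 5.0, 85.0),
    Sensor("fan0.rpm", 1500.0, 12000.0),
]
readings = {"cpu0.temp": 91.0, "fan0.rpm": 4200.0}

for event in check_readings(sensors, readings):
    print(event)
```

Here only the CPU temperature is out of range, so a single event is reported; the fan reading stays within its thresholds.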

The screenshot above shows the BMC's response after the BIOS POST. The BMC warned me that the memory module pairs on some CPU boards were not matched correctly. Such a mismatch can degrade performance because the module pairs cannot be interleaved optimally.

According to the "Sun Fire X4600/X4600 M2 Servers Diagnostics Guide", there are several documented errors related to memory modules:
  • NODE-n Paired DIMMs Mismatch
    • modules in a pair aren't identical or their checksums differ
  • NODE-n Memory Configuration Mismatch
    • modules are not installed in pairs (they run in 64-bit mode instead of 128-bit mode)
    • modules don't support ECC
    • modules' speeds differ
    • modules are not registered
    • modules' type/generation/organization or CL/T are mismatched
    • the banks on a two-sided module are mismatched
    • and others ...
  • NODE-n DIMMs Manufacturer Mismatch
    • module's manufacturer is not supported
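
As a rough illustration of how those three messages relate to each other, here is a minimal Python sketch that maps one DIMM pair onto the error names from the guide. The `Dimm` attributes, the `SUPPORTED_VENDORS` list and the order of the checks are my own hypothetical simplification, not the actual BIOS/BMC logic.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Dimm:
    # Hypothetical, simplified view of what a module reports about itself.
    vendor: str
    part_number: str
    size_gb: int
    speed_mhz: int
    ecc: bool
    registered: bool


# Hypothetical list of vendors the platform accepts.
SUPPORTED_VENDORS = {"VendorA", "VendorB"}


def check_pair(a: Dimm, b: Optional[Dimm]) -> List[str]:
    """Roughly map one DIMM pair onto the error names from the guide."""
    if b is None:
        # No partner module: the pair runs in 64-bit instead of 128-bit mode.
        return ["Memory Configuration Mismatch"]
    errors = []
    if any(d.vendor not in SUPPORTED_VENDORS for d in (a, b)):
        errors.append("DIMMs Manufacturer Mismatch")
    config_ok = all(d.ecc and d.registered for d in (a, b)) and a.speed_mhz == b.speed_mhz
    if not config_ok:
        errors.append("Memory Configuration Mismatch")
    elif a != b:
        # Both modules are usable, but they are not identical (e.g. different vendor).
        errors.append("Paired DIMMs Mismatch")
    return errors


a = Dimm("VendorA", "PN-1", 2, 667, True, True)
b = Dimm("VendorB", "PN-2", 2, 667, True, True)
print(check_pair(a, b))     # ['Paired DIMMs Mismatch']
print(check_pair(a, a))     # []
print(check_pair(a, None))  # ['Memory Configuration Mismatch']
```

The first case mirrors my situation: two otherwise compatible modules from different vendors sitting in the same pair.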
After removing the affected CPU boards from the server and inspecting the memory module pairs, the situation was clear to me. I had forgotten to check the vendors of the memory modules because I assumed they were all the same. So I regrouped the modules so that each pair came from the same vendor, put the boards back, and everything worked smoothly.
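
The regrouping itself boils down to pairing modules by vendor. A minimal Python sketch of that step, assuming each module is described only by a hypothetical label and its vendor name (a real pair should also match in every other attribute listed above):

```python
from collections import defaultdict


def pair_by_vendor(modules):
    """Group (label, vendor) tuples into same-vendor pairs.

    Returns (pairs, leftovers): pairs are (vendor, label1, label2) tuples,
    leftovers are modules that have no same-vendor partner.
    """
    by_vendor = defaultdict(list)
    for label, vendor in modules:
        by_vendor[vendor].append(label)
    pairs, leftovers = [], []
    for vendor, labels in sorted(by_vendor.items()):
        # Take the modules of one vendor two at a time.
        for i in range(0, len(labels) - 1, 2):
            pairs.append((vendor, labels[i], labels[i + 1]))
        if len(labels) % 2:
            leftovers.append((labels[-1], vendor))
    return pairs, leftovers


# Hypothetical labels and vendors: four modules that were mixed across pairs.
modules = [("m1", "VendorA"), ("m2", "VendorB"), ("m3", "VendorA"), ("m4", "VendorB")]
pairs, leftovers = pair_by_vendor(modules)
print(pairs)      # [('VendorA', 'm1', 'm3'), ('VendorB', 'm2', 'm4')]
print(leftovers)  # []
```

Any leftover module signals a vendor with an odd module count, which would need a partner of the same make before it can be installed as a proper pair.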

1 comment:

Anonymous said...

Thanks a lot for this post!

-- lazywebber