EDAC Overview
EDAC (Error Detection and Correction) is a set of Linux kernel modules for handling hardware-related errors.
Its main focus has been ECC memory error handling, but it also detects and reports PCI bus parity errors.
EDAC exposes hardware information through sysfs. When a problem is detected, a log entry like the following appears in syslog:
kernel: EDAC MC1: CE row 0, channel 0, label "": Corrected error (Socket=1 channel=0 dimm=0)
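The error counts behind log entries like this one can also be read directly from sysfs. The following is a minimal sketch, assuming the usual layout in which each memory controller has a directory under /sys/devices/system/edac/mc containing ce_count and ue_count files. The function name read_edac_counts and the returned dictionary shape are illustrative choices, not part of EDAC itself.

```python
from pathlib import Path

# Assumed default location of the EDAC memory-controller tree in sysfs.
EDAC_MC_ROOT = Path("/sys/devices/system/edac/mc")


def read_edac_counts(root=EDAC_MC_ROOT):
    """Return {controller_name: {"ce": int, "ue": int}} for each mc* directory."""
    counts = {}
    root = Path(root)
    if not root.is_dir():
        # EDAC driver not loaded, or sysfs is laid out differently.
        return counts
    for mc_dir in sorted(root.glob("mc[0-9]*")):
        entry = {}
        for key, filename in (("ce", "ce_count"), ("ue", "ue_count")):
            try:
                entry[key] = int((mc_dir / filename).read_text().strip())
            except (OSError, ValueError):
                entry[key] = None  # counter missing or unreadable
        counts[mc_dir.name] = entry
    return counts


if __name__ == "__main__":
    for name, entry in read_edac_counts().items():
        print(f"{name}: corrected={entry['ce']} uncorrected={entry['ue']}")
```

On a machine without an EDAC driver loaded, the function returns an empty dictionary rather than raising an error.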
A tool called edac-util can be used to view a more detailed status report.
$ edac-util -r
mc1: csrow0: ch0: 43722040 Corrected Errors
The output above shows that a large number of corrected errors (CE) have been detected on mc1, csrow0, channel 0.
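For monitoring, report lines of this shape can be turned into structured records. The sketch below parses only the per-channel "Corrected Errors" line format shown in the sample and ignores everything else. The function names parse_ce_report and noisy_channels, and the threshold of 1000, are hypothetical and chosen purely for illustration.

```python
import re

# Matches only the per-channel line format shown in the sample output,
# e.g. "mc1: csrow0: ch0: 43722040 Corrected Errors". Other lines are skipped.
CE_LINE = re.compile(r"^(mc\d+): (csrow\d+): (ch\d+): (\d+) Corrected Errors$")


def parse_ce_report(text):
    """Return a list of (mc, csrow, channel, count) tuples from report text."""
    rows = []
    for line in text.splitlines():
        match = CE_LINE.match(line.strip())
        if match:
            mc, csrow, channel, count = match.groups()
            rows.append((mc, csrow, channel, int(count)))
    return rows


def noisy_channels(rows, threshold=1000):
    """Return rows whose count exceeds `threshold` (an arbitrary example value)."""
    return [row for row in rows if row[3] > threshold]


if __name__ == "__main__":
    sample = "mc1: csrow0: ch0: 43722040 Corrected Errors\n"
    for mc, csrow, channel, count in noisy_channels(parse_ce_report(sample)):
        print(f"{mc}/{csrow}/{channel}: {count} corrected errors")
```

A sensible threshold depends on the hardware and on how long the counters have been accumulating, so the value here should be treated as a placeholder.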
On RHEL or CentOS, edac-util can be installed with yum:
yum install -y edac-utils
Links
[http://www.kernel.org/doc/Documentation/edac.txt EDAC – Error Detection And Correction]
[http://bluesmoke.sourceforge.net EDAC Project]