Tuesday, December 18, 2007

How to manage services on RHEL/SLES identically?

If you are lazy enough, just use the service and chkconfig commands. Each distribution has its own mechanism for managing services and their init scripts from the command line and the GUI. The service command is common to both: it passes the right arguments to the proper init script and runs an action on the related service, e.g. checking its status, restarting it and so on. To configure the runlevels a service should run in, use the chkconfig command. It shares a few arguments across both platforms.

On SLES you can use a symbolic link with the prefix "rc" that points to the init script. From the GUI you can use the YaST configuration tool and do the work with a mouse. Or you can run an init script directly. To configure runlevels you can also use the insserv command. Short examples show the usage of these commands on the cron service (crond on RHEL):
  • to check the status of cron service run
    1. rccron status
    2. /etc/init.d/cron status
    3. service cron status
  • to start the cron service (the same holds for stop or restart action) run
    1. /etc/init.d/cron start
    2. rccron start
    3. service cron start
  • to enable the cron service in runlevels defined in the header of its init script run
    1. insserv cron
    2. chkconfig --add cron
  • to enable the cron service in runlevels 2, 3 and 5 run
    1. chkconfig --level 235 cron on
RHEL provides the text-based ntsysv configurator. Its functionality is the same as that of the chkconfig command, but it is more comfortable to use. From the GUI, use the system-config-services command. Now, follow the same examples:
  • to check the status of crond service run
    1. /etc/init.d/crond status
    2. service crond status
  • to start the crond service (the same holds for stop or restart action) run
    1. /etc/init.d/crond start
    2. service crond start
  • to enable the crond service in runlevels defined in the header of its init script run
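    1. ntsysv (interactive; by default it configures only the current runlevel, other runlevels can be given with --level)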
    2. chkconfig --add crond
  • to enable the crond service in runlevels 2, 3 and 5 run
    1. chkconfig --level 235 crond on
I must point out that GUI usage was omitted from the examples for an obvious reason: it's straightforward. It's interesting to compare both systems and find that their services are manageable via common commands. I'm sure everybody knows how chkconfig works, but do you know about the service command? It was hidden from me until now. I will use it.
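
Since the service and chkconfig calls are the same on both systems and only the service name differs, the examples above can be folded into one small helper. This is a minimal sketch, not a tested tool: the function name cron_service_name and its optional root-prefix argument are hypothetical, and it assumes that /etc/SuSE-release identifies SLES and /etc/redhat-release identifies RHEL.

```shell
#!/bin/sh
# Print the name of the cron service for the running distribution.
# Assumption: /etc/SuSE-release marks SLES, /etc/redhat-release marks RHEL.
# The optional first argument is a hypothetical root prefix for dry runs.
cron_service_name() {
    root="${1:-}"
    if [ -f "$root/etc/SuSE-release" ]; then
        echo cron
    elif [ -f "$root/etc/redhat-release" ]; then
        echo crond
    else
        echo "unknown distribution" >&2
        return 1
    fi
}
```

With the function loaded, the same two lines should work on both platforms:
  • service "$(cron_service_name)" status
  • chkconfig --level 235 "$(cron_service_name)" on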

Monday, December 17, 2007

Quickly - configuring VLANs on RHEL

  • configuration steps on RHEL (4, 5) - they have to be done as root:
    1. we have already configured the eth0 interface, which is accessible from the network, and we want to add VLAN support to it, e.g. to accept packets tagged with VLAN ID 123 (avoid using VLAN ID 1, it is often used as the administration VLAN)
    2. we need to have support at the kernel level, try to load the proper module
      • modprobe 8021q
    3. configuration of interface eth0 is stored in the following file and it may contain its MAC address, IP address, netmask, network and so on
      • /etc/sysconfig/network-scripts/ifcfg-eth0
    4. to accept packets with VLAN ID 123 on that interface, run the following commands:
      • cd /etc/sysconfig/network-scripts/
      • cp ifcfg-eth0 ifcfg-eth0.123
    5. the new configuration file defines the virtual interface eth0.123 on top of the main interface eth0, which will accept packets tagged with VLAN ID 123
    6. to enable VLANs on the interface eth0, append this line to the newly created configuration file
      • VLAN=yes
    7. and change the line defining interface name from eth0 to eth0.123
      • DEVICE=eth0.123
    8. adjust the other settings - IP address, netmask and related
    9. finally, apply the new network settings with
      • /etc/init.d/network restart
    10. check the status via the proc interface
      • cat /proc/net/vlan/eth0.123
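
Steps 4 to 7 above can be sketched as a small shell function. This is only a sketch under assumptions: the function name make_vlan_ifcfg is hypothetical, it expects an unquoted DEVICE= line in the source file, and it leaves the IP address, netmask and related settings (step 8) for manual editing.

```shell
#!/bin/sh
# Create ifcfg-<dev>.<vlan_id> next to an existing ifcfg file.
# Usage: make_vlan_ifcfg /etc/sysconfig/network-scripts/ifcfg-eth0 123
# Assumption: the source file contains an unquoted line such as DEVICE=eth0.
make_vlan_ifcfg() {
    src="$1"
    vid="$2"
    # Read the parent device name from the existing configuration.
    dev=$(sed -n 's/^DEVICE=//p' "$src" | head -n 1)
    [ -n "$dev" ] || return 1
    dst="$(dirname "$src")/ifcfg-$dev.$vid"
    # Copy the file and rename the device to <dev>.<vlan_id>.
    sed "s/^DEVICE=.*/DEVICE=$dev.$vid/" "$src" > "$dst"
    # Mark the new interface as a VLAN interface.
    echo "VLAN=yes" >> "$dst"
    # Print the path of the new file so the caller can edit it.
    echo "$dst"
}
```

The function only writes the new file; loading the 8021q module and restarting the network still have to be done as in the steps above.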

Wednesday, December 12, 2007

Sun Fire V490 and RSC not responding

I was preparing a Sun Fire V490 server for our customer and first wanted to configure the RSC controller so that I could continue my work remotely. So I installed the RSC packages and ran the following command:
  • /usr/platform/`uname -i`/rsc/rsc-config
This command, or rather a shell script that calls the rscadm command from the same directory, is responsible for the initial setting of RSC controller properties such as network interface and serial port settings, notifications, users and so on. After a few questions are answered, the script flashes the RSC PROM to activate the provided values.

My trouble appeared in the PROM flashing phase. It ended abruptly with some errors mentioning a problem with the network interface settings. I checked the entered settings, ran the command again and ended up with this error message:
  • rscadm: RSC firmware not responding
At first I waited a few minutes because I supposed the controller was restarting. Then I tried to restart it with rscadm:
  • rscadm resetrsc
But the result was the same as above. So I rebooted the server to the OK prompt and tried to restart the RSC controller from there with the command:
  • reset-rsc
I booted the server to single user mode and tried the rscadm command again. Again without any response. Finally, I decided to flash the PROM by hand via:
  • rscadm download ../lib/images/rscfw
I ran that command from the directory:
  • /usr/platform/`uname -i`/rsc/
This time the flashing began and finished successfully after a few minutes, and I was able to proceed with the RSC reconfiguration. Successfully as well.
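
As a reminder, the rscadm steps above can be collected into a dry-run helper that only prints the commands instead of executing them. It is a sketch: the function name rsc_recovery_plan and its optional platform argument are hypothetical, and the paths are simply the ones used above.

```shell
#!/bin/sh
# Print (do not run) the rscadm recovery commands in the order used above.
# The optional argument overrides the platform name; by default it is
# taken from `uname -i`, as in the paths above.
rsc_recovery_plan() {
    platform="${1:-$(uname -i)}"
    rsc_dir="/usr/platform/$platform/rsc"
    echo "cd $rsc_dir"
    # First attempt: ask the controller to reset itself.
    echo "./rscadm resetrsc"
    # Last resort: flash the RSC PROM by hand.
    echo "./rscadm download ../lib/images/rscfw"
}
```

The reset-rsc command is left out on purpose because it is entered at the OK prompt, not in the shell.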

I'm surprised by how rscadm behaves. Every option ended with no response, but the download option used for RSC PROM flashing still worked and solved the whole problem. More about RSC (version 2.2) is published at docs.sun.com. Instructions for downloading and installing RSC are available at www.sun.com.

Wednesday, December 5, 2007

Sun Fire X4600 M2 and paired DIMM mismatch

I was working on the configuration of a Sun Fire X4600 M2 server and I was surprised when the message "Paired DIMM Mismatch" appeared during the server's initial boot. I understood the warning but I wasn't sure about its cause. I received the server with 4 CPU boards, each with 4 GB of memory installed. Besides this, I received an additional 24 2GB memory modules (or 12 4GB memory kits) and another 4 CPU boards. I populated the server with the new CPU boards after installing the additional memory modules onto them (in total, every board had 8GB of memory installed) and powered on the server.

System boards installed in modern servers are often equipped with a BMC or "Baseboard Management Controller". This circuit is responsible for managing the interface between the system management software and the hardware platform. The BMC relies on many hardware sensors reporting parameters such as temperature, fan speeds, power mode and so on. The BMC is the intelligence in the IPMI or "Intelligent Platform Management Interface" architecture. In my opinion the BMC is not the same as the SC or "System Controller" in Sun terminology, which is ILOM; it is a separate controller.


Above is a screenshot of the BMC response after BIOS POST startup. The BMC warned me that the memory module pairs on some CPU boards were not well matched. This can lead to degraded performance because the memory module pairs are not interleaved optimally.

According to the "Sun Fire X4600/X4600 M2 Servers Diagnostics Guide", there are several identified errors related to memory modules:
  • NODE-n Paired DIMMs Mismatch
    • modules in a pair aren't the same or their checksums differ
  • NODE-n Memory Configuration Mismatch
    • modules are not in pair (they are running in 64-bit mode instead of 128-bit)
    • modules don't support ECC
    • modules' speed is different
    • modules are not registered
    • modules' type/generation/organization or CL/T is mismatched
    • the banks on a two-sided module are mismatched
    • and the others ...
  • NODE-n DIMMs Manufacturer Mismatch
    • module's manufacturer is not supported
After I removed the affected CPU boards from the server and inspected the memory module pairs, the situation was clear to me. I had forgotten to check the memory module vendors because I supposed they were all the same. So I reorganized them properly, put the boards back, and then everything worked smoothly.