Monday, February 25, 2008

Quickly - PCI Express

Do you know how PCI Express works? Do you know what PCI Express slots look like, or how to tell how fast a slot is? I was looking for a picture or diagram of PCI Express slots and their speeds. Nothing more, just to be aware of it. I found the following picture. More about it is written at computer.howstuffworks.com. Another comprehensive source of information is wikipedia.org.

Monitoring ASSP with monit

Do you know ASSP, the Anti-Spam SMTP Proxy? I'm going to write some details about it in the near future. If you have already deployed it on your servers to eliminate spam, I will show you how to monitor it with monit and restart it in case of a failure. The configuration was tested on Linux.

First, I had to edit the init script of the service so that its pid can be checked. The init script after the changes is below; the added lines are the ones that write and remove /var/run/assp.pid:

#!/bin/sh -e
PATH=/bin:/usr/bin:/sbin:/usr/sbin

case "$1" in

    start)
        echo "Starting the Anti-Spam SMTP Proxy"
        cd /usr/share/assp
        perl assp.pl
        # Record the PID of the running proxy so monit can check it.
        ps ax | grep "perl assp.pl" | grep -v grep | awk '{ print $1 }' > /var/run/assp.pid
        ;;

    stop)
        echo "Stopping the Anti-Spam SMTP Proxy"
        kill -9 `ps ax | grep "perl assp.pl" | grep -v grep | awk '{ print $1 }'`
        # Remove the pid file once the process has been killed.
        rm -f /var/run/assp.pid
        ;;

    restart)
        # Ignore a failing stop (for example, nothing was running).
        $0 stop || true
        $0 start
        ;;

    *)
        echo "Usage: /etc/init.d/assp {start|stop|restart}"
        exit 1
        ;;

esac
exit 0

I know there are better ways to do this and to stay compliant with the distro's conventions, but I just want to show you how to configure monit. The monit configuration depends on this pid file, which is used to define the service check block.
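As one example of such a better way, here is a minimal sketch that stops the process through the PID recorded in the pid file instead of grepping the process list a second time. The function name stop_by_pidfile and the one-second grace period are my own assumptions for illustration; they are not part of ASSP or monit.

```shell
#!/bin/sh
# Stop a daemon using the PID stored in its pid file (illustrative sketch).
# Assumption: the pid file holds exactly one PID.
stop_by_pidfile() {
    pidfile=$1
    [ -f "$pidfile" ] || return 0            # no pid file, nothing to stop
    pid=$(cat "$pidfile")
    if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
        kill "$pid" 2>/dev/null              # ask politely first (SIGTERM)
        sleep 1                              # assumed grace period
        # Still alive after the grace period? Then force it (SIGKILL).
        kill -0 "$pid" 2>/dev/null && kill -9 "$pid" 2>/dev/null
    fi
    rm -f "$pidfile"
}

# Usage inside the "stop)" branch of the init script would be:
#   stop_by_pidfile /var/run/assp.pid
```

Sending SIGTERM before SIGKILL gives the proxy a chance to shut down cleanly, and relying on the pid file avoids killing unrelated processes whose command line happens to match.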

The assp service listens on TCP port 55555 by default, where it provides a simple configuration interface over HTTP. The interface requires authentication, so if you try to access it without proper credentials it returns status code 401, which means that the client failed to authenticate. You can get the whole error message by telnetting to the port:

telnet localhost 55555
GET / HTTP/1.0



I used HTTP version 1.0 and sent a GET request. If you want to use version 1.1, you need to send the Host header as well. After you press Enter and send an empty line, the request is processed and the following response comes back:

HTTP/1.1 401 Unauthorized
WWW-Authenticate: Basic realm="Anti-Spam SMTP Proxy (ASSP) Configuration"
Content-type: text/html
Server: ASSP/1.2.6()
Date: Mon, 25 Feb 2008 13:45:53 GMT
Content-Length: 49

...
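The same manual check can be scripted. The sketch below pulls the status code out of the first line of an HTTP response; the helper name http_status is hypothetical, and the commented-out live check assumes that nc (netcat) is installed.

```shell
#!/bin/sh
# Print the status code found in the first line of an HTTP response on stdin.
http_status() {
    # The status line looks like "HTTP/1.1 401 Unauthorized";
    # strip the trailing carriage return and print the second field.
    head -n 1 | tr -d '\r' | awk '{ print $2 }'
}

# Example with a canned response, so no running service is needed (prints 401):
printf 'HTTP/1.1 401 Unauthorized\r\nContent-type: text/html\r\n\r\n' | http_status

# Against a live ASSP instance (assumes netcat is available):
#   printf 'GET / HTTP/1.0\r\n\r\n' | nc localhost 55555 | http_status
```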


We are interested in the first line, which contains the error code mentioned above. The monit configuration snippet that monitors our service and the related process via the pid file looks like this:

check process assp with pidfile /var/run/assp.pid
start program = "/etc/init.d/assp start"
stop program = "/etc/init.d/assp stop"

It checks the pid of the process, and if the process is not running the service is restarted. Now we will extend it with the ability to check connectivity to port 55555:

check process assp with pidfile /var/run/assp.pid
start program = "/etc/init.d/assp start"
stop program = "/etc/init.d/assp stop"
if failed host 127.0.0.1 port 55555
then restart

But we would like to talk to the port over HTTP. The check above is only a simple TCP connectivity check, and it is better to do it via HTTP. Monit supports this, and you can do it like this:

check process assp with pidfile /var/run/assp.pid
start program = "/etc/init.d/assp start"
stop program = "/etc/init.d/assp stop"
if failed host 127.0.0.1 port 55555 protocol http
then restart

The above check is not the right one for us because it is suitable only for unauthenticated environments. By default, it checks only the return code and succeeds if it receives the OK status (return code 200). To catch return code 401 we need to redefine what we expect. If you use the send/expect mechanism, you need to omit "protocol http":

check process assp with pidfile /var/run/assp.pid
start program = "/etc/init.d/assp start"
stop program = "/etc/init.d/assp stop"
if failed host 127.0.0.1 port 55555
send "GET / HTTP/1.0\r\nHost: localhost\r\n\r\n"
expect "HTTP/[0-9\.]{3} 401 .*Unauthorized.*"
then restart
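Before putting an expect pattern like this into monit, it can be tried out on the command line. This is only a sketch: it assumes that monit evaluates expect strings as POSIX extended regular expressions, which is the dialect grep -E uses, and the helper name matches is made up for the example.

```shell
#!/bin/sh
# The expect pattern from the monit configuration.
PATTERN='HTTP/[0-9\.]{3} 401 .*Unauthorized.*'

# Return success if the given status line matches the pattern.
matches() {
    printf '%s\n' "$1" | grep -Eq "$PATTERN"
}

# Try a few sample status lines against the pattern.
for line in 'HTTP/1.1 401 Unauthorized' 'HTTP/1.0 200 OK' 'HTTP/1.1 500 Internal Server Error'; do
    if matches "$line"; then
        echo "match:    $line"
    else
        echo "no match: $line"
    fi
done
```

Only the 401 line should match, which is what we want: any other answer from the port counts as a failure.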


So we construct the whole GET request and expect error code 401. If we receive anything else, monit evaluates it as a connectivity failure and restarts the assp service. To be more fault tolerant, it is better to check two, three or more times to be really sure that the service is not listening on port 55555:

check process assp with pidfile /var/run/assp.pid
start program = "/etc/init.d/assp start"
stop program = "/etc/init.d/assp stop"
if failed host 127.0.0.1 port 55555
send "GET / HTTP/1.0\r\nHost: localhost\r\n\r\n"
expect "HTTP/[0-9\.]{3} 401 .*Unauthorized.*"
for 3 cycles then restart

That's everything. Why are we doing it like this? If the assp service is running, the configuration interface should be accessible via TCP port 55555. Otherwise something is wrong and we should restart the service to be sure.

Tuesday, February 12, 2008

SLES10 SP1 and pam session group errors

I was configuring a new backup server for our customer and I wanted to integrate it into the running LDAP infrastructure. So I intended to configure it as an LDAP client and join it to the customer's LDAP server.

The backup server is based on the SLES10 SP1 distribution, which comes with the YaST configurator for such basic configuration tasks. I can only recommend it if you don't want to waste time on simple things! Configuring a server as an LDAP client is really straightforward, and you don't need to edit any config files like /etc/ldap.conf, /etc/nsswitch.conf or the PAM config files, or remember exactly what to write where. Just fill in the proper options, such as the address of the LDAP server, the LDAP base DN, whether to use SSL/TLS and the LDAP protocol version, and confirm. The screenshot illustrates these options.


But sometimes trouble happens. After finishing the above process, remotely of course, everything seemed to work as I expected. I was able to see the LDAP users, which was the main goal. Perfect!

The first problem I noticed was that I could no longer connect to the server remotely via ssh. Debugging the connection didn't help me. Why?!? Good question!

I had to inspect the server locally and I found the following errors in messages:
  • sshd[8351]: Accepted publickey for root from A.B.C.D port 59203 ssh2
  • sshd[8353]: pam_warn(sshd:session): function=[pam_sm_open_session] service=[sshd] terminal=[/dev/pts/0] user=[xxx] ruser=[] rhost=[yyy]
  • error: PAM: pam_open_session(): Cannot make/remove an entry for the specified session
The errors are related to the sshd service and were the result of the unsuccessful connection. Another malfunctioning service was crond, and its errors were identical.

It is clear that something is wrong with the PAM configuration of the services. In SLES10 and other distros, PAM modules are used for authentication, account and session processing by most services. For the sshd daemon, this behaviour is controlled by one option in the /etc/ssh/sshd_config config file:
  • UsePAM yes
So I decided to try turning it off to make sure I was going in the right direction. The sshd service started working again, but I still wasn't sure what was wrong with it. And what should I do with the crond service to bypass the PAM modules?

I realized that the only thing I had changed before was the LDAP client configuration. I tried to bring the system back to its previous state without it, but that didn't help. That means that some operations weren't successful when I configured the LDAP client with YaST. Yet, as I mentioned, the LDAP client configuration is straightforward and only a few config files need to change. Of course, you need to have the required packages with binaries and libraries installed.

I looked over the configuration files and they seemed to be fine. Only the /etc/pam.d/common-session file didn't contain any lines. This file is shared by the other PAM config files, which include it. So, how to check its contents? Remember to use the rpm command in such situations. To verify the pam package I ran:
  • rpm -V pam
It showed me that the file had been changed:
  • S.5....T c /etc/pam.d/common-session
The config file differed from the original one. The difference was these two missing lines:
  • session required pam_limits.so
  • session required pam_unix2.so
Finally, I replaced the modified file with the one from the installation source, turned PAM support back on and checked the services. They started to work again.
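For reference, the flags printed by rpm -V can be read off with a small helper. This is a sketch only: the function name explain_rpm_verify is hypothetical and it covers just a few of the flags rpm can report.

```shell
#!/bin/sh
# Translate the attribute flags from one line of "rpm -V" output into words.
# Only a subset of the possible flags is handled here.
explain_rpm_verify() {
    flags=$1
    case $flags in *S*) echo "S: file size differs" ;; esac
    case $flags in *M*) echo "M: mode (permissions or file type) differs" ;; esac
    case $flags in *5*) echo "5: checksum differs" ;; esac
    case $flags in *T*) echo "T: modification time differs" ;; esac
}

# The flags reported for /etc/pam.d/common-session:
explain_rpm_verify 'S.5....T'
```

For the flags from this case it reports that the size, checksum and modification time all differ from what the package shipped. The "c" in the rpm output marks the file as a config file.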

What's the lesson? Don't forget to use powerful tools like rpm, and remember that simple things can go wrong too.