Wednesday, April 23, 2008

MySQL and PostgreSQL server upgrade pitfalls - I

We are hosting a few applications on our web hosting server which depend on PostgreSQL and MySQL databases. Unfortunately, some conditions caused that we need to upgrade the server from SLES9 installation to newer SLES10.

Let's compare the SLES10 distro and its predecessor according to databases they contain. The SLES10 contains MySQL server in version 5.0.x while the SLES9 is based on version.4.0.x. By the way, the MySQL 5.0.x was one of the requirements that lead us to the upgrade decision. Further, the SLES10 contains PostgreSQL server in version 8.1.x while the SLES9 contains version 7.4.x.

So, I needed to move data and related metadata from the old installation to the new one. Of course, I decided to test the scenario with data from backup. What problems did I have to solve? At first, the schema of administration database mysql which stores user privileges and so on was changed. There were added some new fields in version 5, e.g. in the table user were added fields like , Show_view_priv and so on, which aren't available in previous versions. So, simple backup command like the following one couldn't be working:

mysqldump --all-databases | gzip > /backup/mysql.gz

The reason why the script wasn't working is that it is using not complete INSERT statements like this:

INSERT INTO user VALUES ('localhost', 'test', 'test', ...);

Such a statement is trying to insert 31 values but the new schema requires the user table to have 37! So the statement will finish with the error:

#1136 - Column count doesn't match value count at row 1

To bypass such behaviour I had to fix the backup command to use complete INSERT:

mysqldump -c --all-databases | gzip > /backup/mysql.gz


After the change, there was added only -c parameter, the mysql dump containts complete INSERT statements:

INSERT INTO user (Host, User, ...) VALUES ('localhost', 'test', ...);

Such a statement is accepted by MySQL 5.0.x as well because there are explicitly defined which fields to insert. The remaining fields will stay empty or they will be assigned default values.

I think it's quite useful to remember on possible backward incompatibility between versions and to check if related tools, e.g. for database backup, provide a way how to deal with it.

The pretty much similar thing I had to solve with PostgreSQL. But it was worse because there was a problem with user data and the syntax of adequate COPY/INSERT statements. I will write about it next time.

Thursday, April 17, 2008

Fighting phishing plague with ClamAV

It's about one year and one month since long time awaited version of ClamAV 0.90 was released. The latest available version today is 0.92.1. The new version incorporates a lot of bug fixes, changes in configuration syntax, scripted updates and many other enhancements.

In my opinion, the most important one is an implementation of anti-phishing engine with help of Google Summer of Code 2006 program. It supports more generic methods how to identify phishing emails based on searching and comparing faked and real URLs in their bodies. The engine is based on heuristic analysis supported with special signatures. Through the next releases of versions 0.91 and 0.92 the engine was further improving. Finally, it was enabled by default in the release 0.91. If you are more interested in the releases you can check them in the clamav-announce mailing list.

So to protect your mail communication from phishing plague you only need to update to the latest version of ClamAV. But that's not everything. Sometimes, you can require to turn it off for purposes of e.g., testing false positives or to configure it more thoroughly. Everything like this is available in the configuration file /etc/clamd.conf in the form of these options (default values are specified here):
  • PhishingSignatures yes
    • try to detect phishing messages via signatures
  • PhishingScanURLs yes
    • scan URLs in the messages for heuristic analysis
  • PhishingRestrictedScan yes
    • anti-phishing engine works only with domains listed in the .pdb database otherwise scanning all of the domains may increase the false positive rate
  • PhishingAlwaysBlockSSLMismatch no
    • always block SSLmismatches, it is false positive raiser
  • PhishingAlwaysBlockCloak no
    • always block cloaked URLs, if it is enabled it seems to lead to the increase of false positives
More on these options you can find in the configuration file which is well annotated or in the related man page of ClamAV (e.g. here).

Was there any way how to deal with phishing messages before the release of ClamAV 0.90? Yes, it was, it is still here and it is good practice to combine it with ClamAV featuring anti-phishing. Before ClamAV 0.90 you could use third party signature files which contain definitions of phishing mails. The best known project doing this and more is the Sanesecurity project where you can download such signatures and feed your ClamAV with them. The ClamAV default place for signatures in the filesystem is the /var/lib/clam directory where you can download the files. After that, you need to restart clamd service (or similar).

Beside this, you can download from their web page scam signatures which can help you to get rid of spams based on MIME attachments like PDF documents, JPEG images and so on. I'm using it for a long period and it is really efficient. And you don't need to waste your time with SpamAssassin rules tuning. By the way, I don't know any which are so effective. To check the content of the signature files use the sigtool tool like this:
  • sigtool -l /var/lib/clamav/scam.ndb
  • sigtool -l /var/lib/clamav/phish.ndb
You can find there many interesting information. The part of signatures' names is well documented and you can break them down with help of this web page.

I don't have to forget to mention that the more comfortable approach is to install into your system the update script which can download the signatures for you automatically via cron service. The script is placed at the web page in the usage section (or directly here).

In the end, we have two weapons how to fight against phishing. Sanesecurity signatures seems to be more robust and mature while the ClamAV anti-phishing engine is too young to be as accurate as Sanesecurity. But it uses heuristics approach which means to be more flexible and dynamic and don't have drawbacks of static signatures like zero-day attacks. So the best practice is to join their power and use them together. If you go through the ClamAV tests at Sanesecurity web page, by the way they aren't up to date, you will find out the previous sentence about quality is true. The ClamAV isn't as perfect so far but in many cases it catches mails which are invisible to Sanesecurity signatures.

Wednesday, April 9, 2008

Monitoring mail queue size with monit

The monit project is really useful in many ways. But what I am missing now is a simple method how to test conditions on some output produced by a script or a command. For example, I would like to trigger an alarm when the mail queue size exceeds a defined limit. I'm not sure where but somewhere I found that the feature should be implemented in the future release of monit. I'm hoping.

For now, you can bypass the problem with a simple helper script. It needs to be scheduled via cron daemon in order to check the mail queue size regularly. If the size exceeds the specified limit it will write the size to a helper file whose checksum will be monitored. The size is written only once to avoid repeated alarms. So the script can look like this:

#!/bin/sh

MAX_SIZE=${1:-10}

CHECK_FILE="/var/run/mailq.size"
LOCK_FILE="${CHECK_FILE}.lock"

[ -f $CHECK_FILE ] || echo 0 > $CHECK_FILE

CUR_SIZE=`mailq | tail -n 1 | awk '{ print($5); }'`

if [ $CUR_SIZE -gt $MAX_SIZE ]; then
if [ ! -f $LOCK_FILE ]; then
echo $CUR_SIZE > $CHECK_FILE
echo > $LOCK_FILE
fi
else
rm -f $LOCK_FILE
fi

exit 0

Then you can place the script e.g. to the /etc/cron.hourly directory or you can specify your own schedule and define it with own crontab in the /etc/cron.d directory. Further, you can specify the limit as its first command line argument. Now, let's check the monit configuration:

check file mailq with path /var/run/mailq.size
if changed checksum
then alert

The above script is almost an one liner. It is simple and it is working. Unfortunately, I would like to use mailq command directly in the monit configuration.

Wednesday, April 2, 2008

Spamassassin, spamd and -L switch

I have been using Spamassassin in the company as our primary spam message blocker without troubles for years. The version used today is 3.2.3. It is boosted with a set of custom rules and with the rules from SARE project, with malware rules and others. It is also very useful to check for updated rules with sa-update command line utility (available in the newer versions of Spamassassin). Overall, the spams are identified quite precisely with very small false positives. The users have available simple web interface where they are able to check their caught messages.

But a week or two before, I had to begin solving a strange problem with some of them. A few users began complaining that they are receiving from 5 to 10 spams a day. That's not bad score because these users or their mail addresses are very popular among spammers and in total each of them would receive about from 2 to 5 thousands of unsolicited messages without spam filter.

So, where to begin? I took a suspicious message and tried to find out why it is evaluated as a regular message. The message is below (some unimportant parts were removed):

Return-Path: this1905@indygov.org
Received: from [187.83.34.110] (helo=rxexv)
by 168-215-192-214.dslindiana.com with smtp (Exim 4.62 (FreeBSD))
id 1KþxNR-0005Tq-Km; Wed, 2 Apr 2008 03:27:21 -0400
Message-ID: 47f334bf.3040802@indygov.org
Date: Wed, 2 Apr 2008 03:24:47 -0400
From: this1905@indygov.org
Subject: Gotcha! All Fool!

Wise Men Have Learned More from Fools... http://80.98.114.98


At first glance, it is clear to consider the message to be a spam. Included URL is placed at URIBL and SURBL blacklists. Further, the message was sent via blacklisted relay by SPAMCOP. So what's wrong?!? Let's check its score and applied rules simply with Spamassassin client:

spamc < message

I am using the spamc client because our spam filter is implemented with it. That means that in the background the spamd daemon should be running and the spam filter is communicating with it via spamc on demand. The previous command produced an output containing the following lines:

Content preview: Wise Men Have Learned More from Fools... http://80.98.114.98
[...]
Content analysis details: (4.3 points, 4.5 required)

pts rule name description
--------------------------------------------------
3.5 BAYES_99 BODY: Bayesian spam probability is 99 to 100%
[score: 1.0000]
0.6 FH_HELO_EQ_D_D_D_D Helo is d-d-d-d
0.0 NORMAL_HTTP_TO_IP URI: Uses a dotted-decimal IP address in URL
0.1 RDNS_NONE Delivered to trusted network by a host with no rDNS
0.1 AWL AWL: From: address is in the auto white-list


It seems like some rules are ignored. I'm sure that I have turned on DNS RBL checks and URIDNSBL plugin but the related rules are missing here. Debugging of spamd daemon didn't show anything. All rules were parsed and loaded successfully. To turn the debugging on restart the spamd daemon with -D option switch (or place it into the init script or into the sysconfig configuration file).

Another way how to check the rules is to bypass the spamd daemon and run the Spamassassin directly like this:

spamassassin -t < message

If you want to run it with debugging use the -D option switch. I was quite surprised when the direct checking shown me a different result:

Content preview: Wise Men Have Learned More from Fools... http://80.98.114.98
[...]
Content analysis details: (7.2 points, 4.5 required)

pts rule name description
--------------------------------------------------
3.5 BAYES_99 BODY: Bayesian spam probability is 99 to 100%
[score: 1.0000]
0.0 FH_HELO_EQ_D_D_D_D Helo is d-d-d-d
0.0 NORMAL_HTTP_TO_IP URI: Uses a dotted-decimal IP address in URL
2.0 URIBL_BLACK Contains an URL listed in the URIBL blacklist
[URIs: 80.98.114.98]
1.5 URIBL_JP_SURBL Contains an URL listed in the JP SURBL blocklist
[URIs: 80.98.114.98]
2.0 RCVD_IN_BL_SPAMCOP_NET RBL: Received via a relay in bl.spamcop.net
[Blocked - see ]
0.1 RDNS_NONE Delivered to trusted network by a host with no rDNS
-1.8 AWL AWL: From: address is in the auto white-list

The mentioned missing rules - URIBL_BLACK,URIBL_JP_SURBL ... - are suddenly here. Why? What is the difference between spamc client and spamassin command? It shouldn't be any but their behaviour can be controlled with many option switches (command line arguments). I ran the spamc without any but spamd daemon may run with some. To check them we should display the spamd process and its command line arguments with e.g. ps command:

ps -A -o cmd | grep spamd

The command will display something like this:

/usr/sbin/spamd -L -d -c -m 8 -u spamass -x -s local4 -r /var/run/spamd.pid

Let's go through the arguments and discuss their purpose:

  • -L - used to turn off any DNS and network tests
  • -d - demonize the process
  • -c - create user preferences files if they don't exist
  • -m - maximum number of children to spawn from the parent process
  • -u - run the process under the specified user
  • -x - disable user config files
  • -s - syslog facility for logging events
  • -r - write the process id to the file

There are many others options. If you want to know them check the spamd man page (e.g. here). In summary, only the first option -L is available for the spamassassin command. We can check it with:

spamassassin --help

Now it should be everything bright. Different scores were caused by the -L switch which changed the applied rules. Its result is that RBL checks are disabled and in conclusion the corresponding rules are not applied. Why I used the switch before I'm not pretty sure. Perhaps, I hit some performance constraints because it take some time to perform the network tests.

What's the result? Don't forget to check the command line arguments of the spamd daemon. They may disable some Spamassassin features despite of their definitions in configuration files.