State of the nation: referral spam, comments, content management, dedicated hosting and more

Well, it has been a few days since I last wrote in this blog; Christmas has been in the way, along with some other issues I will come to in a minute. I did actually write this entry last night and then lost it :-(. I have a 60 GB USB hard disk that I keep my back catalogue of useful info on. I had it plugged in to get at something, wrote most of this entry last evening and then went to have my evening meal. I did not have the laptop power supply plugged in and the USB disk flattened the battery, so when I came back to my laptop this post (the previous incarnation of it) had gone. This is the problem with writing posts in a text box on an HTML page. More on this later.

I was going to write an entry here two nights ago but had to spend most of Tuesday evening dealing with the ever-burgeoning issue of referral spammers. I have written about referral and comment spammers here before in a couple of posts, "Comments, spam and statistics spiders" and "Some Perl and problems with referral spammers", so I won't go to town completely on this. I won't go into comment spammers again either, as comments are off and I obviously don't get that issue now. Basically I have had two main phases of referral spam recently. A couple of months ago I had trouble with a spammer using fixed IP addresses belonging to ISPs in the States and sending 130,000 hits in a few days. That equated to a very low number of visits; the purpose was to send referrer URLs that included links to the filth and drugs he was promoting. I managed to stop him in the end by getting the ISP to take notice, but only when I enlisted the help of some other site owners being plagued by the same guy. More recently I have been plagued by spammer(s) using more convincing URLs in the referrer field, but these redirect to the same drug and filth sites. This time they are using a large number of different IP addresses and linking to a large number of different sites. It's becoming a problem.

Why do they do it? Two reasons, well, possibly three. Either they want site owners to click on their links, visit and buy their rubbish, or they are looking for back links and PageRank (PR) in Google; this is the most likely. Finally, it does seem that some of these people are actually running a business generating these referral requests for clients. Why do they do it? Because they can! They look for sites that publish statistics, because statistics pages can include referrers, and these count as links to the spammers' sites and increase their Google standing. The same applies to blogs that publish top referrer lists. Comments provide the same opportunities for these people. Why does it work? Well, check out this Google search: "usage_200512.html". There are 59,700 results, and just a quick look at a few of the links on the first page of results shows why they are successful. Lots of hits to stats pages and lots of links to filth.

I have tried to manage the problem for a long time now by sending them 403s, but it makes no difference. The killer for me is that I don't publish statistics referrer logs, comments or even blog top referrers. I am included in this nonsense because these people just look for sites that use webalizer (in my case) and, I assume, other stats programs. They probably look for blogs as well for the same reason and then use a scatter-gun approach on everyone, even those who don't publish any links. We all lose!
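
For anyone wondering what "sending them 403s" amounts to, here is a minimal sketch in Python. It is not the code running on this site: the function name and the blocklisted domains are made up for illustration, and a real setup would load the list from a file and hook into the web server.

```python
from urllib.parse import urlparse

# Hypothetical blocklist: placeholder domains, not real spam sources.
BLOCKED_REFERRER_DOMAINS = {"spam-pills.example", "casino-filth.example"}


def status_for_referrer(referrer: str) -> int:
    """Return 403 if the referrer's host is blocklisted, otherwise 200."""
    if not referrer:
        # No Referer header at all: let the request through.
        return 200
    host = (urlparse(referrer).hostname or "").lower()
    for domain in BLOCKED_REFERRER_DOMAINS:
        # Match the domain itself and any subdomain of it.
        if host == domain or host.endswith("." + domain):
            return 403
    return 200
```

As noted above, this only denies the request; it does nothing to stop the spammer from sending it in the first place.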

I finally decided to remove the webalizer stats pages the night before last. It is the only way in the end, I guess. This now means that these people get a 404, but that doesn't stop them; they are relentless. I guess it might stop come month end! Let's see.

Whilst we are on the subject, it would be nice to deal with these at the firewall, as most of the IP addresses they use are blacklisted. But my ISP doesn't allow me access to the firewall, or even to use mod_rewrite, which might present some good options!
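
Without firewall access, the nearest equivalent would be to check each client address against a blacklist in the application itself. A rough sketch of that idea, assuming a hypothetical hand-maintained list of networks (the ranges below are reserved documentation addresses, not real offenders):

```python
import ipaddress

# Hypothetical blacklist: reserved documentation ranges used as placeholders.
BLACKLISTED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.17/32"),
]


def is_blacklisted(client_ip: str) -> bool:
    """Return True if client_ip falls inside any blacklisted network."""
    try:
        addr = ipaddress.ip_address(client_ip)
    except ValueError:
        # Not a valid IP address string: treat as not blacklisted.
        return False
    return any(addr in net for net in BLACKLISTED_NETWORKS)
```

A firewall rule would still be better, since it drops the traffic before it ever reaches the web server.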

I have been casually looking at dedicated or possibly virtual hosting as an option for the future. It costs a lot more, but it would give me access to the firewall and let me install what I need. It would also let me think about using Oracle as a back end. I have looked at Movable Type and WordPress in the past as options to replace Greymatter, as I would like some of their features, such as draft posts (which would have saved last night's version of this post) and comment moderation. I could not run WordPress as it needs mod_rewrite to allow me to keep the same URLs.

When Oracle XE came out I was thinking of creating my own Content Management System (CMS) using HTMLDB or even the PL/SQL toolkit. I would need to wait until XE is out of beta and also find time to write it (this is the killer), and if it went on a dedicated host I would have the problem of finding time to manage the hosting as well. Maybe writing a CMS is not the option, maybe WordPress or Movable Type are. Who knows at this stage; it's all just thoughts. I was thinking that creating a CMS backed by Oracle but keeping static pages, i.e. the CMS just manages the data and generates fixed HTML, might not be that difficult. I could implement comments properly, draft posts, RSS and many more features, but then again it would be better to just use existing apps and spend my time on Oracle security subjects. A dedicated hosting deal might still be beneficial, for more space and access to firewalls. It's expensive though. Maybe someone will give me free dedicated hosting for Christmas.
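
To show how small the "CMS manages the data and generates fixed HTML" idea could be, here is a toy sketch in Python. Everything in it is an assumption for illustration: the `Post` fields, the page template and the in-memory list standing in for what would be an Oracle table in the real thing.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class Post:
    # Hypothetical fields; a real CMS would read these from a database table.
    slug: str
    title: str
    body: str
    draft: bool = False


PAGE = (
    "<html><head><title>{title}</title></head>"
    "<body><h1>{title}</h1><p>{body}</p></body></html>"
)


def render_site(posts):
    """Map each published post to a static page: {filename: html}."""
    pages = {}
    for post in posts:
        if post.draft:
            # Drafts stay in the data store and are never published.
            continue
        pages[f"{post.slug}.html"] = PAGE.format(
            title=escape(post.title), body=escape(post.body)
        )
    return pages
```

Writing each entry of the returned dictionary to disk would give the same kind of fixed pages the site serves now, with drafts kept safely in the data store until they are ready.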

I still need to fix up the site to work in all browsers. I have had a post in my forum, "Menu links with different browsers", as my menus do not work correctly in Firefox. I would like to get some time to re-work the page layouts into CSS layers rather than tables and fix the menus.

