formSpamBotBlocker doc file
Class version: 0.3
Date: 7 Apr 2007
A. Introduction
Most webmasters know the problem of automatic form submissions by spambots all too well. There are several solutions out there, and CAPTCHA is one of the best known. Unfortunately, solutions such as CAPTCHA require active human interaction, which decreases accessibility.
This PHP class takes another approach that does not require any extra human interaction at all. It is based on human behaviour patterns rather than on human intelligence. It creates <input type="hidden"> tags with encrypted values, or visually hidden tags (CSS), to identify a spambot. The combination of multiple methods can really confuse spambots, even though only html code is created. Please note that no CAPTCHA-, cookie- or JavaScript-based methods are used here by default: just plain (x)html and, optionally, a little CSS code or session variables. Most human users will not even realize that the form is spam protected in some way. That's the point...
B. The basic ideas:
1. A user (human or robot) must have the same IP and the same http user agent ID on both pages: the one that sends the request (the page with the html form) and the one that receives it (the action of the html form, i.e. the target page, which may be the same or another page). This applies to POST and GET requests alike. Humans always do; robots sometimes do not, as they often call only the target page with the required parameters. In other words: a page containing an html form must be loaded before its target page (the page that accepts the parameters) is loaded, and the IP and browser of the user must be the same on both pages.
-> A spambot is forced to use the same IP and agent ID when scanning and attacking
2. A human user will not be affected by hidden tags whose names change daily, depending on the current date, as they simply do not see them. Strictly speaking, humans could be affected if, e.g., they load a web form at 23:57 and send the request at 0:06 (next day), but there is a simple solution for that too (see below). Robots, on the other hand, tend to prescan an html page containing a form and then call the target page with the scanned parameters. A daily changing hidden input name requires the prescan to happen on the current day.
-> A spambot is forced to prescan the form on the current day when attacking
3. A form should be submitted within a specific time window. If the elapsed time is too short or too long, then the user is more likely to be a robot than a human. For example, a human cannot submit a form that has 6 required text inputs in just 2 seconds...
-> A spambot is forced to submit the form within a specific time window when scanning and attacking
4. A spambot will try to populate every form element with some value to maximise its chances of a successful post. If the form contains a standard text input tag that is visually hidden from the user, a human will not enter anything into this field. It is quite likely, though, that a spambot will still post some value for this form element.
-> A spambot is forced to identify visually hidden trap form elements and ignore them when attacking
5. There is no need for a user to call the target page of a web form using the same parameters more than once, without filling out the form again. Humans may do that by clicking on the reload button of the browser, but they should not. Spambots do that when trying to find a way to submit the form.
-> A spambot is forced to pass the protection of the form on the first attempt when attacking
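As a rough illustration of ideas 3 to 5, the checks can be expressed as a single predicate. This is only a sketch: the function looksLikeSpambot(), its parameter names and the default limits of 2 and 600 seconds are assumptions made for this example and are not part of the class.

```php
<?php
// Hypothetical helper, not part of formSpamBotBlocker: applies ideas 3 to 5
// to values the target page would already have extracted from the request.
function looksLikeSpambot(
    int $secondsSinceFormLoad, // time between form generation and submission
    string $trapValue,         // content of the visually hidden text input
    int $submissionCount,      // submissions seen since the form was loaded
    int $minTime = 2,          // assumed lower bound in seconds
    int $maxTime = 600         // assumed upper bound in seconds
): bool {
    // Idea 3: too fast or too slow falls outside the time window.
    if ($secondsSinceFormLoad < $minTime || $secondsSinceFormLoad > $maxTime) {
        return true;
    }
    // Idea 4: a human never sees the trap input, so it must stay empty.
    if (trim($trapValue) !== '') {
        return true;
    }
    // Idea 5: the same form load may only be submitted once.
    return $submissionCount > 1;
}

var_dump(looksLikeSpambot(45, '', 1));          // plausible human
var_dump(looksLikeSpambot(1, '', 1));           // submitted too quickly
var_dump(looksLikeSpambot(45, 'buy pills', 1)); // trap input was filled
var_dump(looksLikeSpambot(45, '', 3));          // replayed submission
```

Ideas 1 and 2 are left out here because they depend on how the field names are encoded.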
C. Implementation of the above ideas
1. [userID] A dynamically generated <input type="hidden"> tag has name and value attributes that depend on the current user IP and browser ID. The name and value are encrypted and their length can be easily changed. This input tag will be checked on the target page for validity. Of course, these are static values for a unique IP and browser -> Robots would still have to use the same IP and browser ID, when scanning and calling a form, to simulate a human user. Many spambots cannot do that though...
2. [dynID] Partially static values are not enough; we need some dynamically changing name/value to prevent automatic form submissions. Another <input type="hidden"> tag must be generated dynamically. This tag has an encrypted name attribute that depends on the current date. This daily changing name prevents the use of prescans older than the current day. To avoid the midnight problem mentioned above, a class variable $minutesAfterMidnight has been added. It sets the number of minutes after midnight during which a form generated on the previous day may still be submitted.
3. To set a time window for a form to be submitted, the class uses 2 variables $minTime and $maxTime. When a form is generated the current time will be encoded and set as value of the previous dynID input tag. On the target page, this time value will be compared using the variables $minTime and $maxTime.
4. [trapID] A standard <input type="text"> tag with a tempting name will be generated. This element will be visually hidden from a human user with CSS. However, if CSS is disabled, the input will still be displayed. For this reason, an explanatory label is provided that tells the user not to enter anything into the trap tag. A spambot that has scanned the web form will probably submit the form with some value in the trap tag. The class checks if there is a value and identifies the spambot (or a human user who has disabled CSS and ignored the label instructions). This method can be disabled with the function setTrap().
5. A session based method is used to prevent submitting a form more than once, before loading the form again. This method is enabled by default but it can be disabled by setting the public variable $hasSession=false. A session variable contains the number of the form submissions made after loading the form (and calling makeTags()). If the number is larger than 1, the function checkTags() returns false.
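The userID and dynID ideas, together with the $minutesAfterMidnight grace period, might be sketched as follows. This is not the class's actual code: the function names are hypothetical, HMAC-SHA256 stands in for the class's own encoding (which is not shown in this document), the $secret parameter plays the role of $initKey, and days are computed in UTC for simplicity.

```php
<?php
// Hypothetical helpers, not the class's actual code. HMAC-SHA256 stands in
// for the class's own encoding, and $secret plays the role of $initKey.

// userID: a field name bound to the visitor's IP and user agent.
function userFieldName(string $ip, string $agent, string $secret): string
{
    // A letter prefix keeps the name from starting with a digit.
    return 'u' . substr(hash_hmac('sha256', $ip . '|' . $agent, $secret), 0, 12);
}

// dynID: a field name that changes with the calendar day (UTC here).
function dailyFieldName(string $day, string $secret): string
{
    return 'd' . substr(hash_hmac('sha256', $day, $secret), 0, 12);
}

// Days whose names are accepted at time $now: always today, plus yesterday
// during the first $minutesAfterMidnight minutes of the day.
function acceptedDays(int $now, int $minutesAfterMidnight): array
{
    $days = [gmdate('Y-m-d', $now)];
    $minutesIntoDay = intdiv($now % 86400, 60);
    if ($minutesIntoDay < $minutesAfterMidnight) {
        $days[] = gmdate('Y-m-d', $now - 86400);
    }
    return $days;
}

// Form generated at 23:57, submitted at 00:06 the next day (UTC).
$generated = gmmktime(23, 57, 0, 4, 7, 2007);
$submitted = gmmktime(0, 6, 0, 4, 8, 2007);
$secret    = 'replace-with-your-own-secret'; // assumption: site-specific key

$postedName = dailyFieldName(gmdate('Y-m-d', $generated), $secret);
$validNames = array_map(
    function (string $day) use ($secret): string {
        return dailyFieldName($day, $secret);
    },
    acceptedDays($submitted, 20)
);
// True when the name generated yesterday is still inside the grace period.
var_dump(in_array($postedName, $validNames, true));
```

With a grace period of 20 minutes, the form loaded at 23:57 is still accepted at 00:06, while the same submission at 00:30 would only be checked against the new day's name.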
The generated <input type="hidden"> tags contain encoded names and values. To make it even more difficult for a spambot to guess them, the class uses an encryption method based on a unique key passed through the form. These names and values (and the key) change dynamically each time a form is loaded. You can even make the names and values almost impossible to guess (even if the spambot knows the source code of the class) by setting your own unique string as the value of the public variable $initKey.
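The interplay between a per-form key and a server-side secret could look roughly like the sketch below. It is an illustration under assumptions, not the class's implementation: newFormKey() and encodedName() are hypothetical names, HMAC-SHA256 is used in place of the class's own encoding, and the array keys 'key' and 'name' merely mimic a submitted request.

```php
<?php
// Hypothetical sketch, not the class's actual code: a fresh random key is
// generated per form load and travels with the form, while $initKey stays
// on the server. HMAC-SHA256 stands in for the class's own encoding.

function newFormKey(): string
{
    return bin2hex(random_bytes(8)); // 16 hex characters, sent in the form
}

// Derive a field name from the server secret, the per-form key and a label.
function encodedName(string $initKey, string $formKey, string $label): string
{
    return 'f' . substr(hash_hmac('sha256', $formKey . ':' . $label, $initKey), 0, 12);
}

$initKey = 'replace-with-your-own-secret'; // assumption: set once per site

// Form page: compare the names produced by two separate loads of the form.
$keyA = newFormKey();
$keyB = newFormKey();
var_dump(encodedName($initKey, $keyA, 'dynID') === encodedName($initKey, $keyB, 'dynID'));

// Target page: the key arrives with the request, so the server can
// recompute the expected name and compare it in constant time.
$posted   = ['key' => $keyA, 'name' => encodedName($initKey, $keyA, 'dynID')];
$expected = encodedName($initKey, $posted['key'], 'dynID');
var_dump(hash_equals($expected, $posted['name']));

// A bot that knows this code but not $initKey derives a different name.
var_dump(hash_equals($expected, encodedName('guessed-secret', $keyA, 'dynID')));
```

The design point is that the key visible in the html is useless on its own: only a party that also knows $initKey can derive names the target page will accept.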
D. Can spambots still successfully submit a web form protected by this class?
I can currently think of 2 methods a spambot could use to submit a web form protected by this class:
1. If the spambot (its developer) knows the source code of the class and a unique $initKey value has not been set. A script could dynamically generate all the required encoded parameters and pass them to the target url. However, if the $initKey value of the class (not shown in plain html), which is used to encode/decode the parameters, has been set, an external script would not be of any use...
2. If a spambot is able to simulate human behaviour really well. To achieve that, though, it would have to load a web form, scan its elements and call the target url with the required parameters using the same IP/agent ID (all within a limited time window on the same day). Moreover, it would have to identify the trap <input type="text"> tag (by analysing the CSS?) and leave its value empty. If the session based method to prevent submitting a form more than once is enabled, the spambot will not be able to call the target page repeatedly with previously scanned valid parameters until it finds a valid time window. It will have to succeed on the first attempt.
E. How to use the class
1. Create the required <input> tags on the page containing the web form
a. Optionally set your defaults in the class source file (public variables); set your own unique $initKey!
b. Include the class in your script
c. Create an object: $blocker=new formSpamBotBlocker();
d. Optionally call public functions or set public variables to adapt your defaults to the current web form
e. Either print the tags directly within your html form: print $blocker->makeTags();
or get the xhtml string first: $hiddentags=$blocker->makeTags(); (if $hasSession=true, make sure you call makeTags() before the output of any html code, or you will get an error message!)
and then print it within your web form: print $hiddentags;
2. Check on the target page whether the $_POST or $_GET array contains the valid parameters
if ($_POST){ // or $_GET
    $blocker=new formSpamBotBlocker();
    $nospam=false;
    $nospam=$blocker->checkTags($_POST); // or $_GET
    if ($nospam) print "Valid Submission"; // handle valid request
    else print "Invalid Submission"; // handle invalid request
}
F. Changelog
v0.3 - 3 May 2007:
New methods added:
- By setting the public class variable $hasSession=true, 2 session variables will be generated in order to prevent submitting a form more than once before loading the form again.
- A new public method getTagNames() has been added. This method returns an array with the names of the generated form elements.
v0.2 - 5 Apr 2007:
Initial release