
Using HTMLCaptcha
HTMLCaptcha is a component, meaning that it can be programmed in many different ways. It was designed to be flexible and to adapt to new attacks. What follows is an example of one way to use HTMLCaptcha. The example (C#) is available for download.
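The following is a strong CAPTCHA validation with these features: OCR scanning techniques do not apply, it is resistant to random brute force attacks, descriptors are selected with Javascript, and images are randomly altered.
Each of these will be addressed in more detail following the demonstration of the user interface.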
The user is presented with four randomly selected HEC (HTML Encoded) images. The images are randomly altered (imperceptibly to the human eye) so that an image signature cannot easily be captured. To the right of each pair of images is a list of 11 descriptors.

The user must select the correct descriptors from the above options. He simply needs to click the correct descriptor and it will be added automatically to the input field below. Using Javascript makes the user experience fluid and simple.

Once the user has selected the four descriptors, he clicks the Validate button. If he has clicked all four correct descriptors (order doesn't matter), the form validates him.

Otherwise, he fails.

OCR scanning techniques do not apply.
Because the CAPTCHA shows actual pictures rather than text, the OCR text scanning
techniques that have developed recently and continue to be developed cannot be used.
Further, the pictures are encoded to HTML using the included "image cache creation"
tool, so no image files (e.g. GIF, JPG, PNG) are involved in the CAPTCHA. This makes
it more difficult for a bot to work out what an image depicts. Developers may add
and change images at their discretion using the image cache creation tool.
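To make the idea of an HTML-encoded image concrete, here is a minimal sketch in Python. It assumes one conceivable encoding, a table whose cells each carry one "pixel" color. The function name `encode_pixels_as_html` and the table layout are hypothetical; this is not HTMLCaptcha's actual output format or API.

```python
def encode_pixels_as_html(pixels, cell_size=2):
    """Render a grid of (r, g, b) tuples as an HTML table of colored cells.

    Hypothetical encoding for illustration only; the real HEC format
    produced by the image cache creation tool may differ.
    """
    rows = []
    for row in pixels:
        cells = "".join(
            f'<td style="background:#{r:02x}{g:02x}{b:02x};'
            f'width:{cell_size}px;height:{cell_size}px"></td>'
            for r, g, b in row
        )
        rows.append(f"<tr>{cells}</tr>")
    return '<table cellspacing="0" cellpadding="0">' + "".join(rows) + "</table>"


# A 2x2 "image": red, green, blue, and white pixels.
tiny = [[(255, 0, 0), (0, 255, 0)],
        [(0, 0, 255), (255, 255, 255)]]
html = encode_pixels_as_html(tiny)
```

The resulting markup contains no image element or file reference, so a bot cannot simply download a picture and run it through an image tool; it would first have to reconstruct the picture from the markup.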
Resistant to random brute force attacks.
There are two ways that HTMLCaptcha may be attacked (since OCR techniques
do not apply). (1) An attacker
has access to the image cache used by a site, e.g. in the case where a site uses
the default image cache included with HTMLCaptcha. (2) A bot scans the HTML and
parses out the descriptors, selecting them at random and sending them on to the
validation page.
1. There are approximately 100 images in the default image cache that comes installed with HTMLCaptcha. An attacker could conceivably extract the descriptors from the cache and send these randomly to a validation page on a site that uses the default cache. However, simply displaying 2 or more images and requiring the descriptors for all of them renders this attack essentially useless. Since there are about 100 images, 2 selections yield a probability of 1 chance in 100^2, or 10,000. In our above example, we require 4 selections, which is on the order of 1 chance in 100^4, or 100,000,000.
2. This is the most likely attack to succeed against HTMLCaptcha, especially if the developer does no obfuscation when displaying the descriptors, such as using Javascript to output them or outputting random hidden descriptors around the actual descriptors. There is no real obfuscation in the example above. This is why we have required a user to select the descriptors of 4 images. Each set of 2 images has 11 descriptors to its right. After the first selection is made, there are 10 descriptors left to choose from. Counting ordered picks, this yields a 1 in 11*10 chance, or 1 in 110 per set of images. Since there are 2 image sets, the probability of randomly guessing the 4 correct descriptors is 1 in (11*10)^2, or 1 chance in 12,100. (Because the form ignores order, there are really only 55 distinct pairs per set, so a random guess succeeds about 1 time in 55^2, or 3,025.)
(I find it interesting to note that CAPTCHA image deciphering techniques can yield a very high success rate even for difficult CAPTCHAs -- along the lines of 33% for the most difficult test. Compare this to using HTMLCaptcha with a single image and only 10 descriptors, which yields a probability of 1 in 10, or just a 10% success rate in a brute force attack.)
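The arithmetic above can be checked with a few lines of Python. This is only a worked calculation using the numbers quoted in the text (about 100 cached images, 11 descriptors per set, 2 sets of 2 images). The constant names are made up for illustration. It computes both the ordered count used in the text and the unordered count, which applies when validation ignores the order of selection.

```python
from math import comb

CACHE_SIZE = 100           # approximate size of the default image cache
DESCRIPTORS_PER_SET = 11   # descriptors listed beside each pair of images
IMAGES_PER_SET = 2
SETS = 2

# Attack 1: guess descriptors drawn from the whole default cache.
cache_two_images = CACHE_SIZE ** 2     # 1 chance in 10,000
cache_four_images = CACHE_SIZE ** 4    # 1 chance in 100,000,000

# Attack 2: guess among the descriptors shown on the page.
# Ordered picks: 11 choices, then 10 remaining choices.
ordered_per_set = DESCRIPTORS_PER_SET * (DESCRIPTORS_PER_SET - 1)   # 110
ordered_total = ordered_per_set ** SETS                             # 12,100

# If order within a set does not matter, count distinct pairs instead.
unordered_per_set = comb(DESCRIPTORS_PER_SET, IMAGES_PER_SET)       # 55
unordered_total = unordered_per_set ** SETS                         # 3,025

# Single image with 10 descriptors: a 10% brute force success rate.
single_image_rate = 1 / 10

print(f"cache attack, 2 images: 1 in {cache_two_images:,}")
print(f"cache attack, 4 images: 1 in {cache_four_images:,}")
print(f"page attack, ordered:   1 in {ordered_total:,}")
print(f"page attack, unordered: 1 in {unordered_total:,}")
```

Raising either the number of descriptors per set or the number of sets increases these figures quickly, which is the lever a developer has when no obfuscation is used.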
Javascript selection.
Each descriptor in the example is a hyperlink with a tiny bit of Javascript that
adds the descriptor to the input field below in a special format (a leading
'+'). This accomplishes two things. First, it makes discerning the descriptors that
much more difficult, since a bot will have to extract it from the Javascript. Second,
and more importantly, it makes our CAPTCHA easy for the user. Consider
typing a 5-character alphanumeric CAPTCHA. This requires 5 key presses by the user.
Now, consider that, by using Javascript, the user is only required to click
his mouse 4 times. This is a comparable effort.
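On the validation side, the '+'-prefixed field has to be split back into descriptors and compared with the expected ones regardless of order. The following Python sketch shows one way to do that. It assumes the field looks like `+dog+tree+car+house`; the function names and the sample descriptors are hypothetical and are not part of HTMLCaptcha's API.

```python
def parse_selection(field_value):
    """Split a '+dog+tree' style field into a list of descriptors."""
    return [part.strip().lower() for part in field_value.split("+") if part.strip()]


def validate(field_value, expected):
    """Return True when the selected descriptors match the expected ones, in any order."""
    selected = parse_selection(field_value)
    wanted = [d.strip().lower() for d in expected]
    # Compare sorted lists rather than sets, so that clicking one
    # descriptor twice cannot stand in for a missing descriptor.
    return sorted(selected) == sorted(wanted)


# Hypothetical descriptors for the four images shown to the user.
expected = ["house", "car", "dog", "tree"]

ok_any_order = validate("+dog+tree+car+house", expected)
fails_on_duplicate = validate("+dog+dog+car+house", expected)
fails_when_short = validate("+dog+tree+car", expected)
```

Here `ok_any_order` is true while the other two results are false, matching the rule that all four correct descriptors are required and their order does not matter.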
Random image alteration.
HTMLCaptcha has a property, modifyPixels, which, if set
to true, will alter the HEC image at random before output, varying the "pixel" colors
slightly from the original colors. This is generally not noticeable to a human,
but a bot (or human) that is trying to capture the "signature" of the images of
a particular site has no guarantee that the images will look the same each
time. In addition, a developer may add new image caches at any time,
so this becomes a poor way to mount an attack.
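The effect of this kind of alteration can be sketched in a few lines of Python. This is an illustration of the general technique, not the code behind modifyPixels: the function name `jitter_pixels`, the `max_delta` parameter, and the representation of an image as a grid of (r, g, b) tuples are all assumptions.

```python
import random


def jitter_pixels(pixels, max_delta=3, rng=None):
    """Return a copy of the grid with each color channel nudged by at most max_delta."""
    if rng is None:
        rng = random.Random()

    def nudge(channel):
        # Shift the channel by a small random amount and clamp to 0..255.
        return max(0, min(255, channel + rng.randint(-max_delta, max_delta)))

    return [[tuple(nudge(c) for c in pixel) for pixel in row] for row in pixels]


# A 2x2 "image" and one randomly altered copy of it.
original = [[(200, 30, 30), (30, 200, 30)],
            [(30, 30, 200), (255, 255, 255)]]
altered = jitter_pixels(original, max_delta=3, rng=random.Random(42))
```

Because every output is shifted by a different small amount, a hash or exact pixel comparison of two renderings of the same picture will generally not match, while the picture still looks the same to a person.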