Create HTML Image Cache

HTMLCaptcha is unique in that it renders real images to pure HTML/CSS for insertion into a web page. This simplifies the Captcha coding process and speeds it up. No images have to be written. Instead, a randomly selected image from the cache is output.

Because HTMLCaptcha uses HTML/CSS, its image output is not size-efficient. It is suited specifically to small, icon-sized images, and it is recommended that you use images no bigger than 32x32 pixels. You can use larger images, but download time and browser rendering will suffer. Read Optimizing HTMLCaptcha for details about how to keep HTML image output as small as possible.
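To see why output size grows so quickly with image dimensions, here is a minimal Python sketch. It assumes one possible rendering strategy (one table cell per pixel); this is an illustration only and is not necessarily how HTMLCaptcha generates its markup. The function names render_pixels_to_html and solid_image are hypothetical.

```python
def render_pixels_to_html(pixels):
    """Render a 2D grid of (r, g, b) tuples as an HTML table.

    Assumption for illustration: one <td> per pixel, colored via inline CSS.
    """
    rows = []
    for row in pixels:
        cells = "".join(
            '<td style="width:1px;height:1px;background:#%02x%02x%02x"></td>'
            % (r, g, b)
            for (r, g, b) in row
        )
        rows.append("<tr>" + cells + "</tr>")
    return (
        '<table cellspacing="0" cellpadding="0" '
        'style="border-collapse:collapse">' + "".join(rows) + "</table>"
    )


def solid_image(size, color=(0, 128, 0)):
    """Build a size x size test image filled with a single color."""
    return [[color] * size for _ in range(size)]


if __name__ == "__main__":
    # Markup size grows with the pixel count, i.e. with the square of the side.
    for size in (16, 32, 64):
        html = render_pixels_to_html(solid_image(size))
        print("%dx%d: %d pixels, %d bytes of HTML"
              % (size, size, size * size, len(html)))
```

Under this assumption, doubling the side of the image roughly quadruples the markup, which is why icon-sized images are the practical limit.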

Creating an image cache

A Windows application is included in the installation that will render a directory of images to an HTML image cache. Go to Programs | HTMLCaptcha | Create HTML Images. A small form will load. Type in or browse to a folder that holds your icon images, and then set the folder where you want to output the image cache. If the output folder already contains image cache files, they will be overwritten. Be sure to use separate directories if you are creating more than one set of HTML image data. Click "Convert" and the image cache will be generated.

The image cache consists of two files:

    htcap.dat
    htcap.szb

htcap.dat contains the HTML image data, and htcap.szb is a seek index to locations within htcap.dat where the image data is stored. htcap.szb also contains the descriptors for your images -- their file names (see next section). When HTMLCaptcha is instantiated, you must pass it the path to the directory where these files are stored. Note that while you can only have one set of image cache files per directory, you can use multiple directories to keep different sets of image cache data.
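The data-file-plus-seek-index arrangement can be sketched in Python as follows. The actual binary layout of htcap.dat and htcap.szb is not described here, so the (descriptor, offset, length) index entries and the function names build_cache, read_image, and pick_random_image are assumptions made purely for illustration.

```python
import io
import random


def build_cache(images):
    """Concatenate HTML blobs into one data buffer and record where each starts.

    images: dict mapping descriptor -> HTML string.
    Returns (data_bytes, index), where index is a list of
    (descriptor, offset, length) tuples. This layout is hypothetical.
    """
    data = io.BytesIO()
    index = []
    for descriptor, html in images.items():
        blob = html.encode("utf-8")
        index.append((descriptor, data.tell(), len(blob)))
        data.write(blob)
    return data.getvalue(), index


def read_image(data_file, entry):
    """Seek straight to one image's HTML without scanning the whole file."""
    _descriptor, offset, length = entry
    data_file.seek(offset)
    return data_file.read(length).decode("utf-8")


def pick_random_image(data_file, index, rng=random):
    """Pick a random index entry; return its descriptor and HTML."""
    entry = rng.choice(index)
    return entry[0], read_image(data_file, entry)


if __name__ == "__main__":
    data, index = build_cache({
        "red calendar": "<table><!-- red --></table>",
        "green-calendar": "<table><!-- green --></table>",
    })
    descriptor, html = pick_random_image(io.BytesIO(data), index)
    print(descriptor, len(html))
```

The point of the index is that selecting an image costs one seek and one read, regardless of how many images the data file holds.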

Naming your images

When you create an image cache, the provided tool uses each image's file name as that image's descriptor. Because of this, it is important to name your files in a way that will not confuse users.

For instance, if you have two GIF calendar images that are slightly different, you should either name them differently, e.g.

    red calendar.gif
    green-calendar.gif

or name them identically, with a number to differentiate them on the filesystem:

    calendar1.gif
    calendar2.gif

The image cache tool will automatically remove any number from the end of the filename, so that both descriptors will be stored as "calendar", as opposed to the potentially confusing "calendar1" and "calendar2". This is useful, say, if you have two calendars that look different but are both green -- you can name them "green-calendar1.gif" and "green-calendar2.gif", and when the descriptor selection is created, the user will be prompted simply with "green-calendar".
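The naming rule described above can be approximated with a short Python sketch. The function name descriptor_from_filename is hypothetical, and the sketch assumes the tool simply drops the file extension and any trailing digits; the real tool may handle edge cases differently.

```python
import os
import re


def descriptor_from_filename(filename):
    """Derive a descriptor: drop the extension, then any trailing digits."""
    stem, _ext = os.path.splitext(os.path.basename(filename))
    return re.sub(r"\d+$", "", stem)


if __name__ == "__main__":
    for name in ("calendar1.gif", "calendar2.gif",
                 "green-calendar1.gif", "red calendar.gif"):
        print("%-22s -> %r" % (name, descriptor_from_filename(name)))
```

Running a check like this over your image folder before building the cache shows the descriptors users would be prompted with.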

Spaces, hyphens, and underscores are acceptable in file names, and some mixing of these separators is recommended, as it makes descriptor guessing harder for software but not necessarily harder for a human. For example, for red and green calendar images with file names like those above, the descriptors presented to the user will be, respectively:

    "red calendar"
    "green-calendar"

Likewise, if you have a file, "blue_calendar.gif", this will be presented as "blue_calendar", which is still relatively easy for a human to understand. By introducing a random element to the naming of your files, you make it harder for a "bot" to determine your naming convention.

When using different separators in file names, be careful not to name similar images with different separators. Since HTMLCaptcha selects the correct descriptor plus other, random descriptors from the rest of the image cache index, you may end up presenting confusing options to the user. For instance, if you have two "plus sign" images of different sizes and you name them, respectively

    plus sign.png
    plus-sign.png

the user may be presented with one of the images and both of the descriptors, making it impossible for him or her to be sure of the right choice. Say the first image, "plus sign.png", is displayed from the image cache along with its descriptor. If HTMLCaptcha then happens to randomly select the descriptor for the second "plus sign" image and displays it along with the correct one, the user would see the following descriptor choices:

    "plus sign"
    "plus-sign"

At this point there is no way for the user to know exactly which one is the right choice.

To avoid this, name the images using the same separator, and include a number to differentiate them on the filesystem. These image names then become

    plus sign1.png
    plus sign2.png

and even if the user is presented with both descriptors, e.g.

    "plus sign"
    "plus sign"

either one will yield the correct answer.
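Before building a cache, you may want to scan your image folder for descriptors that differ only by separator. The following Python sketch is not part of HTMLCaptcha; the names descriptor_from_filename and find_separator_conflicts are hypothetical, and the descriptor rule (drop the extension and trailing digits) is an assumption based on the behavior described above.

```python
import os
import re
from collections import defaultdict


def descriptor_from_filename(filename):
    """Assumed descriptor rule: drop the extension, then any trailing digits."""
    stem, _ext = os.path.splitext(os.path.basename(filename))
    return re.sub(r"\d+$", "", stem)


def find_separator_conflicts(filenames):
    """Group descriptors that differ only in separators or letter case.

    Returns a dict mapping a normalized key to the sorted list of
    distinct descriptors that collapse to it (only groups with 2+).
    """
    groups = defaultdict(set)
    for name in filenames:
        descriptor = descriptor_from_filename(name)
        # Treat runs of spaces, underscores, and hyphens as one separator.
        key = re.sub(r"[ _-]+", " ", descriptor).strip().lower()
        groups[key].add(descriptor)
    return {key: sorted(variants)
            for key, variants in groups.items() if len(variants) > 1}


if __name__ == "__main__":
    files = ["plus sign.png", "plus-sign.png", "red calendar.gif"]
    for key, variants in find_separator_conflicts(files).items():
        print("Possible confusion for %r: %s" % (key, ", ".join(variants)))
```

Files named "plus sign1.png" and "plus sign2.png" produce no conflict here, because both reduce to the same descriptor.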