Monday, June 30, 2008

A CAPTCHA Server Control for ASP.NET





Introduction
I'm sure everyone reading this is familiar with spam. There are two schools of thought when it comes to fighting spam:
1.Bayesian filtering, such as POPFile.
2.Challenge/response human verification, such as SpamArrest.
Both of these approaches have their pros and cons, of course. This article will only deal with the second technique: verifying that the data you are receiving is coming from an actual human being and not a robot or script. A CAPTCHA is a way of testing input to ensure that you're dealing with a human. Now, there are a lot of ways to build a CAPTCHA, as documented in this MSDN article on the subject, but I will be focusing on a visual data entry CAPTCHA.
There's already a great ASP.NET article on a CAPTCHA control here on CodeProject, so you may be wondering what this article is for. I wanted to rebuild that solution for the following reasons:
more control settings and flexibility
conversion to my preferred VB.NET language
abstracted into a full blown ASP.NET server control.
So, this article will document how to turn a set of existing ASP.NET web pages into a simple, drag and drop ASP.NET server control -- with a number of significant enhancements along the way.
Implementation
The first thing I had to deal with was the image generated by the CAPTCHA class. This was originally done with a dedicated .aspx form-- something that won't exist for a server control. How could I generate an image on the fly? After some research, I was introduced to the world of HttpModules and HttpHandlers. They are extremely powerful -- and a single HttpHandler solves this problem neatly.
All we need is a small Web.config modification in the

<system.web>

section:


<httpHandlers>
<add verb="GET" path="CaptchaImage.aspx"
type="WebControlCaptcha.CaptchaImageHandler, WebControlCaptcha" />
</httpHandlers>


This handler defines a special page named CaptchaImage.aspx. Now, this "page" doesn't actually exist. When a request for CaptchaImage.aspx occurs, it will be intercepted and handled by a class that implements the IHttpHandler interface: CaptchaImageHandler. Here's the relevant code section:


Public Sub ProcessRequest(ByVal context As System.Web.HttpContext) _
Implements System.Web.IHttpHandler.ProcessRequest
Dim app As HttpApplication = context.ApplicationInstance

'-- get the unique GUID of the captcha;

' this must be passed in via querystring

Dim strGuid As String = Convert.ToString(app.Request.QueryString("guid"))

Dim ci As CaptchaImage
If strGuid = "" Then
'-- mostly for display purposes when in design mode

'-- builds a CAPTCHA image with all default settings

'-- (this won't reflect any design time changes)

ci = New CaptchaImage
Else
'-- get the CAPTCHA from the ASP.NET cache by GUID

ci = CType(app.Context.Cache(strGuid), CaptchaImage)
app.Context.Cache.Remove(strGuid)
End If

'-- write the image to the HTTP output stream as an array of bytes

ci.Image.Save(app.Context.Response.OutputStream, _
Drawing.Imaging.ImageFormat.Jpeg)

'-- let the browser know we are sending an image,

'-- and that things are 200 A-OK

app.Response.ContentType = "image/jpeg"
app.Response.StatusCode = 200
app.Response.End()

End Sub


A new CAPTCHA image will be generated, and the image streamed directly to the browser from memory. Problem solved!
However, there's another problem. There has to be communication between the HttpHandler responsible for displaying the image, and the web page hosting the control -- otherwise, how would the calling control know what the randomly generated CAPTCHA text was? If you view source on the rendered control, you'll see that a GUID is passed in through the querystring:


<img src="CaptchaImage.aspx?guid=99fecb18-ba00-4b60-9783-37225179a704"
border='0'>


This GUID (globally unique identifier) is a key used to access a CAPTCHA object that was originally stored in the ASP.NET Cache by the control. Take a look at the CaptchaControl.GenerateNewCaptcha method:


Private Sub GenerateNewCaptcha()
LocalGuid = Guid.NewGuid.ToString
If Not IsDesignMode Then
HttpContext.Current.Cache.Add(LocalGuid, _captcha, Nothing, _
DateTime.Now.AddSeconds(HttpContext.Current.Session.Timeout), _
TimeSpan.Zero, Caching.CacheItemPriority.NotRemovable, Nothing)
End If
Me.CaptchaText = _captcha.Text
Me.GeneratedAt = Now
End Sub


It may seem a little strange, but it works great! The sequence of ASP.NET events is as follows:
1.Page is rendered.
2.Page calls CaptchaControl1.OnPreRender . This generates a new GUID and a new CAPTCHA object reflecting the control properties. The resulting CAPTCHA object is stored in the Cache by GUID.
3.Page calls CaptchaControl1.Render; the special tag URL is written to the browser.
4.Browser attempts to retrieve the special tag URL.
5.CaptchaImageHandler.ProcessRequest fires. It retrieves the GUID from the querystring, the CAPTCHA object from the Cache, and renders the CAPTCHA image. It then removes the Cache object.
Note that there is a little cleanup involved at the end. If, for some reason, the control renders but the image URL is never retrieved, there would be an orphan CAPTCHA object in the Cache. This can happen, but should be rare in practice-- and our Cache entry only has a 20 minute lifetime anyway.
One mistake I made early on was storing the actual CAPTCHA text in the ViewState. The ViewState is not encrypted and can be easily decoded! I've switched to ControlState for the GUID, which is essential for retrieving the shared Captcha control from the Cache -- but by itself, it is useless.


CaptchaControl Properties:



Many of these properties have to do with the inherent tradeoff between human readability and machine readability. The harder a CAPTCHA is for OCR software to read, the harder it will be for us human beings, too! For illustration, compare these two CAPTCHA images:






The CAPTCHA on the left is generated with all "medium" settings, which are a reasonable tradeoff between human readability and OCR machine readability. The CAPTCHA on the right uses a lower CaptchaFontWarping, and a smaller CaptchaLength. If the risk of someone writing OCR scripts to defeat your CAPTCHA is low, I strongly urge you to use the easier-to-read CAPTCHA settings. Remember, just having a CAPTCHA at all raises the bar quite high.
The CaptchaTimeout property was added later to alleviate concerns about CAPTCHA farming. It is possible to "pay" humans to solve harvested CAPTCHAs by re-displaying a CAPTCHA and giving the user free MP3s or access to pornography if they solve it. However, this technique takes time, and it doesn't work if the CAPTCHA has a time-limited expiration.
Conclusion
Now u can create a simple yet effective CAPTCHA image class. Now that I've wrapped it up into an ASP.NET server control, it should be easier than ever to simply drop on a web form, set a few properties, and start defeating spammers at their own game!

No comments: