Saturday, September 13, 2008

Using ColdFusion to strip some or all HTML tags from a string

I wrote a custom tag called CF_TAGSTRIPPER that would strip HTML tags from a source string, optionally preserving specific tags.
A few years ago, I converted it to a UDF, and I've used it in a variety of my applications to strip out all but a few tags, for example when I wanted to let users use bold and italic tags.
Recently, I needed to strip out only certain tags, so I modified my tagStripper function to handle that case as well. Here's the code. Please let me know if you see anything wrong with it!
There are two versions of this UDF: one for CFMX and one for CF 5. They are functionally the same, so use whichever you prefer.
CFMX version, using the CFFUNCTION tag:

<cffunction name="tagStripper" access="public" output="no" returntype="string">
<cfargument name="source" required="YES" type="string">
<cfargument name="action" required="No" type="string" default="strip">
<cfargument name="tagList" required="no" type="string" default="">

<!---
source = string variable
This is the string to be modified

action = "preserve" or "strip"
This function will either strip all tags except
those specified in the tagList argument, or it will
preserve all tags except those in the taglist argument.
The default action is "strip"

tagList = string variable
This argument contains a comma-separated list of tags to be excluded from
the action. If the action is "strip", then these tags won't be stripped.
If the action is "preserve", then these tags won't be preserved (i.e., only
these tags will be stripped)

EXAMPLE

tagStripper(myString,"strip","b,i")

This invocation will strip all HTML tags except for
<b></b> and <i></i>
--->
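<cfscript>
var str = arguments.source;
var i = 1;
var tag = ""; // var-scoped so it doesn't leak into the calling page's variables scope

if (trim(lcase(arguments.action)) eq "preserve")
{
// strip only the tags listed in tagList
for (i=1;i lte listLen(arguments.tagList); i = i + 1)
{
tag = trim(listGetAt(arguments.tagList,i));
// match the tag name exactly (so "b" doesn't also hit <br> or <body>),
// with or without attributes, including self-closing forms like <br />
str = REReplaceNoCase(str,"</?#tag#(\s[^>]*)?/?>","","ALL");
}
} else {
// if there are exclusions, mark them with NOSTRIP placeholders
if (arguments.tagList neq "")
{
for (i=1;i lte listLen(arguments.tagList); i = i + 1)
{
tag = trim(listGetAt(arguments.tagList,i));
str = REReplaceNoCase(str,"<(/?#tag#(\s[^>]*)?/?)>","___TEMP___NOSTRIP___\1___TEMP___ENDNOSTRIP___","ALL");
}
}
// remove every remaining tag; [^>]* (unlike .*?) also matches tags that span line breaks
str = REReplaceNoCase(str,"</?[A-Z][^>]*>","","ALL");
// convert excluded tags back to normal
str = replace(str,"___TEMP___NOSTRIP___","<","ALL");
str = replace(str,"___TEMP___ENDNOSTRIP___",">","ALL");
}

return str;
</cfscript>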
</cffunction>
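Here is a quick usage sketch that exercises both modes on one string. It assumes the UDF above was saved in a file named tagStripper.cfm (a hypothetical name; include or define the function however you normally do), and the sample HTML is made up for illustration.

```cfml
<!--- Hypothetical file name: assumes the tagStripper UDF above lives in tagStripper.cfm --->
<cfinclude template="tagStripper.cfm">

<cfset html = '<p>Some <b>bold</b> and <i>italic</i> text with a <a href="x.htm">link</a>.</p>'>

<!--- Default action ("strip") with no exclusions: every tag is removed --->
<!--- Result: Some bold and italic text with a link. --->
<cfset plainText = tagStripper(html)>

<!--- "strip" with exclusions: remove every tag except <b> and <i> --->
<!--- Result: Some <b>bold</b> and <i>italic</i> text with a link. --->
<cfset boldItalicOnly = tagStripper(html, "strip", "b,i")>

<!--- "preserve" with a list: keep everything except <a> tags --->
<!--- Result: <p>Some <b>bold</b> and <i>italic</i> text with a link.</p> --->
<cfset noLinks = tagStripper(html, "preserve", "a")>

<!--- htmlEditFormat() escapes the results so the remaining tags are visible on the page --->
<cfoutput>
#htmlEditFormat(plainText)#<br>
#htmlEditFormat(boldItalicOnly)#<br>
#htmlEditFormat(noLinks)#
</cfoutput>
```

The "strip" call is the bold-and-italic case described at the top of the post, and the "preserve" call is the newer "remove only these tags" behavior.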
