Hmm... this is quite frustrating.
I’ve tried this:
$e = &$modx->Event;
// Only on this event
if ($e->name == 'OnDocFormSave') {
set_include_path('/Documents and Settings/Edward/My Documents/My Webs/htmlpurifier/library'
. PATH_SEPARATOR . get_include_path());
include_once('HTMLPurifier.php');
$purifier = new HTMLPurifier();
$_POST['tvcontent'] = $purifier->purify($_POST['tvcontent']);
}
return $purifier;
But it doesn’t seem to work. A good test of HTMLPurifier is assigning a lang attribute to one of the elements. It should be copied over to xml:lang as per XHTML compatibility guidelines. I’m not getting this behavior, so I have to assume that the plugin isn’t working.
Plus, the thing itself is extremely hacky: what if input comes in from another vector? To be quite honest, I don’t know what I should be doing. If someone else wants to take a stab at it, be my guest, but I am stumped.
A few details about my library for anyone who wants to step up to the plate: it’s extremely easy to use. Add the directory containing the library files to your include path, include HTMLPurifier.php, instantiate an HTMLPurifier object, and then call purify() on whatever you need. The code above shows the theoretical flow.
HTMLPurifier will remove anything that’s not in its list of allowed elements; the notable exclusions are OBJECT, EMBED, IFRAME and FORM (I like to call these defective by design). It will also remove any attribute that’s not in its list of allowed attributes, which means that any scripting that gets added later on will be removed if you process too late.
It’s not meant to process complete documents (because, of course, other parts of the document may need scripting, forms, and so on), so it shouldn’t be run on complete pages.
Finally, the library currently only supports UTF-8. I am working on supporting other major charsets (notably iso-8859-1), but until then, if you don’t switch to UTF-8, expect some weird character encoding issues.
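In the meantime, if your content is stored as iso-8859-1, something along these lines might work as a stopgap: convert to UTF-8 before purifying. This is only a rough sketch. The helper name latin1_to_utf8 is made up for illustration and is not part of HTMLPurifier, and it assumes PHP's iconv extension is available.

```php
<?php
// Hypothetical helper (not part of HTMLPurifier): convert an
// iso-8859-1 string to UTF-8 so it is safe to hand to purify().
function latin1_to_utf8($text)
{
    // iconv(from_encoding, to_encoding, string) returns false on failure.
    $converted = iconv('ISO-8859-1', 'UTF-8', $text);
    if ($converted === false) {
        // Assumption: on failure, fall back to the original string
        // rather than losing the content entirely.
        return $text;
    }
    return $converted;
}

// "caf" followed by byte 0xE9, which is e-acute in iso-8859-1.
$latin1 = "caf\xE9";
$utf8 = latin1_to_utf8($latin1);
// $utf8 is what you would then pass to $purifier->purify().
```

If the purified output has to go back into an iso-8859-1 page, you would need the reverse conversion afterwards, and any character that has no iso-8859-1 equivalent would need separate handling.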