HTML Purifier is incredibly awesome. It's a set of scripts you run user-submitted HTML through before displaying it on the web (or, better, before saving it to the database, so you only have to run it once per submission). It removes any HTML tags that aren't on the whitelist and foils various possibly nefarious tricks (e.g., script attributes on tags). It also does a great job of cleaning up unclosed tags, so an individual post can't screw up the formatting of an entire page. It seems mandatory if you accept HTML content from users, and clearly so if you accept it from anonymous users. It's just incredibly robust.
One issue I ran into, though (it's pretty complex): URLs in links were being replaced by '%5C'. The problem was that I was running the HTML through mysql_real_escape_string before HTML Purifier. You need to run HTML Purifier first, and then mysql_real_escape_string. I'm not sure exactly why, but %5C is a URL-encoded backslash, so my guess is that the backslashes mysql_real_escape_string puts in front of the quotes were ending up inside the link URLs. Maybe this will help someone else.
- jim 1-17-2011 12:30 pm
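A minimal sketch of where the '%5C' in the comment above could come from. It makes several assumptions: `addslashes()` stands in for `mysql_real_escape_string()` (which needs a live database connection; both put a backslash before each double quote), `$dirty` is a made-up submission, and `rawurlencode()` only illustrates how a backslash is percent-encoded. It does not reproduce HTML Purifier's own parsing.

```php
<?php
// Hypothetical user submission containing a link.
$dirty = '<a href="http://example.com/">example</a>';

// Stand-in for mysql_real_escape_string(): a backslash is added
// before every double quote, including the ones around the href value.
$escapedFirst = addslashes($dirty);
echo $escapedFirst, "\n";   // <a href=\"http://example.com/\">example</a>

// A backslash is not valid inside a URL, so an encoder writes it as %5C.
$encodedBackslash = rawurlencode('\\');
echo $encodedBackslash, "\n";   // %5C
```

If that is what was happening, it fits the order described in the comment: purify the raw submission first, and apply the SQL escaping to the purified result as the very last step before building the query.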