If your web application accepts content or input from your users, it’s nice to be able to display it back to them in a useful format. For example, some web sites auto-link text, converting anything written as a URL into a hyperlink, to improve the user experience. A user might type the following URL into a form:

http://particletree.com

On display, our auto-linking script would then convert that to:

<a href="http://particletree.com">http://particletree.com</a>

It’s also nice to give more web-savvy users the ability to use certain HTML tags (like a, strong, and em) unescaped when their input is displayed back to them. We have taken the approach described by Chris Shiflett to allow HTML while preventing XSS. So when a user enters the following into a field:

<a href="http://particletree.com">Particletree</a>

It is first escaped to prevent any XSS attacks:

&lt;a href=&quot;http://particletree.com&quot;&gt;Particletree&lt;/a&gt;
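The post doesn’t say which function performs the escaping. As a rough illustration, assuming PHP’s built-in htmlspecialchars() with ENT_QUOTES, this step could look like the following:

```php
<?php
// User input that contains an anchor tag.
$input = '<a href="http://particletree.com">Particletree</a>';

// htmlspecialchars() turns &, <, > and " (plus ' with ENT_QUOTES) into
// entities, so the browser shows the markup as text instead of rendering it.
$escaped = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');

echo $escaped, "\n";
// &lt;a href=&quot;http://particletree.com&quot;&gt;Particletree&lt;/a&gt;
```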

It is then run through an HTML sanitizing script that lets certain safe tags display properly:

<a href="http://particletree.com">Particletree</a>
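The sanitizing script itself isn’t shown in the post. Here is a minimal sketch of the idea, assuming the input was escaped with htmlspecialchars(..., ENT_QUOTES). The restore_safe_tags() function is hypothetical and is not Chris Shiflett’s code: it re-enables only a, strong and em, and leaves everything else escaped.

```php
<?php
// Hypothetical whitelist sanitizer (an illustration, not Chris Shiflett's code).
// Expects text already escaped with htmlspecialchars($s, ENT_QUOTES).
function restore_safe_tags(string $text): string
{
    // Restore <a href="...">...</a> only when href is a plain http(s) URL.
    // Excluding & (and thus any escaped entity) from the URL stops extra
    // attributes or javascript: links from sneaking through.
    $text = preg_replace(
        '#&lt;a href=&quot;(https?://[^&"<>\s]+)&quot;&gt;(.*?)&lt;/a&gt;#is',
        '<a href="$1">$2</a>',
        $text
    );

    // Restore matched <strong>/<em> pairs; repeat until nothing changes
    // so that nested pairs such as <strong><em>..</em></strong> are handled.
    do {
        $before = $text;
        $text = preg_replace('#&lt;(strong|em)&gt;(.*?)&lt;/\1&gt;#is', '<$1>$2</$1>', $text);
    } while ($text !== $before);

    return $text;
}

$escaped = htmlspecialchars('<a href="http://particletree.com">Particletree</a>', ENT_QUOTES, 'UTF-8');
echo restore_safe_tags($escaped), "\n";
// <a href="http://particletree.com">Particletree</a>
```

Restoring only complete open/close pairs keeps the output balanced. This simplified version deliberately leaves URLs containing & (for example, most query strings) escaped.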

When used in combination (and in a way that prevents security breaches), auto-linking and allowing approved HTML tags can lead to some unexpected formatting. The problem with using the two techniques together is that the auto-linking script has to be smart enough not to link anything inside an a tag. For example, the following input, which has a URL inside a link,

<a href="http://particletree.com">Particletree</a>

would be converted to the undesirable:

<a href="<a href="http://particletree.com">http://particletree.com</a>">Particletree</a>
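To see the problem concretely, here is a simplified auto-linker. The hypothetical naive_autolink() uses the post’s URL character class but has no lookbehind and omits the title attribute. Run on the anchor above, it produces the nested link shown:

```php
<?php
// Naive auto-linker (hypothetical): links every http/ftp URL it finds,
// including one that is already the value of an href attribute.
function naive_autolink(string $text): string
{
    return preg_replace(
        '#(http|ftp)://([\w+\-@=?.%/:&;~|]+)#',
        '<a href="$1://$2">$1://$2</a>',
        $text
    );
}

echo naive_autolink('<a href="http://particletree.com">Particletree</a>'), "\n";
// <a href="<a href="http://particletree.com">http://particletree.com</a>">Particletree</a>
```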

None of the PHP auto-linking scripts we found accounted for this, so we added the following regex lookbehind as a solution:

// Auto-link http/ftp URLs, skipping any URL directly preceded by =" (e.g. an href value)
$text = preg_replace("'(?<!=\")(http|ftp)://([\w\+\-\@\=\?\.\%\/\:\&\;~\|]+)(\.)?'", "<a title=\"Go to \\1://\\2\" href=\"\\1://\\2\">\\1://\\2</a>", $text);
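To show what the lookbehind does, the sketch below wraps the regex above, unchanged, in a hypothetical autolink() function. (?<!=") is a negative lookbehind: a match is rejected if the two characters just before "http" or "ftp" are =".

```php
<?php
// Hypothetical wrapper around the post's regex (pattern and replacement unchanged).
function autolink(string $text): string
{
    return preg_replace(
        "'(?<!=\")(http|ftp)://([\w\+\-\@\=\?\.\%\/\:\&\;~\|]+)(\.)?'",
        "<a title=\"Go to \\1://\\2\" href=\"\\1://\\2\">\\1://\\2</a>",
        $text
    );
}

// A URL inside an existing href="..." is left alone...
echo autolink('<a href="http://particletree.com">Particletree</a>'), "\n";
// <a href="http://particletree.com">Particletree</a>

// ...while a bare URL is still converted.
echo autolink('See http://particletree.com for more'), "\n";
// See <a title="Go to http://particletree.com" href="http://particletree.com">http://particletree.com</a> for more
```

The check only looks at the two characters before the URL. It therefore doesn’t catch single-quoted or unquoted attributes, and a URL used as the visible link text (`<a href="http://x.com">http://x.com</a>`) still gets wrapped in a second anchor. Because `.` is in the character class, a sentence-ending period after a URL also ends up inside the link.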

We hope this helps others looking for a similar solution.

Ryan Campbell

Smarter Auto-Linking by Ryan Campbell

This entry was posted 4 months ago and was filed under Notebooks.

· 11 Comments! ·

  1. Joao Prado Maia · 4 months ago

    Very useful, thanks.

  2. Markus · 4 months ago

    It is only a matter of execution sequence. If the auto-linking happens before the HTML cleanup, there would be no conflicts.

  3. Ryan Campbell · 4 months ago

    Markus, we do a good amount with Smarty, so we made the autolinking available as a modifier. The unescaping, on the other hand, is a good amount of code and is needed everywhere, so we do that on the PHP side. It is a specific oddity we ran into, but one that others may run into, so it still may help in some circumstances.

  4. Russell · 4 months ago

    Another good thing is to not auto-link things inside <pre> </pre> tags. This is a very annoying ‘feature’ in all of 37signals’ products.

  5. Niyaz PK · 4 months ago

    Thanks for that.

  6. hypotheek · 4 months ago

    What a nice solution! Going to test it out! Thanks a lot!

  7. sean · 4 months ago

    Hey, I want to learn about APIs and how to build one, but I can find no good book on it. Do you know any good book that I could buy? My email address is in this comment.

  8. gossard · 4 months ago

    As a corollary to the point about Fitts’s Law not addressing movement in multiple dimensions or amidst distractions, consider that the notion of a target is relative. Sure you may want the user to click a particular button, but if the layout provides all germane interaction handles in visual clusters, that cluster can become the initial target. During the movement toward the larger target, the user may subdivide and segregate specific target from distraction. By the time they’ve discerned their specific target, the distance that they’ll need to cross is lessened, and they’re already in motion.

    No big science here. Just sharing a thought :)

  10. stone KID · 4 months ago

    I think Alan Cooper pointed out that Fitts’s Law implies that besides the corners, the easiest target to acquire is “the current location of the pointer.” IOW, the biggest button is one you don’t have to move to at all. AFAIK this is not often used.

  11. Rexibit Web Services · 3 months ago

    I really like this. I am thinking of doing something similar with a CMS I am working on.