Tuesday, October 21, 2008

HTML Robustness and Query String Parameter Naming

From its early days more than ten years ago, HTML was designed so that non-professionals could publish information on the World Wide Web. Its parsing and interpretation rules are loose, and web browsers implement many heuristics to display as much information as possible.

We have now been in the age of Web 2.0 for a few years, but the loose legacy rules remain. One rule I discovered today concerns HTML entities: an entity written without its trailing semicolon is interpreted as if the semicolon were there. For example, &lt is interpreted as <.

The problem shows up when writing a web application. You often need to build a query string inside an <a> tag, like <a href="list.php?a=x&b=y">. With the & written raw like this, a query string parameter cannot have the same name as one of those HTML entities. So <a href="list.php?a=x&lt=y"> does not work: the URL is interpreted as list.php?a=x<=y. Certainly, no one would use lt as a parameter name. However, some more common names, including euro, copy, pound, cent, uml, not, micro, times and divide, will not work either.
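
Here is a minimal sketch in Python of the behaviour described above, and of one way around it: writing the & separators as &amp; inside the attribute. The standard library's html.unescape is only a stand-in for a browser's lenient entity decoding, so real browsers may differ in detail. The helper name link_href is made up for this example.

```python
import html
from urllib.parse import urlencode


def link_href(path, params):
    """Build an href value with '&' escaped as '&amp;' (hypothetical helper)."""
    query = urlencode(params)  # e.g. "a=x&lt=y"
    # html.escape turns the raw '&' separators into '&amp;'
    return html.escape(f"{path}?{query}", quote=True)


# A raw '&' followed by an entity name: lenient decoding swallows "&lt".
raw = "list.php?a=x&lt=y"
decoded_raw = html.unescape(raw)

# The same link with the separators escaped survives decoding intact.
safe = link_href("list.php", {"a": "x", "lt": "y"})
decoded_safe = html.unescape(safe)

print(decoded_raw)
print(safe)
print(decoded_safe)
```

With the separators escaped, the parameter name no longer matters, so names like copy or times can be used as well.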