Tag Archives: HTML

Malformed HTML & XSS Character Filtering: A Few Lessons

On a recent web app pen test I ran into the following issue: the application would escape (that is, add backslashes) before single and double quotes but not filter other characters. Upon review of the source it was clear the developer knew that not escaping these quotes would lead to issues (such as SQL injection) but they did not appear the grasp the larger picture.

For those who wish to avoid the technical discussion below, here are the talking points:

  • If you are a developer, filter out as many characters as possible without breaking your web app
  • If you are a pen tester, the fewer characters filtered the more likely the code has vulnerabilities

What I hope to show is how an assumption about syntax could prove untrue since the following two statements are valid:

  • Not enough filtering may allow the attacker to create “enough valid” HTML syntax to allow code injection
  • The browser’s disposition to interpret malformed or “bad html” as valid gives the attacker more surface area to attack

Below is a tutorial on basic malformed HTML in three parts. The first part shows how bad filtering can lead one to create valid (or semi-valid) HTML the developer did not intend. The examples will show that the browser will interpret partially malformed HTML as legitimate despite syntax errors. The second part shows that once the attacker creates “valid” HTML injecting attack code is made much easier by injecting additional malformed HTML. The third part is an exploit created from parts one and two.

The code below is a case of  a developer dynamically building a string to pin/post to pinterest. The code has been rewritten to hide the source of the material but it is a true and formally live case of the  logical errors described above.

My motivation for this basic tutorial is my response from another professional. While I was investigating this issue on-site another infosec professional said to me that if the quotes are escaped then the app can’t be exploited. It turns out this is the same incorrect assumption the developer made. Perhaps this will help explain this issue so that other professionals recognize instances of this issue.

Note: the syntax highlighting is not an accurate representation on how the browser will interpret the HTML. In fact, the syntax highlighting below is more strict.

Part 1 – Bad Filter, Bad HTML

The developer’s assumption was simply that if the attacker could not pass a single or double quote they could not create syntactically meaningful HTML or exploit the application. For example, the following valid HTML

after being run through the filter would be turned into:

which, due to the backslashes, is meaningless to the browser.

Unfortunately simple cases of only filtering double quotes is an assumption since we can sometimes: first, end tags meaningfully despite the escape; and second, do not need double quotes in our HTML in order to get some invalid HTML to be “properly” interpreted as valid.

In this case I found the developer was building strings dynamically from values in the URL. The offending URL looked like this:

So putting “test” in the badparm as so:

resulted in the following dynamic HTML

So you’ll note that we have the ability to get our input — the string ‘test’ — into the middle of the dynamically built string. Given that this code is intended to be “pinned”, there are two forms of attack here. Once we can get our code to be injected via a URL if we can get it pinned on pinterest– either through us manually pinning it or through social engineering the end user — it will be stuck there. Quite interestingly, it’s possible for our reflected XSS injection attack to become a stored injection attack (as long as pinterest does not filter the characters in our URL). Then anyone who clicks on the pinterest link will run the code when it comes back to badsite.com. The other way is to simply leave our attack as a reflected attack and inject our code into badsite.com through a URL on which we get an end user to click. In either attack, to be successful, we will need to end (or close) the tag otherwise our code will remain as part of the URL and not as code to be parsed and interpreted in the next block.

Let’s test out the filter. If we add a double quote to the end of badurl.com as such:

We have the following output (slightly abbreviated from the longer version above):

So far, the string is escaped properly. But what if we put in a > after as such:

All of a sudden we get the following code:

To be more explicit the following snippet from above is “valid” html:

Yes, despite the mangled \”> it is treated as a valid ending to HTML. The escape backslash added by the developer’s filter is not counted!

In all fairness, the two code snippets above is not valid but no matter. Since we’ve successfully closed the tag according to the browser we can now put our code after to be executed as a new tag.

Part 2 – Bad HTML, Bad Filter equals Good Exploit

In the prior section we learned about how escaping can be bypassed if the HTML end tag is seen as valid. In this case we will see how malformed HTML can be interpreted as valid.

Here is a valid image tag:

But did you know the following is also interpreted as valid by the browser?

That’s right. An image tag without double quotes! So, here’s the important point. If the image tag does not need double quotes, when we encounter filters that only filter the quotes we will bypass it. Who needs quotes anyway?

In this simple example I loaded Google’s homepage image via an img tag. I leave it to the reader as an exercise to understand how the img tag may be used to inject javascript and other malicious code. In short, if a researcher can inject the img tag into a dynamic URL displayed on a webpage (as shown above) they can do damage. So, pretend the Google image is really an exploit.

Part 3 – The Exploit!

If we go back to our bad url from Part 1:

and put a “> and then our malformed image tag without double quotes we get the following URL:

If we execute the URL we get the following code as a result:

The browser will now run our mangled/malformed injected img code (without the dquotes) since we’ve closed off the HTML and our injected code is seen as the next tag for the browser to interpret.

To be more explicit, the following snippet parsed from the code quoted just above will be treated as valid HTML:

Lesson: filter out those extra characters!