Categories
Programming Security web-design

Open Redirects

In this post I’ll discuss an issue I tackled a short while ago – open redirects. But first, the story of how I got to it. Feel free to skip ahead to the technical discussion.

Background

Our analytics for plnnr.com – the website for trip planning wasn’t working as well as we wanted. We’re using Google Analytics, and it’s hard generating the specific report we want, and when we did get it, it seemed to show inaccurate numbers. To partially alleviate the issue, I was required to add tracking pixels for facebook & adwords, so we can better track conversions.
For us, an “internal” conversion is when a user clicks on a link to a booking url (for a hotel, or any other “bookable” attraction).
After reviewing it, I decided that the best course of action would be to create an intermediate page, on which the tracking pixels would appear. Such a page would receive as a parameter the url to redirect to, and will contain the appropriate tracking pixels.

Description of problem

Let’s say we build the url for the page like so:

/redirect/?url=X

This page will load the appropriate tracking pixels, and then redirect to the given url (the X).
The problems are:
1. We are potentially exposing ourselves to Cross Site Scripting (XSS), if we don’t filter the redirect url correctly. A malicious website could create links to our page that will run scripts in our context.

2. A malicious webmaster could steal search engine authority. Let’s say he has two domains: a.com and b.com, of which he cares about b.com. He creates links on a.com to:

ourdomain.com/redirect/?url=b.com

A search engine crawls his page, sees the links to his domain, and gives ourdomain.com authority to b.com. Not nice.

3. A malicious website could create links to ourdomain.com that redirect to some malware site, this way harming the reputation of ourdomain.com, or creating better phishing links for ourdomain.com.

Possible solutions

Before we handle the open-redirect issues it’s important to block cross site scripting attacks. Since the attack might be possible by inject code into the url string, this is doable by correctly filtering the urls, and by using existing solutions for XSS.

As for the open redirect:

1. Non solution: cookies. We can pass the url we want in a cookie. Since cookies may only be set by our domain, other websites would not be able to set the redirect url. This doesn’t work well if you want more than one redirect link, or with multiple pages open, etc.

2. Checking the referrer (“referer”), and allowing redirects to come only from our domain. This will break for all users who use a browser that hides referrer information, for example, those using zone-alarm. Google also suggests that if the referrer information is available, block if it’s external. That way we are permissive for clients that hide it.

3. Whitelisting redirect urls. This solutions actually comes in two flavors – one is keeping a list of all possible urls, and then checking urls against it. The other is keeping a list of allowed specific url parts, for example, domains. While keeping track of all allowed urls may be impractical, keeping track of allowed domains is quite doable. Downside is that you have to update that piece of the code as well each time you want to allow another domain.

4. Signing urls. Let the server keep a secret, and generate a (sha1) hash for each url of “url + secret”. In the redirect page, require the hash, and if it doesn’t match the desired hash, don’t redirect to that url. This solution is quite elegant, but it means that the client code (the javascript) can’t generate redirect URLs. In my case this incurs a design change, a bandwidth cost, and a general complication of the design.

5. Robots.txt. Use the robots.txt file to prevent search engines from indexing the redirect page, thereby mitigating at least risk number 2.

6. Generating a token for the entire session, much like CSRF protection. The session token is added to all links, and is later checked by the redirect page (on the server side). This is especially easy to implement if you already have an existing anti-csrf mechanism in place.

7. A combination of the above.

Discussion and my thoughts

It seems to me, that blocking real users is unacceptable. Therefor, only filtering according to referrer information is unacceptable if you block users with no referrer information.
At first I started to implement the url signing mechanism, but then I saw the cost associated with it, and reassessed the risks. Given that cross-site-scripting is solved, the biggest risk is stealing search-engine-authority. Right now I don’t consider the last risk (harming our reputation) as important enough, but this will become more acute in the future.

Handling this in a robots.txt file is very easy, and that was the solution I chose. I will probably add more defense mechanisms in the future. When I do add another defense mechanism, it seems that using permissive referrer filtering, and the existing anti-csrf code will be the easiest to implement. A whitelist of domains might also be acceptable in the future.

If you think I missed a possible risk, or a possible solution, or you have differing opinions regarding my assessments, I’ll be happy to hear about it.

My thanks go to Rafel, who discussed this issue with me.

Further reading

* http://www.owasp.org/index.php/Open_redirect
* http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=171297
* Open Redirects and Phishing Vectors/

Categories
Databases Design Programming

A Simple Race-Condition

Lately, I’ve mostly been working on my startup. It’s a web-application, and one of the first things I’ve written was a cache mechanism for some lengthy operations. Yesterday, I found a classic race-condition in that module. I won’t present the code itself here, instead I’ll try to present the essence of the bug.

Consider a web application, required to do lengthy operations from time to time, either IO bound, or CPU bound. To save time, and maybe also bandwidth or CPU time, we are interested in caching the results of these operations.
So, let’s say we create some database table, that has the following fields:

  • Unique key: input
  • output
  • use_count *
  • Last use date *

I’ve marked with an asterisk the fields that are optional, and are related to managing the size of the cache. These are not relevant at the moment.
Here is how code using the cache would look like (in pseudocode form):

result = cache.select(input)
if result:
    return result
result = compute(input)
cache.insert(input, result)
return result

This code will work well under normal circumstances. However, in a multithreaded environment, or any environment where access to the database is shared, there is a race-condtion: What happens if there are two requests for the same input at about the same time?
Here’s a simple scheduling that will show the bug:

Thread1: result = cache.select(input). result is None
Thread1: result = compute(input)
Thread2: result = cache.select(input) result is None
Thread2: result = compute(input)
Thread2: cache.insert(input, result)
Thread1: cache.insert(input, result) – exception – duplicate records for the unique key input!

This is a classic race condition. And here’s a small challenge: What’s the best way to solve it?

Categories
Programming Python web-design

LStrings and ASCII-Plotter at UtilityMill.com

It seems Gregory is fond of ascii-art, as he took my scripts, and put them online at utilitymill.com.

UtilityMill.com is a cool new website, which allows you to run scripts as small web applications. What Gregory actually did, with just a few key-strokes, was turning my two scripts into web applications. While these scripts aren’t the most useful web applications in the world, I could think of a few others that are.

Gregory also thought of another use for the UtilityMill. Consider my post ‘Rhyme and Reason with Python’. Let’s say you want to try out my script as well, but you don’t want to go through the hassle of downloading it and all the required dependencies, and then figure out how to run it. If I set it up as a small web application there, I could have any reader be able to see the code and its output without any effort. I’ll be sure to try something like that in my next code-oriented post.