Oncloud.org and CloudBridge (and how the web is like Donkey Kong Country)

Backstory

A few weeks ago I demoed a project I’d worked on called Oncloud.org. Oncloud was an offshoot of something called CloudBridge, which was intended for use as a frontend load balancer for a hosting service I was thinking of working on. I’d worked on these projects about a year ago, and at the time they garnered a bit of interest, but not as much as I’d hoped, so I kind of left them sitting for a while.

But this time it got a lot more interest, probably because I explained it better. As the hits to Oncloud mounted, I changed the text on the front page to say more clearly what exactly it does for you. I’ve been really impressed and excited to see people using this project, so I decided I should explain a bit more about the concept behind both projects and how they work.

How The Web Works

You know those levels in Donkey Kong Country where you’re jumping between barrel cannons, each shooting you to the next until you (hopefully) get where you’re going? No? Check out this video.

The web is like that. You send out a request and it gets shot from one cannon to the next until finally a real backend web server takes your request and does something with it. These cannons are called proxies (or more specifically HTTP proxies). Note that this is ignoring the lower level, where the entire internet is like this. These days you can often count on your request being shot out of at least two cannons: one that your ISP runs (often invisibly) to speed up requests through caching, and one at the other end that does load balancing between backend machines.

While these two tasks are superficially quite similar, in reality they are quite different. The ISP proxy is blind to the content it forwards. It knows you want something, so it goes through the same standard mechanism a browser would to find the server you really want and then forwards your request on.

The one at the other end, on the other hand, has a list of backends it knows will handle any request that reaches it. It forwards the request based on various potential heuristics, the simplest being round robin. It sends each request to the next backend on the list.
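
To make round robin concrete, here is a minimal Python sketch. The class name and the backend addresses are made up for illustration, and no real load balancer is quite this bare, but the core idea really is just "next one on the list, wrap around at the end".

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hands out backends in list order, wrapping around at the end."""

    def __init__(self, backends):
        if not backends:
            raise ValueError("need at least one backend")
        # cycle() repeats the list forever, which is all round robin is.
        self._backends = cycle(list(backends))

    def pick(self):
        """Return the backend that should receive the next request."""
        return next(self._backends)


# Hypothetical backend addresses, purely for illustration.
balancer = RoundRobinBalancer(["10.0.0.1:8000", "10.0.0.2:8000", "10.0.0.3:8000"])
picks = [balancer.pick() for _ in range(4)]
# The fourth request lands back on the first backend.
```

Notice that pick() never asks whether the backend it returns is alive or busy.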

In that simplest configuration, it’s taking a crapshoot. The server it picks might be down or overloaded, or the balancer might just plain be ignoring a completely idle server. More complex heuristics have been developed to manage this, including adding priority numbers, finding out the load average of all the backend servers, etc. Needless to say, these solutions become quite complex.

They get especially complex if you’re trying to manage your cluster dynamically. If you need to bring servers up and down on a regular basis, that means changing the list on your load balancer(s) on a regular basis. That also probably means confusing any kind of statistics gathering used to help prevent overloading a backend. This is especially a problem if you’re provisioning servers based on load through something like EC2.
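
As a rough idea of what one of those smarter heuristics looks like, here is a sketch that skips dead backends and picks the one with the fewest requests in flight. The Backend record and its fields are my own invention for this example. A real balancer would have to keep those fields up to date with health checks and counters, which is exactly where the complexity comes from.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    # Hypothetical bookkeeping a balancer would have to maintain itself.
    address: str
    healthy: bool = True
    active_requests: int = 0


def pick_least_loaded(backends):
    """Pick the healthy backend with the fewest requests in flight."""
    candidates = [b for b in backends if b.healthy]
    if not candidates:
        raise RuntimeError("no healthy backends available")
    return min(candidates, key=lambda b: b.active_requests)


pool = [
    Backend("10.0.0.1:8000", active_requests=5),
    Backend("10.0.0.2:8000", healthy=False),
    Backend("10.0.0.3:8000", active_requests=1),
]
chosen = pick_least_loaded(pool)
# chosen is the third backend: the second is down, the first is busier.
```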

Why CloudBridge is Different

Historically, this arrangement, with the load balancer connecting to the backends, has been used because of the ready availability of high quality proxy implementations like Squid and Apache’s mod_proxy. These proxies weren’t originally meant for this task, but they’ve been modified to support it.

CloudBridge takes this arrangement and flips it. Rather than the load balancer connecting to the backends, the backends connect to the load balancer and signal that they are available to handle a request. They do this through an HTTP extension called a BRIDGE request. More technical details can be found on GitHub in the protocol description.
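
Here is a toy, in-memory model of that flipped arrangement. It is not the CloudBridge code and it doesn't speak the BRIDGE protocol (see the protocol description for that). The class and method names are hypothetical, and plain strings stand in for real connections. It only shows the matching logic: whichever side arrives first waits, and the other side gets paired with it.

```python
from collections import defaultdict, deque


class Bridge:
    """Pairs waiting backends with waiting client requests, per host."""

    def __init__(self):
        # Both queues fill themselves as connections arrive;
        # there is no configured list of backends or hosts.
        self.waiting_backends = defaultdict(deque)
        self.waiting_clients = defaultdict(deque)

    def backend_ready(self, host, backend):
        """A backend announces it can take one request for host."""
        if self.waiting_clients[host]:
            return self.waiting_clients[host].popleft(), backend
        self.waiting_backends[host].append(backend)
        return None  # nothing to do yet, so the backend waits

    def client_request(self, host, request):
        """A client request for host arrives."""
        if self.waiting_backends[host]:
            return request, self.waiting_backends[host].popleft()
        self.waiting_clients[host].append(request)
        return None  # no backend free yet, so the request waits


bridge = Bridge()
first = bridge.backend_ready("example.org", "backend-1")   # backend waits
second = bridge.client_request("example.org", "GET /")     # paired at once
third = bridge.client_request("example.org", "GET /about") # request waits
fourth = bridge.backend_ready("example.org", "backend-2")  # picks it up
```

A backend that is down or overloaded simply never announces itself, so it never gets handed a request.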

To bring it back to the Donkey Kong Country example, where the existing web is a bunch of barrel cannons, CloudBridge has the two sides meet in the middle, forming a bridge. Hence the name.

This way, the CloudBridge server can be dumb. It doesn’t need a list of backends. In fact, it doesn’t even need a list of hosts it handles. Through the use of hash-based secret keys, your backends can be authorized to handle a domain without the CloudBridge server needing to be restarted or to re-read a configuration list.
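
One way such a key scheme could work is sketched below. This is an assumption on my part for illustration, not the exact scheme from the protocol description: the server holds a single secret, the key for a domain is an HMAC of the domain name under that secret, and a backend proves it may serve a domain by presenting the matching key. The secret value and function names are made up.

```python
import hashlib
import hmac

# Hypothetical server-side secret; the only state the server needs.
SERVER_SECRET = b"change-me"


def key_for_host(host, secret=SERVER_SECRET):
    """Derive the key you would hand to whoever runs backends for host."""
    normalized = host.lower().encode("utf-8")
    return hmac.new(secret, normalized, hashlib.sha256).hexdigest()


def is_authorized(host, presented_key, secret=SERVER_SECRET):
    """Check a backend's key by recomputing it; no per-host list needed."""
    expected = key_for_host(host, secret)
    # compare_digest avoids leaking information through timing.
    return hmac.compare_digest(expected.encode("utf-8"),
                               presented_key.encode("utf-8"))


key = key_for_host("example.org")
ok = is_authorized("example.org", key)     # True: the key matches the domain
wrong = is_authorized("other.org", key)    # False: key is tied to one domain
```

Because the server recomputes the key on demand, adding a new domain means handing out a new key, not editing a config file.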

And (though this hasn’t been implemented yet) you could also have the CloudBridge server report how many requests are waiting on either side for a given host. If there are a whole bunch of client requests for a domain waiting to be served, you know that that particular domain probably needs some extra backend servers and you can spin them up. Or if there are too many waiting backends, you can shut some down.
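
Since that reporting doesn’t exist yet, the following is purely a sketch of the decision you could make with those two numbers. The function name, its parameters, and the "keep one spare backend" policy are all assumptions of mine.

```python
def scaling_delta(waiting_clients, waiting_backends, spare=1):
    """Return how many backends to add (positive) or remove (negative).

    waiting_clients:  requests queued with no backend to take them
    waiting_backends: backends sitting idle with nothing to serve
    spare:            idle backends to keep around as headroom
    """
    if waiting_clients > 0:
        # Requests are piling up: add one backend per waiting request.
        return waiting_clients
    surplus = waiting_backends - spare
    # Only shrink when more backends are idle than we want as headroom.
    return -surplus if surplus > 0 else 0


busy = scaling_delta(waiting_clients=4, waiting_backends=0)   # 4
quiet = scaling_delta(waiting_clients=0, waiting_backends=5)  # -4
steady = scaling_delta(waiting_clients=0, waiting_backends=1) # 0
```

A real policy would want some smoothing over time so a brief spike doesn’t start and stop servers every few seconds.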

Which is exactly what you need to do if you’re hosting your servers on a system like EC2. Provisioning for peak all the time from EC2 is exactly what you don’t want to do. When there’s only one user on your site in the middle of the night, you want only one backend server running.

How That Turned Into OnCloud

Which leaves OnCloud. OnCloud was something I realized you could do partway through implementing CloudBridge. Given the properties above, you could run a CloudBridge server as a sort of open proxy: users connect to it from behind firewalls, and their development app becomes publicly available. This would be much simpler than the more common practice of using an SSH tunnel or Heroku’s limited free hosting to achieve roughly the same results.

In particular, this is really good for developing Facebook or Twilio apps. But that’s a whole other blog post.

It’s not exactly fast, and it runs with a relatively low connection limit to prevent abuse, but it’s there and it works, and a lot of people have now used it (and continue to use it), and that makes me very very happy.


This entry was posted on Thursday, March 18th, 2010 at 5:14 pm and is filed under Uncategorized.