Saturday, September 14, 2013

Websocket and server-side scalability

Wireless Sensor Networks (WSN) and the Internet of Things (IoT) are examples of application domains where even millions of end-points may be connected to the same server system simultaneously. If each of them keeps a socket connection open all the time, whether a plain TCP socket or a Websocket, the result is a server-side scalability problem.

Every now and then one hears concerns about having a high number of concurrent connections to the server. That's the main argument against the use of Websocket that I have heard. Opponents typically suggest a polling type of approach, which ruins the responsiveness of the system. That made me study the topic further.

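In general, almost any server technology can handle up to about 10,000 concurrent connections, but beyond that you may run into trouble (the C10k problem). A common worry is that TCP offers only about 64,000 ports per IP address. Strictly speaking, that limit applies to the outgoing connections a single client IP address can open to one server address and port; a server accepts all of its connections on the same listening port, so its practical limits are memory and file descriptors. Binding several IP addresses to the same computer helps only where the port range really is the bottleneck, for example on a load generator or a proxy. There are many service providers who claim to support millions of concurrent Websocket connections. However, I don't think relying on the proprietary solution of a single service is a generic enough approach. Either way, advanced techniques such as clustering are needed, which certainly makes the system more expensive.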
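
As a small illustration of how TCP tells connections apart, here is a minimal sketch that opens a few loopback connections to one listening socket and collects the address and port of each end. The function name `connection_tuples` is my own, and the port is whatever the OS hands out. On the server side every connection shares the same port; only the client's port differs.

```python
import socket

def connection_tuples(n_clients=5):
    """Open n_clients loopback connections to one listening port and return
    (client_ip, client_port, server_ip, server_port) for each of them."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))        # port 0: let the OS pick a free port
    listener.listen(n_clients)
    clients, accepted, tuples = [], [], []
    try:
        for _ in range(n_clients):
            clients.append(socket.create_connection(listener.getsockname()))
            conn, peer = listener.accept()  # peer is (client_ip, client_port)
            accepted.append(conn)
            tuples.append(peer + conn.getsockname())
        return tuples
    finally:
        for s in clients + accepted + [listener]:
            s.close()

if __name__ == "__main__":
    for t in connection_tuples():
        print(t)
```

Each printed tuple should show the same server port and a different client port, so the port range is used up per client address, not per server.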

Let's take a look at the software architecture. A thread-per-connection approach is insane. Even an object instance per thread is overkill in terms of memory consumption. Using asynchronous methods is the only sensible approach for a server implementation. But no matter how efficient your server implementation is, each open TCP connection still consumes memory in the underlying operating system, so total memory use grows in proportion to the number of connections.
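
To make the asynchronous idea concrete, here is a minimal sketch using Python's standard `asyncio` streams. It is an assumption of mine about what such a server could look like, not a production design: `handle_client` and `demo` are illustrative names, and the line-based echo protocol is made up for the demo. Every connection is served by a lightweight coroutine on a single thread; the kernel still keeps its own state for each socket.

```python
import asyncio

async def handle_client(reader, writer):
    """Echo each line back. One coroutine per connection, no extra threads."""
    try:
        while data := await reader.readline():
            writer.write(data)
            await writer.drain()
    finally:
        writer.close()
        await writer.wait_closed()

async def demo(n_clients=50):
    """Start the echo server on a free loopback port and hit it with n_clients."""
    server = await asyncio.start_server(handle_client, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]

    async def client(i):
        reader, writer = await asyncio.open_connection("127.0.0.1", port)
        writer.write(f"hello {i}\n".encode())
        await writer.drain()
        reply = await reader.readline()
        writer.close()
        await writer.wait_closed()
        return reply.decode().strip()

    async with server:
        # All clients are connected concurrently within one event loop.
        return await asyncio.gather(*(client(i) for i in range(n_clients)))

if __name__ == "__main__":
    print(len(asyncio.run(demo())), "clients served by one thread")
```

The same structure should carry over to a Websocket server, since a Websocket is a TCP connection with an HTTP upgrade handshake and framing on top.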

That makes me think about using UDP instead of TCP. There is no need to consume OS memory per connection if all the connectionless clients are served through a single UDP port. But then we lose the beauty of identifying connections by the IP address and port of each end. Some sort of application-layer solution is needed instead.
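
One possible shape for such an application-layer solution is sketched below. The wire format is purely hypothetical (an 8-byte session ID in front of every payload), and `handle_datagram` and `serve` are names I made up for the illustration. The session is looked up by the ID carried in the datagram instead of by the sender's address, so a client can change its IP address or port without losing its session. A real protocol would also have to authenticate each datagram, which this sketch leaves out.

```python
import socket
import struct

HEADER = struct.Struct("!Q")  # hypothetical format: 8-byte session ID, then payload

def handle_datagram(sessions, datagram, addr):
    """Update the session named in the datagram.

    Returns (reply_bytes, reply_addr), or None if the datagram is too short."""
    if len(datagram) < HEADER.size:
        return None
    (session_id,) = HEADER.unpack_from(datagram)
    payload = datagram[HEADER.size:]
    state = sessions.setdefault(session_id, {"addr": addr, "count": 0})
    state["addr"] = addr      # the client may have moved to a new IP or port
    state["count"] += 1
    reply = HEADER.pack(session_id) + b"ack %d: " % state["count"] + payload
    return reply, addr

def serve(sock, sessions, max_datagrams):
    """Serve a fixed number of datagrams on one already-bound UDP socket."""
    for _ in range(max_datagrams):
        datagram, addr = sock.recvfrom(2048)
        result = handle_datagram(sessions, datagram, addr)
        if result is not None:
            sock.sendto(*result)

if __name__ == "__main__":
    sessions = {}
    first = handle_datagram(sessions, HEADER.pack(7) + b"hi", ("10.0.0.1", 5000))
    second = handle_datagram(sessions, HEADER.pack(7) + b"yo", ("10.0.0.2", 6000))
    print(first, second, sessions)
```

The only per-client cost here is an entry in the `sessions` dictionary; the OS sees a single socket no matter how many clients there are.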

The State Synchronization Protocol (SSP) is one possibility. It's based on UDP and provides many advantages over TCP sockets, such as client-side mobility (roaming), persistent connections over unreliable and intermittent networks, and good responsiveness. At the time of writing, the only known application that uses SSP is mosh, a replacement for the SSH terminal.

Mosh is available for many Unix/Linux variants and Mac OS X, but SSP has not yet been published as a general-purpose library. One can, of course, extract the protocol code from the open source implementation of mosh.

From the point of view of embedded systems, such as the WSN and IoT applications I mentioned at the beginning, I think an SSP-like approach could be an even better connection technology than Websocket. But as long as your system does not need to scale up to millions of concurrent connections, Websocket is still a good initial choice.
