Wednesday, February 9, 2011

caching the internets

say you wanted to provide a small amount of internet bandwidth to a large number of users (an african town via a satellite link, say, or a few blocks in egypt on a t1 line). you'd need some serious caching, access control and traffic shaping to keep it usable.

first of all, you have to determine capacity and limit use. you can't just let a thousand people start torrenting every season of House over a fucking satlink. total inbound and outbound traffic has to be regulated so that a usable number of connections each get usable bandwidth (for the sake of argument, slightly slower than a 56k modem). allow no more than $BANDWIDTH/$MODEM_SPEED streams at a time, each with an idle timeout (tcp keepalives disabled so dead sessions actually expire). keep a big syn backlog so new clients wait for an available slot while trying to connect instead of getting refused.
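
roughly, the slot math and the limiting could look like this with stock linux tools -- a sketch only, with the interface, the per-host cap and every number being made-up placeholders:

    # 1.5mbit/s uplink / ~40kbit/s per user ~= 37 usable streams;
    # plug in your own $BANDWIDTH/$MODEM_SPEED figure

    # cap simultaneous tcp connections per client address; excess syns get
    # dropped and the client just retries
    iptables -A FORWARD -p tcp --syn -m connlimit --connlimit-above 4 -j DROP

    # global cap: mask 0 lumps every source into one bucket, so this limits
    # the total number of forwarded connections
    iptables -A FORWARD -p tcp --syn -m connlimit --connlimit-above 40 --connlimit-mask 0 -j DROP

    # big syn backlog so waiting clients queue instead of being turned away
    sysctl -w net.ipv4.tcp_max_syn_backlog=4096
    sysctl -w net.core.somaxconn=1024

    # shorten the conntrack idle timeout so dead sessions free their slot
    # (the sysctl name varies between kernel versions)
    sysctl -w net.netfilter.nf_conntrack_tcp_timeout_established=600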

you also need to shape a couple of protocols for lower latency. ssh gets higher priority, but only up to a certain amount of bandwidth... if an ssh session moves more than 5 megabytes of traffic, somebody is fucking scp'ing, so kill that connection (not that they can't get around it with an rsync loop). SIP and other interactive protocols get the low-latency treatment too.
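
as a sketch, with htb on a hypothetical 1mbit uplink (eth0 here is the interface facing the clients, and the rates are pulled out of the air):

    # low-latency class for ssh/sip, bulk class for everything else
    tc qdisc add dev eth0 root handle 1: htb default 30
    tc class add dev eth0 parent 1:  classid 1:1  htb rate 1mbit   ceil 1mbit
    tc class add dev eth0 parent 1:1 classid 1:10 htb rate 256kbit ceil 384kbit prio 0
    tc class add dev eth0 parent 1:1 classid 1:30 htb rate 512kbit ceil 1mbit   prio 2

    # steer ssh (22) and sip (5060) into the low-latency class
    tc filter add dev eth0 parent 1: protocol ip prio 1 u32 match ip dport 22 0xffff flowid 1:10
    tc filter add dev eth0 parent 1: protocol ip prio 1 u32 match ip dport 5060 0xffff flowid 1:10

    # kill any ssh session that has moved more than ~5MB -- somebody is scp'ing
    iptables -A FORWARD -p tcp --dport 22 \
      -m connbytes --connbytes 5242880: --connbytes-dir both --connbytes-mode bytes \
      -j REJECT --reject-with tcp-reset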

squid, or some other more efficient proxy, with a HUGE cache store at the uplink point. a caching dns proxy as well. also, if it's not too much trouble, a pop3/imap caching server, and definitely an (authenticated) smtp relay to queue outgoing mail and pass it along when the link is available again. run an ad-blocking filter in the proxy to strip out all the unnecessary garbage content that would otherwise just suck up bandwidth. if you want to get drastic, block all streaming content. if you want to get SUPER drastic, limit allowed content to only a few MIME types (text/html, image/jpeg, text/plain, etc). allow whitelists of commonly-hit, cacheable content among the stuff that's blocked.
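
a rough squid-side sketch of that, untested; the paths, sizes and the blocklist/whitelist files are all hypothetical, and the MIME list is the SUPER drastic variant:

    # append to squid.conf on the uplink cache box (sketch only)
    cat >> /etc/squid/squid.conf <<'EOF'
    http_port 3128 transparent                      # "intercept" on newer squid
    cache_mem 512 MB
    cache_dir aufs /var/spool/squid 200000 16 256   # ~200GB disk cache
    maximum_object_size 512 MB                      # cache big downloads once

    # crude ad blocking via a regex blocklist (hypothetical file)
    acl ads url_regex -i "/etc/squid/ad_domains.regex"
    http_access deny ads

    # SUPER drastic: only a few MIME types come back, plus a whitelist of
    # commonly-hit domains that are exempt from the filter
    acl leanmime rep_mime_type ^text/html ^text/plain ^image/jpeg ^image/gif
    acl whitelisted dstdomain "/etc/squid/content_whitelist.txt"
    http_reply_access allow whitelisted
    http_reply_access deny !leanmime
    EOF

    # caching dns forwarder on the same box
    dnsmasq --cache-size=10000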

as the network grows, additional routes may get added behind this one tiny uplink. add a new caching server with the same tweaks at each router, so content is cached at each subnet as well as at the main uplink point. this cuts down the bandwidth used just getting to the uplink, so your intermediary links can also be weak/small.
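
each subnet cache just points at the uplink cache as its parent; in squid terms, something like this (hostname made up):

    # on each subnet-level squid: prefer the uplink cache over going direct
    cat >> /etc/squid/squid.conf <<'EOF'
    cache_peer cache-uplink.lan parent 3128 3130 default
    prefer_direct off
    nonhierarchical_direct off
    EOF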

what may also help is an additional proxy on the other side of the satlink that compresses content before it heads down the client pipe; kind of like Opera's compression proxy, this would (for example) recompress images on a network with fast internet access and then send them across the slow satlink to the caching boxes, further cutting delivery time and bandwidth.
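
one way to wire that up: run a compressing proxy on the fast side (ziproxy is one tool built for this) and have the uplink squid send its misses through it. ziproxy's own config isn't shown, and the hostname and port here are placeholders:

    # on the uplink squid: all outbound traffic goes via the compressing proxy
    cat >> /etc/squid/squid.conf <<'EOF'
    cache_peer compressor.fastside.example parent 8080 0 no-query default
    never_direct allow all
    EOF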

SSL obviously makes all this caching hairier and the bandwidth demands more intense. perhaps throttle SSL connections down, since we know they can't be cached, will suck more bandwidth, and will reduce the total number of client connections possible. or, if people would go for it, provide an encrypted VPN endpoint on the cache box that users connect to, with everything beyond it carried as plain text. another option: run an sslstrip-like app against any site that works without ssl, basically circumventing security, and tell the users not to expect privacy. more bandwidth or more security, you decide.
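
the throttling option could hang off the same htb tree sketched above, with port 443 marked down into a deliberately slow class (class numbers and rates are arbitrary):

    # a slow class for https under the earlier htb root
    tc class add dev eth0 parent 1:1 classid 1:40 htb rate 64kbit ceil 128kbit prio 3

    # mark 443 traffic in netfilter, then steer the mark into the slow class
    iptables -t mangle -A FORWARD -p tcp --dport 443 -j MARK --set-mark 40
    tc filter add dev eth0 parent 1: protocol ip prio 2 handle 40 fw flowid 1:40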

it should go without saying, but draconian firewall policies belong on the borders. block everything unless it's explicitly requested, and for a good reason. use layer-7 filtering wherever possible to make sure ports are really being used for what they claim. if a port is opened for some common public service like AIM, only allow the servers AIM actually uses.
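
a skeleton of that border policy; the cache box hostname and the AIM netblock are placeholders, and the layer-7 match assumes the old l7-filter kernel patch, which isn't in a stock kernel:

    # default deny across the border, allow established flows back through
    iptables -P FORWARD DROP
    iptables -A FORWARD -m state --state ESTABLISHED,RELATED -j ACCEPT

    # only explicitly requested services, e.g. web via the cache box and ssh
    iptables -A FORWARD -p tcp -d cachebox.lan --dport 3128 -j ACCEPT
    iptables -A FORWARD -p tcp --dport 22 -j ACCEPT

    # pin a "common public service" to the servers it actually uses,
    # e.g. AIM on 5190 to a placeholder netblock
    iptables -A FORWARD -p tcp -d 205.188.0.0/16 --dport 5190 -j ACCEPT

    # layer-7 sanity check (l7-filter patch required): kill torrents outright
    # iptables -A FORWARD -m layer7 --l7proto bittorrent -j DROP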
