In a comment, Phil Aaronson correctly guesses that I was pleased to read in Google's Chubby paper that:
[Chubby's] ability to provide swift name updates without polling each name individually is so appealing that Chubby now provides name service for most of the company's systems.
As many know, I've often argued against the horrible waste that comes from polling for RSS and Atom syndication files. But feed syndication isn't the only application area where polling is overused. As Google demonstrates, polling simply doesn't scale for any system doing thousands or millions of DNS lookups per second. (Phil and I ran into many problems with overloading DNS servers while working on Web Traffic Analysis systems at Accrue Software...)
Even though polling is known to have many weaknesses, we have many Internet protocols that either rely exclusively on polling or only provide request/response interfaces that force polling. In many cases, these protocols were defined to use polling since polling is typically very easy for clients to implement and since the polling loads created by the average client have been light enough to make the inefficiencies of polling acceptable. However, as the Internet scales out and as the load generated by the largest systems grows ever greater relative to the load generated by the smaller, more common systems, I think we may see a growing need not only to use push more but also to define multiple tiers of protocols. We may need one protocol based on polling for smaller systems and another protocol based on push for larger systems.
Scale makes a difference... What is acceptable for small systems is often completely unacceptable for larger ones. We saw this clearly in the world of feed synchronization. If you have a feed aggregator that is trying to monitor the content of only a few dozen feeds, then it is simple to rely on polling RSS/Atom feeds -- there is some waste involved, but that waste is "acceptable" to many small clients. On the other hand, if you are trying to monitor millions of feeds, polling simply won't work. That is why much of the process of feed update discovery for large blog aggregators has moved away from simple polling and now relies on push-based services like the FeedMesh, pinging, and the SixApart Update Stream. Rather than polling millions of feeds every few minutes, today's largest blog aggregators rely on having many of the updates pushed to them, while most small or personal aggregators still rely on the inefficient but simple polling model.
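To put rough numbers on that difference, here is a back-of-the-envelope sketch in Python. The feed counts, polling intervals, and update rate below are illustrative assumptions of mine, not measurements from any real aggregator, and the function names are made up for this example. It simply counts one request per feed per polling interval versus one pushed message per actual update.

```python
def polling_requests_per_day(feeds: int, poll_interval_minutes: float) -> int:
    """Requests per day if every feed is fetched once per polling interval."""
    polls_per_feed = (24 * 60) / poll_interval_minutes
    return int(feeds * polls_per_feed)


def push_messages_per_day(feeds: int, updates_per_feed_per_day: float) -> int:
    """Messages per day if the aggregator hears only about real updates."""
    return int(feeds * updates_per_feed_per_day)


# Assumed small client: a few dozen feeds, polled every 30 minutes.
small_poll = polling_requests_per_day(feeds=50, poll_interval_minutes=30)

# Assumed large aggregator: a million feeds, polled every 5 minutes.
large_poll = polling_requests_per_day(feeds=1_000_000, poll_interval_minutes=5)

# Same large aggregator with push, assuming 2 updates per feed per day.
large_push = push_messages_per_day(feeds=1_000_000, updates_per_feed_per_day=2)

print(f"small client, polling:    {small_poll:>12,} requests/day")
print(f"large aggregator, polling:{large_poll:>12,} requests/day")
print(f"large aggregator, push:   {large_push:>12,} messages/day")
```

Under these assumptions the small client's polling load is trivial, while the large aggregator makes over a hundred times more requests by polling than it would receive as pushed messages. Nearly all of that difference is polls that find nothing new.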
FTP, or simple file copy, presents yet another example of a protocol that works "well enough" under light loads but doesn't do well for people distributing files to large numbers of users. For distributors of files to large audiences, something like BitTorrent often makes a great deal more sense than simple FTP. Of course, while BitTorrent can really help large distributors, it doesn't do anything useful for those who only distribute one or two copies of a file... So, what works for light loads doesn't work for heavy loads, and what is good for some heavy loads isn't useful for light loads... What we need here is two protocols -- or, in some cases, a single protocol that incorporates support for two usage models.
Of course, nobody likes to define multiple protocols to get a single job done. Thus, you're not seeing a rush in the IETF to define "high volume" or "large scale" versions of the many protocols that quite adequately meet the needs of most users today. But, in the future, it might make sense for us to recognize that at some point quantitative differences translate into qualitatively different problems. Perhaps we should start thinking of serving small systems and serving large systems as two different jobs, and recognize that while we have many of the protocols we need to address small systems' needs, we still don't have what we need to support the larger systems.
bob wyman