The case for RFC3229+feed support is logical and should be clear to anyone implementing syndication software (RSS or Atom). However, logic is best backed by empirical evidence. I have therefore analyzed recent accesses to RSS and Atom files on PubSub.com. The results below show a dramatic drop in bytes transferred per request, as well as the distribution of transfer sizes across HTTP status codes. (Note: RFC3229+feed responses return status code 226. Also, we started to support RFC3229+feed on the 30th.)
| Status | %Requests | %TotalBytes | AvgBytes |
| 200 | 15% | 85% | 77,212 |
| 226 | 10% | 14% | 18,974 |
| 304 | 73% | | |
| | | | 53,443 |
Clearly, implementation of RFC3229+feed has resulted in a significant reduction in our actual bandwidth requirements. On average, responses to requests that specify "A-IM: feed" in their headers are about 25% the size of responses to requests that do not. As a result, even though RFC3229+feed requests make up about 40% of the requests we processed that return results (i.e. not 304), they required only 14% of the bandwidth we used to satisfy requests.
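For readers who want to check the arithmetic, here is a minimal sketch that recomputes the size ratio and the bandwidth share from the table. It assumes the rounded percentages in the table are close enough to the real counts; the function names are my own for illustration and are not part of any PubSub.com tooling.

```python
# Figures copied from the table: share of all requests (%) and
# average bytes per response, for the two statuses that return a body.
ROWS = {
    200: {"pct_requests": 15, "avg_bytes": 77_212},
    226: {"pct_requests": 10, "avg_bytes": 18_974},
}


def size_ratio(rows):
    """Average 226 response size as a fraction of the average 200 response size."""
    return rows[226]["avg_bytes"] / rows[200]["avg_bytes"]


def byte_share(rows, status):
    """Fraction of all transferred bytes attributable to one status code."""
    total = sum(r["pct_requests"] * r["avg_bytes"] for r in rows.values())
    row = rows[status]
    return row["pct_requests"] * row["avg_bytes"] / total


if __name__ == "__main__":
    print(f"226 size relative to 200: {size_ratio(ROWS):.1%}")
    print(f"226 share of bytes:       {byte_share(ROWS, 226):.1%}")
```

Run as a script, this prints values of roughly 25% and 14%, in line with the figures quoted above.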
These results are, of course, very specific to PubSub.com and are strongly influenced by the details of our site and the particular usage patterns of our users:
- A large percentage of our requests come from Bloglines -- one of the first aggregators to support RFC3229+feed.
- We set the maximum number of items in an RSS or Atom file to 32. Sites that have smaller or larger entry counts will experience very different results. (Bandwidth savings will generally be proportional to the number of entries in feeds. Generally, the larger your feeds are, the larger your savings will be.)
- The size of entries will impact your savings. However, since PubSub.com publishes what is essentially a random sample of all blogs, our entry size is probably close to the average entry size in the Blogosphere.
- The size of the feed or channel metadata will impact the savings since this data is always sent whether there is only one new entry or twenty.
- We are fortunate that so many of the readers that pull data from us use either If-Modified-Since or If-None-Match. (This can be seen in the number of "304 Not Modified" responses.) If these features of HTTP were not so widely used, then we would require roughly four times the bandwidth that we currently use.
- If all readers that fetch data from us were to use RFC3229+feed, then it appears that we could reduce our bandwidth requirements by about two-thirds (2/3). That is a very substantial savings.
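The last two estimates in the list can be reproduced with a rough sketch like the one below. It rests on two assumptions that are mine, not measured facts: that a request currently answered with 304 would otherwise cost the current average body-returning response, and that every 200 response would shrink to the current 226 average if its client adopted RFC3229+feed.

```python
# Request shares (%) and average response sizes (bytes) from the table.
PCT_200, PCT_226, PCT_304 = 15, 10, 73
AVG_200, AVG_226 = 77_212, 18_974

# Current bandwidth in relative units (percent of requests x bytes).
CURRENT = PCT_200 * AVG_200 + PCT_226 * AVG_226


def multiplier_without_conditional_get():
    """Bandwidth multiplier if no client used conditional GET.

    Assumption: each request now answered with 304 would instead cost the
    current average size of a body-returning response.
    """
    body_requests = PCT_200 + PCT_226
    avg_body = CURRENT / body_requests
    return (body_requests + PCT_304) * avg_body / CURRENT


def savings_if_all_clients_use_delta():
    """Fraction of bandwidth saved if every 200 became an average-sized 226.

    Assumption: full-feed fetches would shrink to the current 226 average.
    """
    projected = (PCT_200 + PCT_226) * AVG_226
    return 1 - projected / CURRENT


if __name__ == "__main__":
    print(f"Without conditional GET: {multiplier_without_conditional_get():.2f}x")
    print(f"Savings with universal RFC3229+feed: {savings_if_all_clients_use_delta():.1%}")
```

Under those assumptions the multiplier comes out just under four and the savings just under two-thirds, which is where the estimates in the list come from.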
Of course, when we look at the numbers, we see that the largest number of requests are those regular, often hourly, polling requests that result in no updates. Unfortunately, our Apache server doesn't keep track of the number of bytes needed to satisfy such requests. However, while it is vastly smaller than would be required without the conditional GET support in HTTP, it is still substantial. This is an unavoidable cost of the polling model of syndication... The only way to eliminate the endless waste of these "304" responses is to move to a "push" model of distribution instead.
It is clear that RFC3229+feed results in very significant bandwidth savings. I believe that all users of news aggregators and other syndication tools should strongly encourage the developers of those tools to provide support for RFC3229+feed as soon as possible. Not doing so simply wastes network bandwidth. Given that it has been shown on numerous occasions that supporting RFC3229+feed is trivial for news aggregators, it is hard to imagine how anyone could present a responsible argument for not making the minimal modifications needed. Responsible aggregators will support RFC3229+feed.
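To give a sense of how small the client-side change is, here is a minimal sketch of an aggregator fetch using only Python's standard library. It is an illustration under stated assumptions, not any particular aggregator's code: the function names are hypothetical, and the client is assumed to have stored the ETag from its previous fetch.

```python
import urllib.error
import urllib.request


def build_request(url, etag=None):
    """Build a GET that asks for a feed delta (hypothetical helper)."""
    headers = {"A-IM": "feed"}  # advertise RFC3229+feed support
    if etag is not None:
        # The ETag from the previous fetch tells the server what we already have.
        headers["If-None-Match"] = etag
    return urllib.request.Request(url, headers=headers)


def interpret_status(status):
    """Map an HTTP status code to what the aggregator should do with the body."""
    if status == 226:
        return "delta"      # body holds only entries newer than our ETag
    if status == 200:
        return "full"       # body is the complete feed
    if status == 304:
        return "unchanged"  # nothing new, no body
    raise ValueError(f"unexpected status {status}")


def fetch(url, etag=None):
    """Fetch a feed; returns (kind, new_etag, body). Performs network I/O."""
    try:
        with urllib.request.urlopen(build_request(url, etag)) as resp:
            return interpret_status(resp.status), resp.headers.get("ETag"), resp.read()
    except urllib.error.HTTPError as err:
        # urllib reports 304 as an HTTPError rather than a normal response.
        if err.code == 304:
            return "unchanged", etag, b""
        raise
```

The only differences from an ordinary conditional GET are the extra request header and the handling of a 226 response, whose entries are merged into the stored copy of the feed instead of replacing it.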
bob wyman