XML.com Publishes Great Article on Difficulties of RSS Processing
Mark Pilgrim wrote an excellent article on practical RSS processing for XML.com, called "Parsing RSS at All Costs". In it, he conveys the breadth of problems involved in parsing headline feeds from the many weblogs on the Internet:
... [As] RSS has gained popularity, the quality of RSS feeds has dropped. There are now dozens of versions of hundreds of tools producing RSS feeds. Many have bugs. Few build RSS feeds using XML libraries; most treat it as text, by piecing the feed together with string concatenation, maybe (or maybe not) applying a few manually coded escaping rules, and hoping for the best.
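To make the string-concatenation problem concrete, here is a minimal Python sketch of my own (it is not taken from the article). The headline is a made-up example, and `is_well_formed` is a hypothetical helper name. It builds the same item twice, once by gluing strings together and once with the standard library's `xml.etree.ElementTree`, then checks whether each result parses as XML.

```python
import xml.etree.ElementTree as ET

# Hypothetical headline containing characters that are special in XML.
title = "Q&A: <b>RSS</b> parsing"

# Naive approach: string concatenation with no escaping at all.
naive_item = "<item><title>" + title + "</title></item>"

# Library approach: ElementTree escapes &, < and > when it serializes.
item = ET.Element("item")
ET.SubElement(item, "title").text = title
safe_item = ET.tostring(item, encoding="unicode")


def is_well_formed(xml_text):
    """Return True if xml_text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed(naive_item))  # the bare "&" breaks the parse
print(is_well_formed(safe_item))   # the escaped version parses cleanly
```

The concatenated version fails because of the unescaped ampersand, while the library-built version round-trips back to the original headline text.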
Then he explains how desktop news aggregators are dealing with the situation:
... [Most] desktop news aggregators are now incorporating parse-at-all-costs RSS parsers which they use when XML parsing fails. However, since no one likes tag soup, they are also implementing subtle visual clues, such as smiley and frown icons, to indicate feed quality. Click on the frown face, and the end user can learn that this RSS feed is not well-formed XML. But the program still displays the content of the feed, as best it can, using a parse-at-all-costs parser.
The article goes on to give some code examples of how to deal with these problems using Python.
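For a rough feel of the fallback idea the aggregators use, here is a small sketch of my own. It is not Pilgrim's code, and it is far simpler than a real parse-at-all-costs parser. The function name `parse_titles` and the regular expressions are my assumptions, and it only handles un-namespaced `<item>` and `<title>` elements. It tries a real XML parser first, falls back to regular expressions if that fails, and returns a flag that an aggregator could turn into a smiley or a frown.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical patterns for the fallback path; real feeds need far more care.
ITEM_START_RE = re.compile(r"<item\b", re.IGNORECASE)
TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title\s*>", re.IGNORECASE | re.DOTALL)


def parse_titles(feed_text):
    """Return (item_titles, well_formed) for an RSS feed given as a string."""
    try:
        # Strict path: only succeeds if the feed is well-formed XML.
        root = ET.fromstring(feed_text)
        titles = [item.findtext("title", default="").strip()
                  for item in root.iter("item")]
        return titles, True
    except ET.ParseError:
        pass

    # Fallback path: treat the feed as text and take the first <title>
    # that follows each <item start tag. Entities are left undecoded.
    titles = []
    for chunk in ITEM_START_RE.split(feed_text)[1:]:
        match = TITLE_RE.search(chunk)
        if match:
            titles.append(match.group(1).strip())
    return titles, False


good = ("<rss><channel><title>Feed</title>"
        "<item><title>One</title></item>"
        "<item><title>Two</title></item></channel></rss>")
# Broken feed: a bare ampersand and an <item> that is never closed.
bad = ("<rss><channel><title>Feed</title>"
       "<item><title>Q&A</title></item>"
       "<item><title>Unclosed item</title></channel></rss>")

for feed in (good, bad):
    titles, well_formed = parse_titles(feed)
    print(":-)" if well_formed else ":-(", titles)
```

The second feed is not well-formed, yet its headlines are still recovered, which is the trade-off the quote describes: show the content as best you can, and tell the user the feed is broken.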