
XML.com Publishes Great Article on Difficulties of RSS Processing

Mark Pilgrim wrote an excellent article on practical RSS processing for XML.com, called Parsing RSS at All Costs. In it, he gives a sense of the breadth of problems associated with trying to parse headline feeds from many of the weblogs out on the Internet:

...{As} RSS has gained popularity, the quality of RSS feeds has dropped. There are now dozens of versions of hundreds of tools producing RSS feeds. Many have bugs. Few build RSS feeds using XML libraries; most treat it as text, by piecing the feed together with string concatenation, maybe (or maybe not) applying a few manually coded escaping rules, and hoping for the best.
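To make the string-concatenation problem concrete, here is a minimal Python sketch. It is not code from Pilgrim's article; the function names (item_by_concatenation, item_with_library, is_well_formed) are hypothetical and exist only for illustration. It contrasts gluing an RSS item together as text with building it through the standard library's xml.etree.ElementTree, which escapes special characters for you.

```python
import xml.etree.ElementTree as ET


def item_by_concatenation(title, link):
    # Hypothetical "feed as text" generator: no escaping is applied,
    # so a bare "&" or "<" in the input ends up raw in the output.
    return "<item><title>" + title + "</title><link>" + link + "</link></item>"


def item_with_library(title, link):
    # Building the same item with an XML library; ElementTree escapes
    # "&", "<" and ">" in text content when it serializes the element.
    item = ET.Element("item")
    ET.SubElement(item, "title").text = title
    ET.SubElement(item, "link").text = link
    return ET.tostring(item, encoding="unicode")


def is_well_formed(xml_text):
    # True if a strict XML parser accepts the text.
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


title = "Tom & Jerry"
link = "http://example.com/?a=1&b=2"
print(is_well_formed(item_by_concatenation(title, link)))
print(is_well_formed(item_with_library(title, link)))
```

With a title or link that contains an ampersand, the concatenated version is rejected by a strict parser, while the library-built version parses cleanly and round-trips the original text.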

Then he explains how desktop news aggregators are dealing with the situation:

... {Most} desktop news aggregators are now incorporating parse-at-all-costs RSS parsers which they use when XML parsing fails. However, since no one likes tag soup, they are also implementing subtle visual clues, such as smiley and frown icons, to indicate feed quality. Click on the frown face, and the end user can learn that this RSS feed is not well-formed XML. But the program still displays the content of the feed, as best it can, using a parse-at-all-costs parser.

The article goes on to give some code examples of how to deal with these problems using Python.
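As a rough illustration of the fallback approach described in the quote, and not a reproduction of the article's own code, the Python sketch below tries a strict XML parse first and falls back to a crude regular-expression scan when that fails. The names parse_titles and TITLE_RE are hypothetical, and the regex is an assumption that only handles simple RSS 2.0-style feeds. The returned flag plays the role of the smiley or frown icon.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical tag-soup pattern: the first <title> that follows each <item>.
TITLE_RE = re.compile(
    r"<item\b.*?<title[^>]*>(.*?)</title>",
    re.IGNORECASE | re.DOTALL,
)


def parse_titles(feed_text):
    """Return (titles, well_formed) for an RSS 2.0-style feed string."""
    try:
        root = ET.fromstring(feed_text)
    except ET.ParseError:
        # Strict parsing failed: salvage what we can and report the
        # feed as not well-formed so the caller can warn the reader.
        titles = [match.strip() for match in TITLE_RE.findall(feed_text)]
        return titles, False
    # Strict parsing succeeded: walk the tree for item titles.
    titles = [
        (node.text or "").strip()
        for node in root.iterfind("./channel/item/title")
    ]
    return titles, True


broken = (
    "<rss><channel><title>Feed</title>"
    "<item><title>Tom & Jerry</title></item>"
    "<item><title>Two</title>"  # missing </item>, plus a bare "&" above
    "</channel></rss>"
)
titles, well_formed = parse_titles(broken)
print(titles, "well-formed" if well_formed else "NOT well-formed")
```

A real aggregator would need far more than this (entities, CDATA sections, character encodings, other RSS versions), which is the breadth of problems the article is about.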


Copyright © 1995-2010, CTDATA Ventures. All Rights Reserved.