Using RSS and Atom feeds to collect web pages with Python
This post describes practical ways to find recent URLs within a website through its RSS and Atom feeds, and to extract text, metadata, and comments from the pages they link to. It includes the code snippets needed for link discovery and document filtering.
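As a rough illustration of the link discovery step, here is a minimal sketch that pulls entry URLs out of an RSS 2.0 or Atom feed using only the Python standard library. The function name `extract_feed_links` and the sample feeds are assumptions made for this example, not code from the post; a real crawler would also need to download the feed and cope with malformed XML.

```python
import xml.etree.ElementTree as ET

# Namespace used by Atom 1.0 elements
ATOM_NS = "{http://www.w3.org/2005/Atom}"


def extract_feed_links(feed_xml: str) -> list[str]:
    """Return the entry URLs found in an RSS 2.0 or Atom feed, in feed order.

    Hypothetical helper for illustration only.
    """
    root = ET.fromstring(feed_xml)
    links = []

    # RSS 2.0: <rss><channel><item><link>URL</link></item>
    for item in root.iter("item"):
        url = item.findtext("link")
        if url and url.strip():
            links.append(url.strip())

    # Atom: <feed><entry><link href="URL" rel="alternate"/></entry>
    for entry in root.iter(f"{ATOM_NS}entry"):
        for link in entry.findall(f"{ATOM_NS}link"):
            # A missing rel attribute defaults to "alternate" in Atom
            if link.get("rel", "alternate") == "alternate" and link.get("href"):
                links.append(link.get("href"))
                break

    # Drop duplicates while keeping the original order
    return list(dict.fromkeys(links))


# Made-up sample feeds (example.org URLs are placeholders)
SAMPLE_RSS = """
<rss version="2.0">
  <channel>
    <title>Example blog</title>
    <link>https://example.org/</link>
    <item><title>Second post</title><link>https://example.org/post-2</link></item>
    <item><title>First post</title><link>https://example.org/post-1</link></item>
  </channel>
</rss>
""".strip()

SAMPLE_ATOM = """
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example blog</title>
  <link href="https://example.org/"/>
  <entry>
    <title>Atom post</title>
    <link rel="self" href="https://example.org/atom-1.xml"/>
    <link href="https://example.org/atom-1"/>
  </entry>
</feed>
""".strip()

if __name__ == "__main__":
    for url in extract_feed_links(SAMPLE_RSS) + extract_feed_links(SAMPLE_ATOM):
        print(url)
```

Feeds usually list the newest entries first, so the returned order is a reasonable proxy for recency. The collected URLs can then be passed to whatever text extraction step is used downstream.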