The Microblog Explorer project gathers URLs from social networks (FriendFeed, identi.ca, and Reddit) to use them as web crawling seeds. For at least the last two of them, a crawl appears to be manageable in terms of both API accessibility and corpus size, which is not the case for Twitter, for example.
Using these platforms as seed sources has several advantages:
- These platforms offer a relatively diverse range of user profiles.
- The documents being shared are those most likely to be important.
- It becomes possible to cover languages that are rarely seen on the Internet and that stay below the English-speaking spammer’s radar.
- Microblogging services are …