I am currently working on a resource I would like to introduce: the German Political Speeches Corpus (no acronym apart from GPS). It consists of speeches by the most recent German Presidents and Chancellors as well as a few ministers, all gathered from official sources.
As far as I know, no such corpus was publicly available for German. Until now, most of the speeches could not be found through Google (which is bound to change). The corpus can be freely republished.
The two main corpora (Presidency and Chancellery) are released in XML format, based on raw text and metadata.
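To give an idea of what reading such a file could look like, here is a minimal sketch in Python. The element and attribute names (`collection`, `text`, `person`, `date`, `title`) are made up for illustration and are not the documented schema of the corpus, so they would have to be adapted to the actual files.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the element and attribute names below are assumptions
# for illustration, not the corpus's documented schema.
SAMPLE = """<collection>
  <text person="Speaker A" date="2010-01-01" title="Example one">Erste Rede. Sie ist kurz.</text>
  <text person="Speaker B" date="2011-06-15" title="Example two">Zweite Rede.</text>
</collection>"""


def load_speeches(xml_string):
    """Return one dict per speech, holding its metadata and raw text."""
    root = ET.fromstring(xml_string)
    speeches = []
    for node in root.iter("text"):
        speeches.append({
            "person": node.get("person"),
            "date": node.get("date"),
            "title": node.get("title"),
            "body": (node.text or "").strip(),
        })
    return speeches


speeches = load_speeches(SAMPLE)
for speech in speeches:
    print(speech["date"], speech["person"], len(speech["body"]))
```

Keeping the metadata as attributes and the raw text as element content is only one possible layout; the loading function is the single place that would need to change for a different one.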
I plan a series of improvements, among them better tokenization and POS tags.
I am also working on a basic visualization tool that gives users a first glimpse of the resource, using simple text statistics in the form of XHTML pages (a sort of Zeitgeist). For now it is static and I still need to brush up the CSS, but it is functional.
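As an illustration of the kind of simple text statistics meant here, the following Python sketch computes token and type counts, a type/token ratio, mean word length, and the most frequent words for one text. It is not the code behind the visualization tool: the function name `text_stats`, the choice of measures, and the naive letter-based tokenization are all assumptions.

```python
import re
from collections import Counter


def text_stats(text):
    """Compute a few basic statistics for a single text (illustrative only)."""
    # Naive tokenization (an assumption): lowercase the text and keep runs of
    # letters, which includes German umlauts and ß but drops digits and
    # punctuation.
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    counts = Counter(tokens)
    n = len(tokens)
    return {
        "tokens": n,
        "types": len(counts),
        "type_token_ratio": len(counts) / n if n else 0.0,
        "mean_word_length": sum(len(t) for t in tokens) / n if n else 0.0,
        "top_words": counts.most_common(3),
    }


stats = text_stats("Die Freiheit ist ein Wert. Die Freiheit ist kostbar.")
print(stats["tokens"], stats["types"])
```

A proper tokenizer would handle abbreviations, hyphenated compounds, and numbers differently, so these figures should be read as rough indicators only.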
I think that both the corpus and the statistics display could benefit my research on complexity levels.
Here is the permanent URL of the resource:
http://purl.org/corpus/german-speeches
Additional information and the download are available there.
This is the first release and the first post about this topic. In future posts I may describe a few of the CSS tricks or querying procedures I used. Until then, more information can be found in the formal technical paper (pdf).