Last Monday, I released an updated version of both the corpus and the visualization tool on the occasion of the DGfS-CL Poster-Session in Frankfurt, where I presented a poster (in German).
The first version was made available last summer and announced on this blog; see this post: Introducing the German Political Speeches Corpus and Visualization Tool.
For stability, the resource is available at this permanent redirect: http://purl.org/corpus/german-speeches
Description
In case you don’t remember it or have never heard of it, here is a brief description:
The resource presented here consists of speeches by the most recent German Presidents and Chancellors as well as a few ministers, all gathered from official sources. For researchers who are able to use it, it provides raw data, metadata and tokenized text with part-of-speech tags and lemmas in XML TEI format. For those who want to get a glimpse of what is in the corpus before downloading it or turning to more complete tools, it offers a simple visualization interface.
The visualization output is valid CSS/XHTML and takes advantage of recent standards. The purpose is to give a sort of Zeitgeist: an insight into the topics addressed by a government official and into how the use of general concepts evolves over time.
Changes
The corpus has been updated and now ships with integrated linguistic annotation:
- Tokenization (Perl scripts), POS tags and lemmatization (TreeTagger) are included.
- The XML format is nearly TEI-compliant.
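To give an idea of what can be done with the annotated files, here is a minimal Python sketch that reads tokens, POS tags and lemmas and counts noun lemmas. The element and attribute names (`w`, `pos`, `lemma`) and the sample sentence are assumptions made for illustration, not the documented schema of the corpus, so check them against the actual files before reusing this.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample: the real corpus files may use other element
# or attribute names. The tags follow the STTS tagset used by the
# German TreeTagger model.
SAMPLE = """<TEI>
  <text>
    <body>
      <s>
        <w pos="ART" lemma="die">Die</w>
        <w pos="NN" lemma="Freiheit">Freiheit</w>
        <w pos="VAFIN" lemma="sein">ist</w>
        <w pos="ART" lemma="die">die</w>
        <w pos="NN" lemma="Grundlage">Grundlage</w>
        <w pos="ART" lemma="die">der</w>
        <w pos="NN" lemma="Freiheit">Freiheit</w>
        <w pos="$." lemma=".">.</w>
      </s>
    </body>
  </text>
</TEI>"""


def read_tokens(xml_string):
    """Return a list of (token, pos, lemma) tuples from the XML string."""
    root = ET.fromstring(xml_string)
    tokens = []
    for el in root.iter():
        # Drop a namespace prefix such as "{http://www.tei-c.org/ns/1.0}".
        tag = el.tag.rsplit("}", 1)[-1]
        if tag == "w":
            token = (el.text or "").strip()
            tokens.append((token, el.get("pos"), el.get("lemma")))
    return tokens


def lemma_counts(tokens, pos_prefix="N"):
    """Count lemmas whose POS tag starts with pos_prefix (nouns by default)."""
    return Counter(
        lemma for _, pos, lemma in tokens
        if pos and lemma and pos.startswith(pos_prefix)
    )


tokens = read_tokens(SAMPLE)
noun_counts = lemma_counts(tokens)
```

Counting lemmas rather than surface forms groups inflected variants together, which is the main benefit of having the lemmatization shipped with the corpus.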
On the visualization side, there are no major changes, but many improvements:
- The web pages are lighter, as they are completed on the fly by client-side JavaScript.
- There is a list of keywords for each text, which is still experimental but gives a rough idea of what is inside.
- The script that highlights the selected words in the texts has been improved but still does not match words beginning with Ä, Ö or Ü, although they are rather frequent.
- The ugly menu has been replaced by a real tab interface, and overall the CSS files fit more versions of Firefox, Chrome, Safari and Opera.
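As a side note on the umlaut problem mentioned above: one plausible cause, and this is only an assumption since the script itself is not shown here, is a word-boundary test that treats only ASCII letters as word characters. The Python sketch below reproduces that behaviour with the `re.ASCII` flag and shows that Unicode-aware matching does find the words. The function name `highlight` and the `<em>` markup are illustrative choices, not the tool's actual code.

```python
import re


def highlight(text, words, ascii_only=False):
    """Wrap whole-word occurrences of the given words in <em> tags.

    With ascii_only=True, \\b only recognises ASCII letters, digits and
    the underscore as word characters, so no boundary is seen between a
    space and a leading umlaut.
    """
    if not words:
        return text
    flags = re.ASCII if ascii_only else 0
    alternatives = "|".join(re.escape(w) for w in words)
    pattern = re.compile(r"\b(" + alternatives + r")\b", flags)
    return pattern.sub(r"<em>\1</em>", text)


text = "Die Änderung der Ökonomie ist überfällig."

# Unicode-aware matching (the default for Python 3 strings).
unicode_result = highlight(text, ["Änderung", "Ökonomie"])

# ASCII-only word boundaries: the umlaut-initial words are missed.
ascii_result = highlight(text, ["Änderung", "Ökonomie"], ascii_only=True)
```

In the ASCII-only case the text comes back unchanged, because there is no boundary between the space and the Ä or Ö. A client-side script could avoid the same trap by testing for boundaries with Unicode letter classes instead of a plain `\b`.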
Please see this technical paper for more details about the resource, and cite it when referring to the resource.