This is the third release of a work-in-progress corpus of political speeches from the German Presidency, the Presidency of the Bundestag, the Chancellery, and the Ministry of Foreign Affairs.

The corpus was released on the occasion of the LREC 2018 conference and includes updated texts with metadata encoded in XML.

See the description paper (PDF, in English, 6 pages), which you can cite when referring to this corpus (BibTeX entry).

The permanent URL to access this resource is

Feel free to contact me if you have questions, would like to work on this corpus, or want a particular list of queries run on it.

If you wish to use the corpus, please cite at least the following elements and if possible the permanent URL (


Current version

Legacy versions

Visualizations (to be updated)

Beyond this point, the pages are in German (navigation should be intuitive, though):

For maintenance reasons, the pages are static: they contain word lists for relevant queries, output as valid CSS/XHTML.
In theory, they should display decently in all desktop versions of Firefox, Safari, Chrome and Opera.


The mentions below are updated on a regular basis.

Corpus and Computational Linguistics

History and Political Science



05/09/18 Third release, updated text archive.
08/03/12 First part of the code released (crawler and corpus builder):
03/05/12 Release of the second version: POS tags, lemmas, XML TEI, keywords.
12/06/11 Readme and CC BY-SA license added.
09/08/11 The texts are now numbered in chronological order. Better formatting (title and meta-description, paragraphs).
09/01/11 Better display of the speeches (CSS) and general list.
08/16/11 Minor bugs fixed, new welcome page in German.
07/25/11 First release.