Description

This is the 4th release of a work in progress that gathers political speeches in German. The corpus currently includes a total of 6,685 speeches by 71 speakers, covering the period from 1984 to 2017 and amounting to about 13 million words. The texts come from the following sources:

For details, see the description paper (PDF, in English, 6 pages), which you can also use to cite this corpus (BibTeX entry). If you use the texts, please cite at least the following elements and, if possible, the permanent URL (http://purl.org/corpus/german-speeches):

Feel free to contact me if you have questions or if you would like to collaborate on this corpus.

Data

The XML archives below contain the texts together with their metadata. The corpus can now also be queried online here using a full-text search featuring linguistic annotation:

Current version

Legacy versions (outdated, for reproducibility only)
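Because the archives are plain XML, they can be processed with standard tools. The following Python sketch uses only the standard library to count speeches per speaker and whitespace-separated tokens. It assumes a hypothetical layout with one `<text>` element per speech, the speaker stored in a `person` attribute and the speech body as element text. These names are illustrative assumptions, so check them against the actual archive and adjust the `speech_tag` and `speaker_attr` parameters accordingly.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample mirroring an ASSUMED layout: one <text> element per
# speech, the speaker as an attribute, the speech body as element text.
# The real archive may use different element and attribute names.
SAMPLE = """<collection>
  <text person="Speaker A" datum="1999-05-01">Erste Rede .</text>
  <text person="Speaker B" datum="2005-10-12">Zweite Rede mit mehr Worten .</text>
  <text person="Speaker A" datum="2010-03-03">Dritte Rede .</text>
</collection>"""


def summarize(xml_source, speech_tag="text", speaker_attr="person"):
    """Count speeches per speaker and whitespace-separated tokens overall."""
    root = ET.fromstring(xml_source)
    speeches = Counter()
    tokens = 0
    for speech in root.iter(speech_tag):
        # Fall back to "unknown" if the (assumed) speaker attribute is missing
        speeches[speech.get(speaker_attr, "unknown")] += 1
        # itertext() also collects text nested inside child elements
        tokens += len("".join(speech.itertext()).split())
    return speeches, tokens


if __name__ == "__main__":
    per_speaker, n_tokens = summarize(SAMPLE)
    print(dict(per_speaker))  # {'Speaker A': 2, 'Speaker B': 1}
    print(n_tokens)           # 12
```

For the full archive, reading from a file with `ET.parse(path).getroot()` works the same way, while `ET.iterparse` keeps memory use low on large files. Splitting on whitespace is only a rough token count and will not exactly match the word counts given above.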

Visualizations (beta version from 2018)

Beyond this point, the pages are in German (navigation should be intuitive, though):

For maintenance reasons, the pages are static: they consist of word lists for relevant queries, output as valid XHTML/CSS.
They should display correctly in desktop versions of Firefox, Safari, Chrome and Opera.

Mentions

The mentions below are updated on a regular basis.

Corpus and Computational Linguistics

History and Political Science

Miscellaneous

Changelog

2019-06-17 4th release: Augmented text base, deduplication and refined metadata.
2018-09-28 Refined speaker metadata and text base for the Chancellery.
2018-08-30 Refined text base and updated visualizations.
2018-05-09 3rd release, updated text archive.
2012-08-03 First part of the (now outdated) code released: https://github.com/adbar/gps-corpus-builder
2012-03-05 2nd release: POS tags, lemmas, XML TEI, keywords.
2011-12-06 Readme and CC BY-SA license added.
2011-09-08 Better visualizations of the speeches and better formatting.
2011-08-16 Minor bugs corrected.
2011-07-25 First release.