Description

This is the third release of a work-in-progress corpus of political speeches from the German Presidency, the Presidency of the Bundestag, the Chancellery, and the Ministry of Foreign Affairs.

The corpus was released on the occasion of the LREC 2018 conference and includes updated texts with metadata encoded in XML.

See the description paper (PDF, in English, 6 pages), which you can use to refer to this corpus (BibTeX entry).

The permanent URL to access this resource is http://purl.org/corpus/german-speeches

Feel free to contact me if you have questions, would like to work on this corpus, or want a particular list of queries run on it.

If you wish to use the corpus, please cite at least the following elements and if possible the permanent URL (http://purl.org/corpus/german-speeches):

Downloads

Current version

Legacy versions

Visualizations (to be updated)

Beyond this point, the pages are in German (navigation should be intuitive, though):

For maintenance reasons the pages are static: they consist of word lists for relevant queries, output as valid CSS/XHTML.
In principle, they display properly in all desktop versions of Firefox, Safari, Chrome and Opera.

Mentions

The mentions below are updated on a regular basis.

Corpus and Computational Linguistics

History and Political Science

Miscellaneous

Changelog

05/09/18 Third release, updated text archive.
08/03/12 First part of the code released (crawler and corpus builder): https://github.com/adbar/gps-corpus-builder
03/05/12 Second release: POS tags, lemmas, XML TEI, keywords.
12/06/11 Readme and CC BY-SA license added.
09/08/11 The texts are now numbered in chronological order. Better formatting (title and meta-description, paragraphs).
09/01/11 Better display of the speeches (CSS) and general list.
08/16/11 Minor bugs corrected, new welcome page in German.
07/25/11 First release.