German Political Speeches Corpus and Visualization

Summary

  1. Description
  2. Downloads
  3. Visualizations
  4. Mentions
  5. Code
  6. Change log

Description

This is the second release of a work in progress collecting political speeches from the German Presidency and Chancellery, as gathered and republished by Adrien Barbaresi.

See the description paper (PDF, in English, 5 pages), which you can use to refer to this corpus (BibTeX entry).

The permanent URL to access this resource is http://purl.org/corpus/german-speeches

It was released on the occasion of the DGfS-CL Poster-Session, where I presented a poster. It includes updated texts, POS-tags and lemmas encoded in a nearly-compliant XML TEI format. The web pages are now lighter and contain relevant keywords for each text.
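
As a rough illustration of how token-level annotation of this kind might be read, here is a minimal Python sketch. It assumes each token is a `<w>` element carrying a `lemma` attribute and a `type` attribute for the POS tag. These names, the sample sentence and the `read_tokens` helper are assumptions made for the example, not the documented schema of the corpus, so check an actual file before relying on them.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: element and attribute names are assumptions,
# not the corpus's documented schema.
SAMPLE = """<TEI>
  <text>
    <body>
      <p>
        <w lemma="wir" type="PPER">Wir</w>
        <w lemma="danken" type="VVFIN">danken</w>
        <w lemma="." type="$.">.</w>
      </p>
    </body>
  </text>
</TEI>"""


def read_tokens(xml_string):
    """Return (form, lemma, pos) triples for every <w> element found."""
    root = ET.fromstring(xml_string)
    tokens = []
    for w in root.iter("w"):
        # Missing text or attributes fall back to an empty string.
        tokens.append((w.text or "", w.get("lemma", ""), w.get("type", "")))
    return tokens


for form, lemma, pos in read_tokens(SAMPLE):
    print(form, lemma, pos, sep="\t")
```

If the files declare the TEI namespace, the tag name passed to `iter` would need the namespace prefix in braces; the sketch leaves it out for brevity.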

Feel free to contact me if you have questions, if you would like to work on this corpus, if you want a particular list of queries to be performed on it, etc.

If you wish to use the corpus, please cite at least the following elements:
Barbaresi, Adrien (2012). "German Political Speeches, Corpus and Visualization" http://purl.org/corpus/german-speeches

Downloads

Visualizations

Beyond this point, the pages are in German (navigation should be intuitive, though).
Due to infrastructure problems they are still static: word lists for relevant queries, output as valid CSS/XHTML.

The pages should display decently, though not perfectly, on all versions of Firefox, Safari, Chrome and Opera.

Mentions

The mentions below are updated on a regular basis.

Scientific publications

Corpus and Computational Linguistics
History and Political Science

Miscellaneous

Code

The code used to gather the corpus is available under an open source license: GPS Corpus Builder.
The visualization code should follow.

Change log

08/03/12 First part of the code released (crawler and corpus builder).
03/05/12 Release of the 2nd version - POS-tags, lemmas, XML TEI, keywords.
12/06/11 Readme and CC BY-SA license added.
09/08/11 The texts are now numbered in chronological order. Better formatting (title and meta-description, paragraphs).
09/01/11 Better display of the speeches (CSS) and general list.
08/16/11 Minor bugs corrected, new welcome page in German.
07/25/11 First release.