The Falko Project is an error-annotated corpus of German as a foreign language, maintained by the Humboldt-Universität zu Berlin, which has made it publicly accessible.

Recently, a new search engine was made available, effectively replacing the old CQP interface. The tool is called ANNIS2 and can handle complex queries on the corpus.


There are several subcorpora, and apparently more to come. The texts were written by advanced learners of German. Most notably, there are summaries (with the original texts and a comparable corpus of summaries written by native speakers), essays collected at different locations (with the same type of comparable corpus) and a ‘longitudinal’ corpus written by students of Georgetown University in Washington.

The corpora are annotated with a part-of-speech tagger (the TreeTagger), so word classes and lemmas are known. Most importantly, the mistakes themselves can be searched for, with several hypotheses at different levels (mainly what the correct sentence would be and what the reason for the mistake might be).


The engine (ANNIS2) comes with a good tutorial (in English, by the way), so it is not that difficult to search for complex patterns across the subcorpora. It also seems efficient in terms of speed. You can search for word forms, annotations, trees and pointing relations.

There are several export formats available, from raw text to multiple annotation layers in the EXMARaLDA (Extensible Markup Language for Discourse Annotation) format, which was also used in past projects at the HU Berlin.
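To give an idea of what a multi-layer export looks like in practice, here is a minimal Python sketch. As far as I understand the format, an EXMARaLDA basic transcription keeps a common timeline of `tli` points and a set of `tier` elements whose `event` entries are anchored to that timeline. The sample document, the tier categories (`tok`, `pos`, `target`) and the function name `read_layers` are my own assumptions for illustration; a real Falko export will use its own tier names and contain much more metadata.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature export: the sentence and the tier categories
# ("tok", "pos", "target") are invented for illustration only.
SAMPLE = """<basic-transcription>
  <basic-body>
    <common-timeline>
      <tli id="T0"/><tli id="T1"/><tli id="T2"/><tli id="T3"/>
    </common-timeline>
    <tier id="TIE0" category="tok" type="t">
      <event start="T0" end="T1">Ich</event>
      <event start="T1" end="T2">habe</event>
      <event start="T2" end="T3">gegeht</event>
    </tier>
    <tier id="TIE1" category="pos" type="a">
      <event start="T0" end="T1">PPER</event>
      <event start="T1" end="T2">VAFIN</event>
      <event start="T2" end="T3">VVPP</event>
    </tier>
    <tier id="TIE2" category="target" type="a">
      <event start="T2" end="T3">gegangen</event>
    </tier>
  </basic-body>
</basic-transcription>"""


def read_layers(xml_text):
    """Group the events of all tiers by the timeline point where they start.

    Simplification: only the start point is used, so an annotation that
    spans several tokens is attached to its first token only.
    """
    root = ET.fromstring(xml_text)
    order = [tli.get("id") for tli in root.iter("tli")]
    rows = {point: {} for point in order}
    for tier in root.iter("tier"):
        category = tier.get("category")
        for event in tier.iter("event"):
            rows[event.get("start")][category] = (event.text or "").strip()
    # Keep timeline order and drop points where no event starts.
    return [(point, rows[point]) for point in order if rows[point]]


for point, layers in read_layers(SAMPLE):
    print(point, layers)
```

Each row then holds every layer aligned with one token, so filtering for tokens that carry a correction boils down to checking whether the (assumed) `target` key is present.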


I would say this is an interesting update, and I could well use this resource in the months to come.

To my knowledge, there is only one other error-annotated corpus of German, namely EAGLE: an Error-Annotated Corpus of Beginning Learner German, which is to be distributed (soon?) in EXMARaLDA XML format. Since the learners are only beginning to learn German, the error annotation seems to be mostly at a grammatical level (and not so much at a syntactic or semantic one).