The Falko Project is an error-annotated corpus of German as a foreign language, maintained by the Humboldt Universität Berlin who made it publicly accessible.
There are several subcorpora, and apparently more to come. The texts were written by advanced learners of German. There are most notably summaries (with the original texts and a comparable corpus of summaries written by native-speakers), essays who come from different locations (with the same type of comparable corpus) and a ‘longitudinal’ corpus coming from students of the Georgetown-University of Washington.
The corpora are annotated by a part-of-speech tagger (the TreeTagger) so that word types and lemmas are known but most of all the mistakes can be found, with several hypotheses at different levels (mainly what the correct sentence would be and what might be the reason of the mistake).
The engine (ANNIS2) has a good tutorial (in English by the way) so that it is not that difficult to search for complex patterns across the subcorpora. It seems also efficient in terms of speed. You may …more ...