Research scientist, Berlin-Brandenburg Academy of Sciences
Center for Lexicography of German
→ Notably in charge of contemporary and web text collections
- (Web) corpus construction and exploitation, from crawling/OCR to visualization
- Corpus and computational linguistics with emphasis on non-standard data
For more information, see my research blog and my software released under open-source licenses.
- CLARIN-D and German Text Archive (DTA) projects at the BBAW
- Research associate at the Austrian Academy of Sciences (Academy Corpora group)
- COW at the FU Berlin
- Corpus Linguistics and Instrumented Text Databases team at ICAR lab
For more information see the archives or my presentations on SlideShare.
See also the comprehensive publication list on the HAL archive.
Out-of-the-Box and Into the Ditch? Multilingual Evaluation of Generic Text Extraction Tools
Adrien Barbaresi, Gaël Lejeune
Language Resources and Evaluation Conference (LREC 2020), Proceedings of the 12th Web as Corpus Workshop (WAC-XII), pp. 5-13, 2020.
A corpus of German political speeches from the 21st century
11th Language Resources and Evaluation Conference (LREC 2018), pp. 792-797, 2018.
A Constellation and a Rhizome: Two Studies on Toponyms in Literary Texts
Visualisierung sprachlicher Daten: Visual Linguistics – Praxis – Tools, N. Bubenhofer & M. Kupietz (eds.), Heidelberg University Publishing, pp. 167-184, 2018.