Current research interests
- (Web) corpus construction, exploitation, and visualization, from crawling/OCR to quality assessment
- Corpus and computational linguistics, especially non-standard data
- I currently work for the Austrian Academy of Sciences (Academy Corpora lab) as well as for the Berlin-Brandenburg Academy of Sciences (BBAW).
- I have worked for the CLARIN-D and German Text Archive (DTA) projects at the BBAW.
- I contributed to COW (COrp[us/ora] from the Web) at the FU Berlin (2012 – 2013).
- I was a member of the Corpus Linguistics and Instrumented Text Databases team at the ICAR lab (2010 – 2015).
Thesis: Ad hoc and general-purpose corpus construction from web sources (École Normale Supérieure de Lyon, 2015)
Thesis committee: Benoît Habert (advisor), Thomas Lebarbé (chair), Henning Lobin (reviewer), Jean-Philippe Magué (co-advisor), Ludovic Tanguy (reviewer).
- Co-editor of
  - Journal for Language Technology and Computational Linguistics (JLCL) @ GSCL (German Society for Computational Linguistics & Language Technology)
- Reviewer at
  - KONVENS 2014 & 2016
  - Web as Corpus (WAC) Workshops 9 & 10
  - RECITAL 2015
- Former director (2011 – 2013) of ENthèSe (association of doctoral candidates)
- Former editor of its blog and webmaster of les-jeunes-chercheurs.fr