Research scientist, Berlin-Brandenburg Academy of Sciences
Center for Digital Lexicography of German
→ Notably in charge of contemporary and web text collections
For more information, see my research blog and my software released under open-source licenses.
For more information see the archives or my presentations on SlideShare.
See also my profile on Google Scholar.
Trafilatura: A Web Scraping Library and Command-Line Tool for Text Discovery and Extraction
Adrien Barbaresi
Proceedings of ACL/IJCNLP 2021: System Demonstrations, pp. 122-131, 2021.
[PDF] [Code] [Project]
Out-of-the-Box and Into the Ditch? Multilingual Evaluation of Generic Text Extraction Tools
Adrien Barbaresi, Gaël Lejeune
Language Resources and Evaluation Conference (LREC 2020), Proceedings of the 12th Web as Corpus Workshop (WAC-XII), pp. 5-13, 2020.
[PDF] [Code] [Project]
A corpus of German political speeches from the 21st century
Adrien Barbaresi
11th Language Resources and Evaluation Conference (LREC 2018), pp. 792-797, 2018.
[PDF] [Project]
A Constellation and a Rhizome: Two Studies on Toponyms in Literary Texts
Adrien Barbaresi
Visualisierung sprachlicher Daten: Visual Linguistics – Praxis – Tools, N. Bubenhofer & M. Kupietz (eds.), Heidelberg University Publishing, pp. 167-184, 2018.
[PDF]