About me
Data engineer and scientist specializing in Natural Language Processing.
Author and project leader of Trafilatura, an open-source package for gathering and extracting text data, used by researchers and the AI industry.
Further topics of interest: methods and data for NLP, digital humanities, open source software.
For more, see Research Blog and Open Source Software on GitHub.
Services
- Reviews for conferences (notably ACL, CMC-Corpora, Computational Humanities Research, Digital Humanities, EACL, EMNLP, KONVENS, SwissText); research projects (ESF, FWO); volume chapters (e.g. proofreader profile for Language Science Press); journals (Journal of Open Humanities Data, Language Resources and Evaluation); and workshops (CPSS, SOCAI)
- Organization of conferences (KONVENS 2018) and workshops, e.g. Challenges in the Management of Large Corpora (CMLC) and the 12th Web as Corpus Workshop (WAC-XII)
- Editor (2017-2021) of the Journal for Language Technology and Computational Linguistics (JLCL) and member of the executive board of the German Society for Computational Linguistics & Language Technology (GSCL)
- Director (2011-2013) of ENthèSe (association of doctoral candidates)
Teaching
For more information see the archives or my presentations on SlideShare.
Selected Publications
See also my profile on Google Scholar.
- Trafilatura: A Web Scraping Library and Command-Line Tool for Text Discovery and Extraction
  Adrien Barbaresi
  Proceedings of ACL/IJCNLP 2021: System Demonstrations, pp. 122-131, 2021.
  [PDF] [Code] [Project]
- A corpus of German political speeches from the 21st century
  Adrien Barbaresi
  11th Language Resources and Evaluation Conference (LREC 2018), pp. 792-797, 2018.
  [PDF] [Project]
- A Constellation and a Rhizome: Two Studies on Toponyms in Literary Texts
  Adrien Barbaresi
  Visualisierung sprachlicher Daten: Visual Linguistics – Praxis – Tools, N. Bubenhofer & M. Kupietz (eds.), Heidelberg University Publishing, pp. 167-184, 2018.
  [PDF]
Education
Past Projects
- CLARIN-D, German Text Archive (DTA), DWDS and ZDL (Center for Digital Lexicography of German) projects at the Berlin-Brandenburg Academy of Sciences
- Research associate at the Austrian Academy of Sciences (Academy Corpora group)
- Corpora from the Web (COW) at the Free University of Berlin
- Linguistics and Instrumented Text Databases team at ICAR lab (ENS Lyon)