Data engineer and scientist specializing in Natural Language Processing, providing solutions in data acquisition, information processing and visualization.
10+ years of experience bridging humanities and computer science, with extensive knowledge of language data and NLP pipelines.
Familiar with quantitative methods, machine learning and artificial intelligence, coding and teaching. Special interest in contributing to leading open source software.
Author and project leader of Trafilatura, an open-source package to gather and extract text data used by researchers and the AI, LLM and RAG industry.
For more, see Software on GitHub and the Research Blog.
For more information see the archives or my presentations on SlideShare.
See also my profile on Google Scholar.
Trafilatura: A Web Scraping Library and Command-Line Tool for Text Discovery and Extraction
Adrien Barbaresi
Proceedings of ACL/IJCNLP 2021: System Demonstrations, pp. 122-131, 2021.
A corpus of German political speeches from the 21st century
Adrien Barbaresi
11th Language Resources and Evaluation Conference (LREC 2018), pp. 792-797, 2018.
A Constellation and a Rhizome: Two Studies on Toponyms in Literary Texts
Adrien Barbaresi
Visualisierung sprachlicher Daten: Visual Linguistics – Praxis – Tools, N. Bubenhofer & M. Kupietz (eds.), Heidelberg University Publishing, pp. 167-184, 2018.