Adrien Barbaresi | Research, Engineering, Data Science

About me

Data engineer and scientist specializing in Natural Language Processing, providing solutions in data acquisition, information processing and visualization.

10+ years of experience bridging humanities and computer science, with extensive knowledge of language data and NLP pipelines.

Familiar with quantitative methods, machine learning and artificial intelligence, coding and teaching. Special interest in contributing to leading open source software.

Author and project leader of Trafilatura, an open-source package to gather and extract text data used by researchers and the AI, LLM and RAG industry.

For more, see Software on GitHub and the Research Blog.

Services

Reviews for conferences (notably ACL, CMC-Corpora, Computational Humanities Research, Digital Humanities, EACL, EMNLP, KONVENS, SwissText); research projects (ESF, FWO); volume chapters (e.g. proofreader profile for Language Science Press); journals (Journal of Open Humanities Data, Language Resources and Evaluation); and workshops (CPSS, SOCAI)
Organization of conferences (KONVENS 2018) and workshops, e.g. Challenges in the Management of Large Corpora (CMLC) and the 12th Web as Corpus Workshop (WAC-XII)
Editor (2017-2021) of the Journal for Language Technology and Computational Linguistics (JLCL) and member of the executive board of the German Society for Computational Linguistics & Language Technology (GSCL)
Director (2011-2013) of ENthèSe (association of doctoral candidates)

Teaching

Guest lecturer at Zhejiang University (浙大) (Hangzhou, China) since 2016. Classes on methodological and practical aspects of corpus linguistics, text analysis and visualization (School of International Studies / 外语学院)
Master-level classes and tutoring at the École Normale Supérieure de Lyon (2011-2013): Collaborative work and language teaching, (Open) Data collection and visualization, Web design with CSS/XHTML for beginners, Introductory and Advanced LaTeX
Associate lecturer at the University of Freiburg (Germany) (2005-2006): translation (German to French) and text analysis (French texts) at Bachelor and Master level.

For more information see the archives or my presentations on SlideShare.

Notable Publications

Education

Dr. phil. in Linguistics: Ad hoc and general-purpose corpus construction from web sources (École Normale Supérieure de Lyon, 2015).
Thesis committee: Benoît Habert (advisor), Thomas Lebarbé (chair), Henning Lobin (reviewer), Jean-Philippe Magué (co-advisor), Ludovic Tanguy (reviewer).

Projects

CLARIN-D, German Text Archive (DTA), DWDS and ZDL (Center for Digital Lexicography of German) projects at the Berlin-Brandenburg Academy of Sciences
Research associate at the Austrian Academy of Sciences (Academy Corpora group)
Corpora from the Web (COW) at the Free University of Berlin
Linguistics and Instrumented Text Databases team at ICAR lab (ENS Lyon)