Franco-German workshop series on the historical illustrated press

I wrote a blog post on the Franco-German conference and workshop series I am co-organizing with Claire Aslangul (University Paris-Sorbonne) and Bérénice Zunino (University of Franche-Comté). The three planned events revolve around the same topic: the illustrated press in France and Germany from the end of the 19th to the middle of the 20th century, drawing on disciplinary fields as diverse as visual history and computational linguistics. A first workshop will take place in Besançon in April, then a larger conference will be hosted by the Maison Heinrich Heine in Paris at the end of 2018, and finally a workshop …

more ...

On the creation and use of social media resources

Reflections after a workshop on computer-mediated communication and social media: besides the consensus on tweet IDs as exchange currency for replication studies, open questions remain concerning data re-use for existing linguistic archives.

more ...

On the interest of social media corpora

Introduction

The need to study language use in computer-mediated communication (CMC) appears to be widely recognized, as online communication is ubiquitous and raises a series of ethical, sociological, technological and technoscientific issues among the general public. The importance of linguistic studies on CMC is acknowledged beyond the research community, for example in forensic analysis, since evidence can be found online and traced back to its author.

In a South Park episode (“Fort Collins”, season 20, episode 6), a schoolgirl performs “emoji analysis” to get information on the author of troll messages. Using the distribution of emojis, she concludes …

more ...

Finding viable seed URLs for web corpora

I recently attended the Web as Corpus Workshop in Gothenburg, where I gave a talk on a paper of mine, Finding viable seed URLs for web corpora: a scouting approach and comparative study of available sources, and another with Felix Bildhauer and Roland Schäfer, Focused Web Corpus Crawling.

Summary

The comparison started from web crawling experiments I performed at the FU Berlin. The conventional tools of the “Web as Corpus” framework rely heavily on URLs obtained from search engines. URLs were easily gathered that way until search engine companies restricted this access, meaning that …

more ...

A few links on producing posters using LaTeX

As I had to make a poster for the TALN 2011 conference to illustrate my short paper (PDF, in French), I decided to use LaTeX, even if it was not the easiest way. I am quite happy with the result (PDF).

I gathered a few links that helped me out. My impression is that there are two common models, and as a matter of fact I saw both of them at the conference. The one that I used, Beamerposter, was “made in Germany” by Philippe Dreuw, from the Informatics Department of the University of Aachen. I only had to adapt …

more ...

Workshop on Complexity in Language – Day 2 (report)

I could not follow the whole second day of the Workshop on Complexity in Language (see previous post), but here is what I heard in the morning.

Salikoko Mufwene talked about the emergence of complexity, which he sees as a self-organization process: we don’t plan the way we are going to speak.

He adopts a relativistic perspective, speaking of a multi-agent system and asking whether the agents are really agentive or whether there are triggers of particular behaviors. He likes to consider language as a technology that evolved. At the end of the talk he also tackled the notion …

more ...

Workshop on Complexity in Language – Day 1 (report)

Yesterday I attended the first day of a workshop organized by Salikoko Mufwene and held at the ENS Lyon. This “Workshop on Complexity in Language: Developmental and Evolutionary Perspectives” lasts two days: HTML version of the program.

Here is my personal report on what I heard during the first day and on what I found interesting.

Complexity and complexity science

First of all, William S.-Y. Wang referred to Herbert Simon and Melanie Mitchell in particular to define complexity, two approaches that I described on this blog.

Tom Schoenemann talked about the increasing richness, subtlety and complexity of hominin conceptual …

more ...

On Text Linguistics

When I discussed text complexity in my last post, I did not realize how important it is to take the framework of text linguistics into account. This branch of linguistics is well known in Germany but is not really treated as a topic in its own right elsewhere. Most of the time, no one makes a distinction between text linguistics and discourse analysis, although the background is not necessarily the same.

I saw a presentation by Jean-Michel Adam last week, who describes himself as the “last of the Mohicans” to use this framework in French research. He drew a comprehensive picture of its origin …

more ...