The need to study language use in computer-mediated communication (CMC) appears to be of common interest, as online communication is ubiquitous and raises a series of ethical, sociological, technological and technoscientific issues among the general public. The importance of linguistic studies on CMC is acknowledged beyond the research community, for example in forensic science, as evidence can be found online and traced back to its author. In a South Park episode (“Fort Collins”, episode 6 season 20), a schoolgirl performs “emoji analysis” to get information on the author of troll messages. Using the distribution of emojis, she concludes that this person cannot be the suspected primary school student but has to be an adult. Although the background story seems somewhat far-fetched, as often with South Park, the logic of the analysis is sound.
General impressions on research trends
I recently went to a workshop on computer-mediated communication and social media. I was impressed by the preponderant role of Twitter data, which is the focus of a significant number of researchers. This is an open field with much research still to be done: there seems to be no clear or widely acknowledged methodology, and approaches to data diverge. Besides the apparent consensus regarding tweet IDs as “exchange currency” for scientific cooperation and replication studies, there are open questions about data reuse, both for existing “wild” archives (e.g. the archive.org “Twitter Stream Grab”) and for derivatives such as linguistically annotated data.
In any case, gathering CMC data in one place and making it accessible on a massive scale to scientific apparatuses (for example indexing or user-related metadata) understandably raises concerns related to the human lives and interactions which are captured by, hidden in, or which unfold beyond the data. The debate within the research community is all the more necessary since corporations whose business model rests on the ongoing collection and exploitation of social data are not likely to voice concerns about it: Facebook for example doesn’t like, and doesn’t use, the term “shadow profiles”, although such data aggregates very much exist.
I wrote two summaries of the workshop: