On the creation and use of social media resources

“Emoji analysis”

The need to study language use in computer-mediated communication (CMC) appears to be of common interest, as online communication is ubiquitous and raises a series of ethical, sociological, technological and technoscientific issues among the general public. The importance of linguistic studies on CMC is acknowledged beyond the research community, for example in forensic science, as evidence can be found online and traced back to its author. In a South Park episode (“Fort Collins”, season 20, episode 6), a schoolgirl performs “emoji analysis” to get information on the author of troll messages. Using the distribution of emojis, she concludes that this person cannot be the suspected primary school student but has to be an adult. Although the background story seems somewhat far-fetched, as is often the case with South Park, the logic of the analysis is sound.
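
To make the logic of such an analysis concrete, here is a minimal Python sketch. It is not the procedure shown in the episode and certainly not a forensic tool. It assumes that emojis can be approximated by a handful of Unicode blocks (real emoji detection also has to deal with multi-code-point sequences, skin-tone modifiers and flags), turns a set of messages into a relative-frequency distribution of emojis, and compares two such distributions with cosine similarity. The function names and sample messages are made up for illustration.

```python
from collections import Counter
from math import sqrt

# Simplifying assumption: code points in these Unicode blocks count as emojis.
# Real emoji detection is more involved (ZWJ sequences, modifiers, flags).
EMOJI_RANGES = [
    (0x1F300, 0x1F5FF),  # Miscellaneous Symbols and Pictographs
    (0x1F600, 0x1F64F),  # Emoticons
    (0x1F680, 0x1F6FF),  # Transport and Map Symbols
    (0x1F900, 0x1F9FF),  # Supplemental Symbols and Pictographs
    (0x2600, 0x27BF),    # Miscellaneous Symbols and Dingbats
]


def is_emoji(char: str) -> bool:
    """Check whether a single character falls into one of the assumed emoji blocks."""
    cp = ord(char)
    return any(lo <= cp <= hi for lo, hi in EMOJI_RANGES)


def emoji_profile(messages: list[str]) -> dict[str, float]:
    """Relative frequency of each emoji across a list of messages (empty dict if none)."""
    counts = Counter(ch for msg in messages for ch in msg if is_emoji(ch))
    total = sum(counts.values())
    return {e: n / total for e, n in counts.items()} if total else {}


def cosine_similarity(p: dict[str, float], q: dict[str, float]) -> float:
    """Cosine similarity between two sparse frequency distributions."""
    dot = sum(p[k] * q.get(k, 0.0) for k in p)
    norm = sqrt(sum(v * v for v in p.values())) * sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0


troll = emoji_profile(["lol 😂😂", "nice try 😏"])
candidate_a = emoji_profile(["😂 so funny 😂", "ok 😏"])
candidate_b = emoji_profile(["🚀🚀 to the moon", "🐶"])
print(round(cosine_similarity(troll, candidate_a), 3))  # 1.0: same emoji distribution
print(round(cosine_similarity(troll, candidate_b), 3))  # 0.0: no emoji in common
```

A high similarity only says that two sets of messages use emojis in a similar way; drawing conclusions about age or identity would require reference distributions for the relevant groups and far more data than a handful of messages.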

General impressions on research trends

I recently attended a workshop on computer-mediated communication and social media. I was struck by the preponderant role of Twitter data, which a significant number of researchers focus on. This is an open field in which much research remains to be done: there seems to be no clear or widely acknowledged methodology, and there are diverging approaches concerning …


Distant reading and text visualization

A new paradigm in the “digital humanities” (you know, that Silicon Valley of textual studies geared towards a neoliberal narrowing of research; a highly provocative but nonetheless interesting read) resides in the belief that understanding language (e.g. literature) is not accomplished by studying individual texts but by aggregating and analyzing massive amounts of data (Jockers 2013). Because it is impossible for individuals to “read” everything in a large corpus, advocates of distant reading employ computational techniques to “mine” the texts for significant patterns and then use statistical analysis to make statements about those patterns (Wulfman 2014).
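
As a very small illustration of the “aggregate rather than read” idea, the following Python sketch pools word counts over a whole corpus and reports, for the most frequent words, their relative frequency and the number of texts they occur in. It is a toy under stated assumptions (a naive regular-expression tokenizer, lowercasing, no stop-word list, hypothetical function names and sample sentences), not the method of Jockers or Wulfman.

```python
import re
from collections import Counter

# Naive tokenizer (assumption): runs of word characters, lowercased.
TOKEN_RE = re.compile(r"\w+")


def tokenize(text: str) -> list[str]:
    return [t.lower() for t in TOKEN_RE.findall(text)]


def corpus_profile(texts: list[str], top_n: int = 5) -> list[tuple[str, float, int]]:
    """Pool all texts; return (word, relative frequency, number of texts containing it)."""
    counts = Counter()    # token counts over the whole corpus
    doc_freq = Counter()  # in how many texts each word appears
    for text in texts:
        tokens = tokenize(text)
        counts.update(tokens)
        doc_freq.update(set(tokens))
    n_tokens = sum(counts.values())
    return [(w, c / n_tokens, doc_freq[w]) for w, c in counts.most_common(top_n)]


corpus = [
    "The sea was calm and the night was dark.",
    "The ship left the harbour at night.",
    "A storm rose over the sea.",
]
for word, rel_freq, n_docs in corpus_profile(corpus, top_n=3):
    print(f"{word}\t{rel_freq:.3f}\t{n_docs}")
```

On a real corpus, such pooled counts are only a starting point: the “significant patterns” mentioned above come from comparing them across texts, authors or periods with proper statistical tests.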

One of the first attempts to apply visualization techniques to texts was the “shape of Shakespeare” by Rohrer (1998). Clustering methods were used to let sets emerge from textual data as well as metadata, not only in the humanities but also in the case of Web genres (Bretan, Dewe, Hallberg, Wolkert, & Karlgren, 1998). It may seem rudimentary by today’s standards, or far from a sophisticated “view” of literature, but the “distant reading” approach is precisely about seeing texts from another perspective and exploring the corpus interactively. Other examples of text mining approaches enriching visualization techniques include the document atlas of …
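
The clustering idea mentioned above can be illustrated with a minimal k-means over term-frequency vectors, written in plain Python. This is a swapped-in textbook technique chosen for brevity, not the procedure used by Rohrer or by Bretan et al.; the toy documents, the deterministic initialization (the first k documents serve as initial centroids), the fixed number of iterations and the function names are all assumptions.

```python
import re
from collections import Counter


def vectorize(docs: list[str]) -> tuple[list[str], list[list[float]]]:
    """Turn documents into length-normalized term-frequency vectors over a shared vocabulary."""
    tokenized = [re.findall(r"\w+", d.lower()) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        n = len(toks) or 1  # avoid division by zero for empty documents
        vectors.append([counts[w] / n for w in vocab])
    return vocab, vectors


def squared_distance(a: list[float], b: list[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors: list[list[float]], k: int, iterations: int = 10) -> list[int]:
    """Plain k-means; the first k vectors are used as initial centroids (deterministic)."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: squared_distance(v, centroids[c])) for v in vectors]
        # Update step: each centroid becomes the mean of its members (unchanged if empty).
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


docs = [
    "to be or not to be",
    "buy cheap shoes online now",
    "not to be that is the question",
    "cheap shoes and cheap bags online",
]
_, vecs = vectorize(docs)
print(kmeans(vecs, k=2))  # [0, 1, 0, 1]
```

With these toy sentences, the two Shakespeare-like lines and the two advertising-like lines end up in separate clusters. On real data, the result depends heavily on initialization, term weighting (e.g. tf-idf instead of raw frequencies) and the choice of k.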


Foucault and the spatial turn

I would like to share a crucial text by Michel Foucault which I discovered through a recent article by Marko Juvan on geographical information systems (GIS) and literary analysis:

  • Juvan, Marko (2015). From Spatial Turn to GIS-Mapping of Literary Cultures. European Review, 23(1), pp. 81-96.
  • Foucault, Michel (1984). Des espaces autres. Hétérotopies. Architecture, Mouvement, Continuité, 5, pp. 46-49. Originally a lecture given at the Cercle d’études architecturales, 14 March 1967.

The full text, including the translation I am quoting from, is available on foucault.info. In print, it can be found in Dits et écrits. If I understand correctly, the translation is by Jay Miskowiec (see this website). It is an absolute bootleg, since it originates from a lecture and was not initially intended for publication. Still, Foucault’s prose is, as usual, really dense, and there is much to learn from it. Over time, it has become a central text of the so-called “spatial turn”, which is generally acknowledged to have been introduced by Foucault and Lefebvre in the 1960s and 70s.

In the opening of the text, comparing the 20th with the 19th century, Foucault arrives at the idea that our time is one of …



Here is the beginning of a bibliography generated from my Master’s thesis, converted between different formats, and parked here for future reference.

Complexity and Readability Assessment


Complexity and Linguistic Complexity Theory

  • S. T. Piantadosi, H. Tily, and E. Gibson, “Word lengths are optimized for efficient communication”, Proceedings of the National Academy of Sciences, vol. 108, iss. 9, pp. 3526-3529, 2011.
  • L. Maurits, A. Perfors, and D. Navarro, “Why are some word orders more common than others? A uniform information density account”, in Proceedings of NIPS, 2010.
  • P. Blache, “Un modèle de caractérisation de la complexité syntaxique”, in TALN 2010, Montréal, 2010.
  • T. Givón, The Genesis of Syntactic Complexity: Diachrony, Ontogeny, Neuro-Cognition, Evolution, Amsterdam, New York: John Benjamins Publishing Co., 2009.
  • M. Mitchell, Complexity: A Guided Tour, Oxford, New York: Oxford University Press, 2009.
  • C. Beckner, N. C. Ellis, R. Blythe, J. Holland, J. Bybee, J. Ke, M. H. Christiansen, D. Larsen-Freeman, W. Croft, and T. Schoenemann, “Language Is a Complex Adaptive System …

Resources and links of interest

Archive of links gathered during my PhD thesis:

  1. Linguistics and NLP
  2. Corpus Linguistics
  3. Perl
  4. LaTeX
  5. R
  6. PhD related
  7. Misc.

Update: Curated list of open-access/open-source/off-the-shelf resources and tools developed with a particular focus on German

1 – Linguistics and NLP

General Linguistics

Computational Linguistics

Online Articles and Conferences

Lists of CL Blogs

Resources for German
