Two studies on toponyms in literary texts


Because it is impossible for individuals to “read” everything in a large corpus, advocates of distant reading employ computational techniques to “mine” the texts for significant patterns and then use statistical analysis to make statements about those patterns (Wulfman 2014).

Although the attention of linguists is commonly drawn to forms other than proper nouns, the significance of place names in particular exceeds the usual frame of deictic and indexical functions, as they encapsulate more than a mere reference in space. In a recent publication, I present two studies that center on the visualization of place names in literary texts written in German, with particular emphasis on the concept of visualization, that is, on the processes and not on the products (Crampton 2001). I discuss research on toponym extraction and linkage from an interdisciplinary perspective and address questions related to research theory and practice.


The first case consists of a preliminary study of travel literature based on Richthofen’s Travel Journals from China (Tagebücher aus China, 1907) and relies on manually annotated data. The resulting map retraces the path taken by the author in the Shandong province by combining coordinates, sequences, and a sense of time. In order to …

more ...

Franco-German workshop series on the historical illustrated press

I wrote a blog post on the Franco-German conference and workshop series I am co-organizing with Claire Aslangul (University Paris-Sorbonne) and Bérénice Zunino (University of Franche-Comté). The three planned events revolve around the same topic: the illustrated press in France and Germany from the end of the 19th to the middle of the 20th century, drawing from disciplinary fields as diverse as visual history and computational linguistics. A first workshop will take place in Besançon in April, then a larger conference will be hosted by the Maison Heinrich Heine in Paris at the end of 2018, and finally a workshop focusing on methodological issues will take place at the Berlin-Brandenburg Academy of Sciences next year in autumn.

For more information, see this description in German.

more ...

My contribution to the Anglicism of the Year award

I contributed to the Anglicism of the Year award nominations. This is the second edition; the first was rather low-key but still got mentioned by the English-speaking press (e.g. by The Guardian).

The jury is once again chaired by Anatol Stefanowitsch, a professor of linguistics at Hamburg University. The selection of the final nominees will be relayed by a few German bloggers specialized in linguistics. I made it onto the first list of nominees, but no selection has been made so far; this phase runs until January 7th. News can be found on the official blog.

My suggestions are:

  • das Handyticketsystem
  • whistleblowen
  • der Occupist, die Occupisten
  • die Post-Privacy

In my opinion, the latter two have good chances of advancing to the final stage. Among the other nominees I like die Fazialpalmierung (facepalm) and die Liquid Democracy. But there are not that many interesting ones, which may be one reason why the deadline was postponed by a week.


more ...

Having fun and making money doing research

What do people look for? A few years ago it would have been difficult to gather information on a large scale and grab it with a powerful, yet more or less objective tool. Nowadays a single company is able to know what you want, what you buy or what you just did. And sometimes it shares a little bit of the data.

So, the end of the year gives me an occasion to try and discover changes in mentalities using the ready-to-use Google Trends. Just for fun…

How does research compare with other interests?

First of all, research is no fun: it used to be searched for more often than money and was on a par with work, but things have changed. It still outnumbers fun in the news, though.

A few trends regarding research, “Research is no fun”… (Source: Google, worldwide trends.)

People seem to look for money more often than a few years ago; it’s the only term that has become more popular, while even work merely remains stable.

A remark: I think the search volume is much bigger now than it was back in 2004, there are also more languages available, and probably more search terms (since the users may …

more ...

Using and parsing the hCard microformat, an introduction

Recently, as I decided to get involved in the design of my personal page, I learned how to represent semantic markup on a web page. I would like to share a few things about writing and parsing semantic information in this format. My intuition is that this is only the beginning and that there will be more and more formats to describe who you are, what you do, who you are related to, and where you link to, as well as engines that gather this information.

First of all, the hCard microformat points to this standard, hCard 1.0.1. For an explanation of what it is, see here; for a general article on microformats, see also Wikipedia.

The information displayed is useful as it is a way to mark up semantic relations, so that named entities are correctly identified, by search engines for instance: Google supports several formats, including hCard, and there are more specialized search engines that aim to gather information such as a contact or a product list starting from this kind of markup. For a comprehensive list see here.
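To give an idea of what such markup looks like and how it can be read back, here is a minimal sketch using only Python's standard library. It is not one of the dedicated tools and it does not cover the full hCard specification: it only handles the `vcard` root class and the `fn`, `org` and `url` properties, and it ignores nested cards. The sample page, the name and the URL are made up for illustration.

```python
from html.parser import HTMLParser

# Hypothetical sample markup; name, organization and URL are invented.
SAMPLE = """
<div class="vcard">
  <a class="fn url" href="https://example.org/">Jane Doe</a>
  <span class="org">Example Lab</span>
</div>
"""

# Elements without a closing tag must not be pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}
# Properties whose value is taken from the element's text (a simplification).
TEXT_PROPERTIES = {"fn", "org"}


class HCardParser(HTMLParser):
    """Collect a few hCard properties from elements inside class="vcard"."""

    def __init__(self):
        super().__init__()
        self.cards = []    # finished cards, one dict per vcard
        self._card = None  # card currently being filled, if any
        self._stack = []   # per open element: (is_card_root, text properties)

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        attrs = dict(attrs)
        classes = set((attrs.get("class") or "").split())
        is_root = self._card is None and "vcard" in classes
        if is_root:
            self._card = {}
        props = set()
        if self._card is not None:
            props = classes & TEXT_PROPERTIES
            # For "url", the value is the link target rather than the text.
            if "url" in classes and attrs.get("href"):
                self._card["url"] = attrs["href"]
        self._stack.append((is_root, props))

    def handle_data(self, data):
        if self._card is None:
            return
        # Text belongs to every open element that carries a text property.
        for _, props in self._stack:
            for prop in props:
                self._card[prop] = self._card.get(prop, "") + data

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self._stack:
            return
        is_root, _ = self._stack.pop()
        if is_root:
            self.cards.append({k: v.strip() for k, v in self._card.items()})
            self._card = None


def parse_hcards(html):
    """Return a list of dicts, one per vcard found in the HTML string."""
    parser = HCardParser()
    parser.feed(html)
    parser.close()
    return parser.cards


cards = parse_hcards(SAMPLE)
print(cards)
```

On the sample above this yields a single card with the keys `fn`, `org` and `url`. Real pages are messier (unbalanced tags, nested cards, many more properties), which is a good reason to rely on an existing parser for anything serious.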

Now, if you are interested in parsing microformats, there are several tools. Among them, my pick …

more ...

A short bibliography on Latent Semantic Analysis and Indexing

To go a bit further than my previous post, here are a few references that I recently found to be interesting.

For a definition and/or other short bibliographies, see Wikipedia or, for something else this time, Scholarpedia, with an article “curated” by T.K. Landauer and S.T. Dumais.

U. Mortensen, Einführung in die Korrespondenzanalyse, Universität Münster, 2009.

G. Gorrell and B. Webb, “Generalized Hebbian Algorithm for Incremental Latent Semantic Analysis,” in Ninth European Conference on Speech Communication and Technology, 2005.

P. Cibois, Les méthodes d’analyse d’enquêtes, Que sais-je ?, 2004.

B. Pincombe, Comparison of Human and Latent Semantic Analysis (LSA) Judgements of Pairwise Document Similarities for a News Corpus, Australian Department of Defence, 2004.

M. W. Berry, S. T. Dumais, and G. W. O’Brien, “Using Linear Algebra for Intelligent Information Retrieval,” SIAM Review, vol. 37, iss. 4, pp. 573-595, 1995.

S. Dumais, Enhancing performance in latent semantic indexing (LSI) retrieval, Bellcore, 1992.

S. Deerwester, S. T. Dumais, G. W. Furnas, T. K. Landauer, and R. Harshman, “Indexing by latent semantic analysis,” Journal of the American Society for Information Science, vol. 41, iss. 6, pp. 391-407, 1990.

G. Salton, A. Wong, and C. S. Yang, “A vector …

more ...

Why I don’t blog on and why I might do so (someday…)

People around me at the lab keep talking about a French institutional blog platform named In fact it is well known, but no one is using it. The website is still fairly new; according to them, they currently host a hundred blogs.

The main benefits are visibility and durability as it is institutional, well-referenced and competently maintained.

It is what it claims to be, which is also why I hesitated and finally chose to set up a basic personal website.

  • First you need to fill out a form to get registered, which is good as a kind of quality label, but I don’t know how long or how often I am going to blog. I don’t want to request a service I might end up not using.
  • The second reason is that it is very useful for people who do not want to deal with layout issues: all the pages look much the same apart from background colors and a few images. I think this may be meant to maintain a global coherence on the website.
  • It’s not that international; that’s not what it’s meant to be. Most of the articles are in French, and I …
more ...