In his professorial thesis (or habilitation thesis), which is about to be made public (the defence takes place next week), Ludovic Tanguy explains why, and under what conditions, data visualization could help linguists. In a previous post, I showed a few examples of visualization applied to the field of readability assessment. Tanguy’s inquiry is more general: it concerns what should be included in the disciplinary field of linguistics.

He gives a few reasons to use the methods of the emerging field of visual analytics and mentions some of its proponents (such as Daniel Keim or Jean-Daniel Fekete). But he also states that these methods are not well suited to the prevailing models of scientific evaluation.

Why use visual analytics in linguistics?

His main point is the (fast) growing size and complexity of linguistic data. Visualization becomes useful when selecting, listing or counting phenomena is no longer enough. There is evidence from cognitive psychology that an approach based on pattern recognition can support interpretation. In short, new needs arise where calculation falls short.

Tanguy gives two main examples of cases where this is obvious: first, the analysis of networks, which can be linguistically relevant, e.g. for dependency relations within a sentence or a text; and second, the multiple characteristics attached to individual data points, such as multiple layers of annotation.

He sees three main goals in data analysis that may be reached using visualizations:

  • to construct a global point of view (like an aerial view)
  • to look for configurations
  • to cross-reference data of different kinds (or, one could say, at different scales)

What remains to be done if this method is to be adopted?

Visualization by itself, however, does not solve a given problem: one has to find the most suitable processes, which are in turn constructs, limited projections of the complexity of the data.

Thus, it is important to leave users room to experiment (through trial and error). Some valuable insights may only emerge if visualization parameters are allowed to vary. Tanguy suggests three kinds of variation:

  • the selection of the dimensions to display and their mode of representation
  • a whole series of operations on the constructed view
  • finally, fine-tuning of both the visualization and the data

Tanguy quotes Ben Shneiderman’s mantra: ‘Overview first, zoom and filter, then details-on-demand’.

The last problem may lie in the complexity of the visualization tools themselves. Tanguy identifies three main skills needed to deal with this issue (once again three subcomponents, an interesting twist typical of French academic culture):

  • deep as well as fine-grained knowledge of the analyzed data
  • experience with the visualization processes
  • competence in data analysis