Distant reading and text visualization

A new paradigm in “digital humanities” (you know, that Silicon Valley of textual studies geared towards a neoliberal narrowing of research, as one highly provocative but nonetheless interesting read would have it) resides in the belief that understanding language (e.g. literature) is not accomplished by studying individual texts, but by aggregating and analyzing massive amounts of data (Jockers 2013). Because it is impossible for individuals to “read” everything in a large corpus, advocates of distant reading employ computational techniques to “mine” the texts for significant patterns and then use statistical analysis to make statements about those patterns (Wulfman 2014).

One of the first attempts to apply visualization techniques to texts was the “shape of Shakespeare” by Rohrer, Ebert, and Sibert (1998). Clustering methods were used to let sets emerge from textual data as well as metadata, not only in the humanities but also in the case of Web genres (Bretan, Dewe, Hallberg, Wolkert, & Karlgren, 1998). This work may seem rudimentary by today’s standards, or far from a sophisticated “view” on literature, but the “distant reading” approach is precisely about seeing texts from another perspective and exploring the corpus interactively. Other examples of text mining approaches enriching visualization techniques include the document atlas of Fortuna, Grobelnik, and Mladenic (2005), and the parallel tag clouds of Collins, Viegas, and Wattenberg (2009).

The criticism concerning culturomics seems to hold true for corpus visualization as well: there is still a gap to bridge between information visualization, NLP and digital humanities. The exploration of digital text collections obtains better results and reaches a larger user base if work on visualization is conducted in a dialog between philologists and NLP experts.

One may consider that basic visualization techniques are already used in corpus linguistics, since concordancers, collocation networks, and keyword clouds are all ways to see through a corpus (Rayson & Mariani 2009). Nonetheless, there is still a lot of work to do to catch up with the computer science field of information visualization, as Rayson & Mariani (2009) acknowledge:

“We wish to allow linguists to explore their data in ‘strange’ new ways and to seek out new patterns and new visualisations”.
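
To make the concordancer mentioned above a little more concrete, here is a minimal keyword-in-context (KWIC) sketch in Python. It is only an illustration under simple assumptions: the function name kwic, the width parameter, the naive regex tokenizer and the sample sentence are all mine, not taken from Rayson & Mariani or from any existing tool.

```python
import re


def kwic(text, keyword, width=3):
    """Return (left context, hit, right context) for each occurrence of keyword.

    Hypothetical helper for illustration: tokens are lowercased runs of
    word characters, and `width` is the number of tokens kept on each side.
    """
    tokens = re.findall(r"\w+", text.lower())
    target = keyword.lower()
    rows = []
    for i, token in enumerate(tokens):
        if token == target:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            rows.append((left, token, right))
    return rows


# Illustrative sample only; a real corpus would be read from files.
sample = "To be, or not to be, that is the question."
for left, hit, right in kwic(sample, "be", width=2):
    # Right-align the left context so the hits line up in one column.
    print(f"{left:>12} [{hit}] {right}")
```

Aligning the hits in a single column is what turns a plain search into a rudimentary “view” on the corpus: the eye can scan what tends to come before and after the keyword.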

In fact, exploration is a frequently used keyword when it comes to making corpus content available through a visualization, which cannot be reduced to a mere statistical analysis but is grounded in more complex processes:

“Statistical tools alone are not sufficient for ‘distant reading’ analysis: methods to aid in the analysis and exploration of the results of automated text processing are needed, and visualization is one approach that may help.” (Collins, Viegas & Wattenberg 2009)

Imagination is another keyword, especially concerning digital humanities and arts, as there is both a need and a real potential for graphic aids for digital humanists, with text objects standing at the intersection of quantitative methods and aesthetics:

“A great many of the visualization methods applied to text are derived from analytical quantitative methods that were originally borrowed from the sciences. This is an interesting area of application because there are also other more imaginative visualization tools that owe more to the arts than the sciences” (Jessop 2008)

In that sense, cooperation between different disciplines not only allows for a revision of methodologies, it may also pave the way to more creative approaches – as long as critical reflection is present.

References

Bretan, I., Dewe, J., Hallberg, A., Wolkert, N., & Karlgren, J. (1998). “Web-Specific Genre Visualization”. In Webnet.
Collins, C., Viegas, F. B., & Wattenberg, M. (2009). “Parallel tag clouds to explore and analyze faceted text corpora”. In Visual analytics science and technology (pp. 91–98).
Fortuna, B., Grobelnik, M., & Mladenic, D. (2005). “Visualization of text document corpus”. Informatica, 29(4).
Jessop, M. (2008). “Digital visualization as a scholarly activity”. Literary and Linguistic Computing, 23(3), 281–293.
Jockers, M. L. (2013). “Macroanalysis: Digital methods and literary history”. University of Illinois Press.
Rayson, P., & Mariani, J. (2009). “Visualising corpus linguistics”. In Proceedings of the Corpus Linguistics conference.
Rohrer, R. M., Ebert, D. S., & Sibert, J. L. (1998). “The shape of Shakespeare: Visualizing text using implicit surfaces”. In Proceedings of the IEEE symposium on information visualization (pp. 121–129).
Wulfman, C. E. (2014). “The Plot of the Plot: Graphs and Visualizations”. The Journal of Modern Periodical Studies, 5(1), 94–109.
