Continuing a series of reviews on readability assessment, I would like to describe a tool which comes close to what I intend to do. It is named DeLite and is described as a ‘readability checker’. It was developed at the IICS research center of the FernUniversität Hagen.
From my point of view, its main drawback is that it has not been made publicly available: it relies on software one has to buy, and I did not manage to find even a demo version, although the project claims to have been publicly (i.e. EU-)funded. Thus, my description is based on what its designers report in the articles quoted below.
The article by Glöckner et al. (2006) offers a description of the fundamentals of the software, as well as an interesting summary of research on readability. They depict the ‘classical’ pattern used to arrive at a readability formula:
- ‘select elements in a text that are related to readability’,
- then ‘correlate element occurrences with text readability (measured by established comprehension tests)’,
- and finally ‘combine the variables into a regression equation’ (p. 32).
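The three steps above can be illustrated with the (English) Flesch Reading Ease formula, whose coefficients were obtained in exactly this way. The sketch below is mine, not from the cited papers, and the syllable counter is a crude vowel-group approximation:

```python
import re

def surface_features(text):
    """Step 1: select surface elements plausibly related to readability."""
    sentences = [s for s in re.split(r'[.!?]+', text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    # crude syllable estimate: count groups of vowels (an approximation)
    syllables = sum(max(1, len(re.findall(r'[aeiouy]+', w.lower())))
                    for w in words)
    asl = len(words) / len(sentences)   # average sentence length (words)
    asw = syllables / len(words)        # average syllables per word
    return asl, asw

def flesch_reading_ease(asl, asw):
    """Step 3: combine the variables with regression-derived weights
    (here, the published Flesch coefficients for English)."""
    return 206.835 - 1.015 * asl - 84.6 * asw

asl, asw = surface_features("The cat sat on the mat. It was happy.")
score = flesch_reading_ease(asl, asw)
```

Step 2, the correlation with comprehension-test results, is precisely where the Flesch weights (1.015 and 84.6) came from in the first place.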
This is the approach that led to a preponderance of criteria like word and sentence length, because they correlated better with text readability than others did. Nonetheless, they are not sufficient, and they are easy to manipulate. As the authors put it:
‘Simple word and sentence variables are at best indications but not causal factors of semantic and syntactic difficulty.’ (p. 33)
That is why they try to find indicators closer to the actual causes of difficulty, using a broad approach that involves many natural language processing tools, from lexical analysis to semantic parsing. The ones they list were developed at their research center, starting in 2001: a syntactico-semantic parser, a large semantic lexicon for German, the MultiNet knowledge representation formalism, and coreference resolution at the text level. To my knowledge, none of them has been made freely available.
A note on the vocabulary used: the designers of DeLite mention the existence of a ‘hierarchical annotation structure of linguistic units’ without being more precise; they probably use XML syntax. They use both the terms readability and comprehensibility, and they discriminate between ‘criteria’ and ‘indicators’:
‘For methodological reasons, we distinguish between readability criteria and readability indicators, where the purpose of latter is to operationalize the evaluation of the former.’ (p. 34)
Aggregation of indicators and evaluation procedures
The technical report issued in 2008 is much more precise and provides a description of all the indicators.
Among the interesting features that are rarely seen in research are:
- the compound complexity (on the morphological level),
- the naming consistency (in the form of synonymy relations),
- the linear precedence complexity (i.e. the distance between the verb and its complements),
- the semantic complexity,
- the coreference ambiguity.
The article published in the proceedings of the IS 2008 also mentions ‘deep’ syntactic complexity indicators obtained by text parsing, such as the center embedding depth of a main verb.
As the text is a technical report, the authors list the features of their software that correspond to these indicators, but it is not always clear how they manage to detect them. Moreover, a few indicators overlap: for instance, the number of dependents per verb, the number of dependents per noun phrase and the number of constituents per coordination are all said to account for syntactic complexity, yet these phenomena are linked. Even if a statistical analysis was made, there is no evidence that the interdependence of the indicators was properly controlled for.
The indicators were normalized with respect to their probability distribution and combined using machine learning algorithms. Principal component analysis was used for the second step; indicators with negative weights were removed, and two regression methods were investigated: an exact robust regression method and an approximate method based on linear regression (technical report, p. 21). There is no mention of the impact of this reduction.
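The report does not publish its exact pipeline, but the general pattern it names — normalize the indicators, reduce them with principal component analysis, then regress against ratings — can be sketched on synthetic data. Everything below (data, dimensions, number of components) is made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# hypothetical data: 40 texts, 6 readability indicators driven by one
# latent 'difficulty' factor, plus simulated user ratings
latent = rng.normal(size=40)
X = latent[:, None] * rng.normal(size=6) + rng.normal(scale=0.5, size=(40, 6))
y = latent + rng.normal(scale=0.2, size=40)

# step 1: normalize each indicator (z-score)
Xz = (X - X.mean(axis=0)) / X.std(axis=0)

# step 2: principal component analysis via the covariance eigendecomposition
eigvals, eigvecs = np.linalg.eigh(np.cov(Xz, rowvar=False))
top = eigvecs[:, np.argsort(eigvals)[::-1][:3]]   # keep the top 3 components
scores = Xz @ top

# step 3: ordinary least-squares regression of ratings on component scores
A = np.column_stack([np.ones(len(y)), scores])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
pred = A @ coef
```

The robust regression variant mentioned in the report would replace step 3; the weights in `coef` are what an aggregation formula like theirs would reuse.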
The remaining indicators (i.e. the ones that still correlate with the evaluation methods) are aggregated into a weighted formula according to their relevance (see the evaluation procedures below). Here are the first five indicators (p. 29):
- the quality of the semantic network,
- the inverse lemma frequency,
- the average sentence length,
- the average distance between verb and prefix,
- the number of syllables.
Thus, the output of the semantic parser seems to be directly usable. From a syntactic point of view, the size of the Satzklammer (the German ‘sentence bracket’, i.e. the span between a finite verb and its separable prefix or non-finite parts) could be a particular feature of German that has to be taken into account.
The authors use the Amstad Readability Index (a German adaptation of the Flesch Reading Ease score) as a baseline against which to compare their results. They also carried out a study in which participants rated texts on a seven-point scale. The texts used in the study are said to come from the ‘municipal domain’, so it is hard to figure out their exact content; they allegedly contain ‘a lot of ordinances with legal terms and abbreviations’.
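The Amstad baseline itself is a very simple formula, commonly given as 180 − ASL − 58.5 × ASW. A minimal sketch, assuming that formulation (the example values are made up):

```python
def amstad_index(asl, asw):
    """Amstad's German adaptation of the Flesch Reading Ease score,
    commonly given as 180 - ASL - 58.5 * ASW, where ASL is the average
    sentence length in words and ASW the average number of syllables
    per word. Higher scores mean easier texts."""
    return 180.0 - asl - 58.5 * asw

# e.g. a text averaging 12 words per sentence and 1.8 syllables per word
score = amstad_index(12.0, 1.8)
```

Since this baseline only sees sentence length and syllable counts, any gain DeLite shows over it can be attributed to the deeper indicators.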
The five indicators that correlate most with the users’ ratings are:
- the number of words per sentence,
- the semantic network quality,
- the inverse concept frequency,
- the word form frequency,
- the number of reference candidates for a pronoun.
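Ranking indicators this way amounts to computing, for each indicator, a correlation coefficient such as Pearson's r against the user ratings. A self-contained sketch with made-up numbers (the papers do not publish the underlying data):

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# hypothetical values: user ratings for five texts (seven-point scale)
# against one candidate indicator, the number of words per sentence
ratings = [2, 3, 4, 5, 6]
words_per_sentence = [25, 20, 15, 12, 8]
r = pearson(words_per_sentence, ratings)   # strongly negative here:
                                           # longer sentences, lower ratings
```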
So, coreference analysis and semantic parsing are two important contributions to readability assessment. As they are also among the most complex indicators, it would be interesting to see how they are measured, and how much bias the tools induce.
A glimpse of a final version of the software is available on the website of the project.
It shows which choices were made in order to present the results clearly. In the top-right corner, the readability score for the given text is indicated, using a five-star scale and a percentage. A bar chart might be more relevant to give a better picture of the results. The same problem affects the indicators listed at the center-left, which are given ‘as is’, for instance ‘type-token ratio: 0.94’. Without a proper scale, they are hard to interpret.
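To illustrate why a raw value like ‘type-token ratio: 0.94’ is hard to read: the measure depends strongly on text length, so the same number means different things for texts of different sizes. A minimal sketch (the example sentences are my own):

```python
def type_token_ratio(text):
    """Ratio of distinct word forms (types) to total word forms (tokens)."""
    tokens = text.lower().split()
    return len(set(tokens)) / len(tokens)

# the same vocabulary looks 'richer' in a shorter text: as a text grows,
# words repeat and the ratio shrinks, so a raw value needs a reference scale
short = type_token_ratio("the cat sat on the mat")
longer = type_token_ratio("the cat sat on the mat and the dog sat on the rug")
```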
The possible problems found are listed on the right according to a typology (morphological, lexical, syntactic, semantic or discourse level). This is useful when it comes to analyzing single sentences or paragraphs, but it is not clear whether the interface was designed for whole texts or even text collections. It is definitely not practical if one has to click through each part of the text to see its characteristics.
The approach described here is clearly a global one: its purpose is to give a rating for the whole input text. Evaluation and visualization were also done at a global level, although one could also take advantage of the mentioned tools locally: through the combination of indicators in a formula, there is a loss of granularity compared to the raw output of the syntactic and semantic parsers.
As a conclusion, I think this approach is well-grounded and really interesting, but it is seriously undermined by the unavailability of both the software and the test corpora: other researchers could benefit greatly from these tools, for instance by integrating them into their own systems or by using them in a benchmark.
- T. vor der Brück, S. Hartrumpf, and H. Helbig, “A Readability Checker with Supervised Learning using Deep Syntactic and Semantic Indicators,” in Proceedings of the 11th International Multiconference: Information Society – IS 2008 – Language Technologies, Ljubljana, Slovenia, 2008, pp. 92-97.
- T. vor der Brück, H. Helbig, and J. Leveling, “The Readability Checker DeLite, Technical Report”, FernUniversität in Hagen, Intelligente Informations- und Kommunikationssysteme, 2008.
- I. Glöckner, S. Hartrumpf, H. Helbig, J. Leveling, and R. Osswald, “An architecture for rating and controlling text readability,” in Proceedings of KONVENS 2006, 2006, pp. 32-35.
A list of all publications related to this project is available on the IICS website.