I have selected a few papers on readability published in recent years, all available online (for instance through a specialized search engine, see the previous post):
- First of all, the one I reviewed last week, a very up-to-date article. L. Feng, M. Jansche, M. Huenerfauth, and N. Elhadad, “A Comparison of Features for Automatic Readability Assessment”, 2010, pp. 276-284.
- The seminal paper to which Feng et al. often refer, as it combines several approaches, especially statistical language models, support vector machines and more traditional criteria. It also has a comprehensive bibliography. S. E. Schwarm and M. Ostendorf, “Reading level assessment using support vector machines and statistical language models”, in Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics, 2005, pp. 523-530.
- A complementary approach, also a combination of features, this time mainly lexical and grammatical ones, with a focus on the latter: the authors use parse trees and subtrees (i.e. «relative frequencies of partial syntactic derivations») at three different levels. I found this convincing. The paper also compares three statistical models: Linear Regression, Proportional Odds Model and Multi-class Logistic Regression. M. Heilman, K. Collins-Thompson, and M. Eskenazi, “An analysis of statistical models and features for reading difficulty ...