Comparison of Features for Automatic Readability Assessment: review

I read an interesting article, “featuring” an up-to-date comparison of the state of the art in readability assessment:

“A Comparison of Features for Automatic Readability Assessment”, Lijun Feng, Martin Jansche, Matt Huenerfauth, Noémie Elhadad, 23rd International Conference on Computational Linguistics (COLING 2010), Poster Volume, pp. 276–284.

I am interested in the features they use, so here is a quick summary:

Corpus and tools

  • Corpus: a sample from the Weekly Reader
  • OpenNLP to extract named entities and resolve co-references
  • the Weka toolkit for machine learning
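As an illustration of what the first feature subset below looks like in practice, here is a minimal sketch of entity-density features computed from pre-extracted entity mentions (e.g. OpenNLP output). The function name and the exact feature definitions are my own assumptions, not the paper's:

```python
# Hypothetical sketch: entity-density features of the kind the paper
# discusses, computed from entity mentions already extracted by an
# NER tool. Feature definitions here are illustrative assumptions.

def entity_density_features(sentences, entity_mentions):
    """sentences: list of token lists.
    entity_mentions: list of (sentence_index, start, end) token spans
    marking named-entity mentions."""
    n_sents = len(sentences)
    n_tokens = sum(len(s) for s in sentences)
    n_mentions = len(entity_mentions)
    # number of tokens covered by some entity mention
    entity_tokens = sum(end - start for _, start, end in entity_mentions)
    return {
        "mentions_per_sentence": n_mentions / n_sents,
        "mentions_per_token": n_mentions / n_tokens,
        "entity_token_ratio": entity_tokens / n_tokens,
    }

sents = [["Alice", "met", "Bob", "in", "Paris", "."],
         ["They", "talked", "."]]
mentions = [(0, 0, 1), (0, 2, 3), (0, 4, 5)]  # Alice, Bob, Paris
feats = entity_density_features(sents, mentions)
```

The resulting dictionary can be fed directly into a Weka-style feature vector, one attribute per key.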

Features

  • Four subsets of discourse features:
      1. entity-density …