Comparison of Features for Automatic Readability Assessment: review
I read an interesting article, “featuring” an up-to-date comparison of what is being done in the field of readability assessment:
“A Comparison of Features for Automatic Readability Assessment”, Lijun Feng, Martin Jansche, Matt Huenerfauth, Noémie Elhadad, 23rd International Conference on Computational Linguistics (COLING 2010), Poster Volume, pp. 276-284.
I am mainly interested in the features they use, so here is a quick summary and review:
Corpus and tools
- Corpus: a sample from the Weekly Reader
- OpenNLP to extract named entities and resolve co-references
- Weka as the machine learning toolkit
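
To make the tool chain a bit more concrete, here is a minimal sketch of how entity annotations could be turned into numbers that a learner such as Weka can consume. This is my own illustration, not the authors' code: the `Sentence` class, the `entity_counts` function and the four feature names are hypothetical, and I assume the entity mentions have already been extracted by some NER tool.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Sentence:
    tokens: List[str]
    # Entity mentions, assumed to come from an external NER tool.
    entities: List[str]


def entity_counts(sentences: List[Sentence]) -> Dict[str, float]:
    """Turn per-sentence entity annotations into document-level counts."""
    mentions = [e for s in sentences for e in s.entities]
    n_sent = len(sentences)
    n_tok = sum(len(s.tokens) for s in sentences)
    return {
        "entity_mentions": len(mentions),
        "unique_entities": len(set(mentions)),
        # Guard against empty documents to avoid division by zero.
        "mentions_per_sentence": len(mentions) / n_sent if n_sent else 0.0,
        "mentions_per_token": len(mentions) / n_tok if n_tok else 0.0,
    }


# Toy document with hand-written annotations (illustrative only).
doc = [
    Sentence(["Anna", "visited", "Paris", "."], ["Anna", "Paris"]),
    Sentence(["Paris", "was", "sunny", "."], ["Paris"]),
]
features = entity_counts(doc)
```

A dictionary like `features` would then be written out as one row of the table handed to the classifier, with the grade level of the text as the label.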
Features
- Four subsets of discourse features: 1. entity-density …