About Google Reading Level

Jean-Philippe Magué told me that Google has an advanced search filter that analyzes the result pages to give a readability estimate. In fact, it was introduced about seven months ago and, to my knowledge, only works for English (which is also why I hadn’t noticed it).

Description

For more information, you can read the official help page. I also found two convincing blog posts showing how it works, one by the Unofficial Google System Blog and the other by Daniel M. Russell.

The most interesting information I was able to find comes from a brief explanation by a product manager at Google, who started the following topic on the help forum: New Feature: Filter your results by reading level.
Note that this does not seem to have ever been a hot topic!

Apparently, it was designed as an “annotation” based on a statistical model developed using real-world data (i.e. pages that were “manually” classified by teachers). The engine works by performing a word comparison, using the model as well as articles found by Google Scholar.

In the original text:

The feature is based primarily on statistical models we built with the help of …


Using and parsing the hCard microformat, an introduction

Recently, as I decided to get involved in the design of my personal page, I learned how to add semantic markup to a web page. I would like to share a few things about writing and parsing semantic information in this format. My intuition is that this is only the beginning: there will be more and more formats to describe who you are, what you do, who you are related to and where you link to, as well as engines that gather this information.

First of all, the hCard microformat is defined by this standard, hCard 1.0.1. For an explanation of what it is, see here on microformats.org; for a general article on microformats, see also Wikipedia.

This information is useful because it marks up semantic relations, so that named entities are correctly identified, by search engines for instance: Google supports several formats, including hCard, and there are more specialized search engines that aim at gathering information such as contacts or product lists from this kind of markup. For a comprehensive list see here.
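
To give a rough idea of what such markup looks like and how a program can read it, here is a minimal sketch in Python using only the standard library. The sample card (name, organization, URL) is made up for illustration, and the handful of properties handled (fn, org, locality, nickname, title, url) is only a small subset of hCard. It assumes well-formed, non-nested cards and is not a substitute for a real microformats parser.

```python
from html.parser import HTMLParser

# Hypothetical sample markup: the person, lab and URL are invented.
SAMPLE = """
<div class="vcard">
  <a class="fn url" href="http://example.org/">Jane Doe</a>
  <span class="org">Example Lab</span>
  <span class="locality">Lyon</span>
</div>
"""

# hCard properties whose value is taken from the element's text (a subset).
TEXT_PROPERTIES = {"fn", "org", "locality", "nickname", "title"}
# Elements without a closing tag, ignored to keep the element stack balanced.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class HCardParser(HTMLParser):
    """Collect a few properties from every class="vcard" element."""

    def __init__(self):
        super().__init__()
        self.cards = []        # one dict per hCard found
        self._stack = []       # per open element: (is_root, text properties)
        self._open_roots = 0   # number of currently open vcard elements

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        attr = dict(attrs)
        classes = set((attr.get("class") or "").split())
        is_root = "vcard" in classes
        if is_root:
            self.cards.append({})
            self._open_roots += 1
        props = set()
        if self._open_roots:
            card = self.cards[-1]
            props = classes & TEXT_PROPERTIES
            for name in props:
                card.setdefault(name, "")
            # For links, the url property is the href, not the text.
            if "url" in classes and attr.get("href"):
                card["url"] = attr["href"]
        self._stack.append((is_root, props))

    def handle_data(self, data):
        if not self._open_roots:
            return
        card = self.cards[-1]
        # Text belongs to every enclosing element that carries a property.
        for _, props in self._stack:
            for name in props:
                card[name] += data

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self._stack:
            return
        is_root, _ = self._stack.pop()
        if is_root:
            self._open_roots -= 1


def parse_hcards(html):
    """Return a list of dicts, one per hCard, with whitespace normalized."""
    parser = HCardParser()
    parser.feed(html)
    parser.close()
    return [{k: " ".join(v.split()) for k, v in card.items()}
            for card in parser.cards]


if __name__ == "__main__":
    for card in parse_hcards(SAMPLE):
        print(card)
```

The point of the sketch is that the semantics live entirely in the class attributes: the page still renders as ordinary HTML, while a parser only has to look for "vcard" and the property names inside it.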

Now, if you are interested in parsing microformats, there are several tools. Among them, my pick …
