Lord Kelvin, Bachelard and Dilbert on Measurement

Lord Kelvin

Here is what William Thomson, better known as Lord Kelvin, once said about measurement:

« I often say that when you can measure what you are speaking about, and express it in numbers, you know something about it; but when you cannot measure it, when you cannot express it in numbers, your knowledge is of a meager and unsatisfactory kind; it may be the beginning of knowledge, but you have scarcely in your thoughts advanced to the state of Science, whatever the matter may be. »
William Thomson, Lecture on “Electrical Units of Measurement” (3 May 1883)

Bachelard

I found this quote in an early essay by the French philosopher Gaston Bachelard on what he calls “approximate knowledge” (Essai sur la connaissance approchée, 1927). For him, measurements cannot be considered in and for themselves, and he disagrees with Thomson on this point. According to him, the fact that a measurement is precise enough gives us the illusion that something exists or has just become real.

I quote in French, as I could not find an English edition nearby; the page numbers refer to the edition published by Vrin.

« Et pourtant, que ce soit dans la mesure ou dans une comparaison qualitative, il ...
(“And yet, whether in measurement or in a qualitative comparison, ...”)

more ...

Crawling a newspaper website to build a corpus

Building on my previous post about specialized crawlers, I will show how I recently crawled a French sports newspaper, L’Equipe, using scripts written in Perl. This is for educational purposes: it works for now, but it is bound to stop working as soon as the design of the website changes.

Gathering links

First of all, you have to make a list of links so that you have something to start from. Here is the beginning of the script:

#!/usr/bin/perl
# assuming you're using a UNIX-based system...
use strict;      # because it gets messy without: undeclared variables become errors
use warnings;
use Encode;      # you have to get the correct encoding settings of the pages
use LWP::Simple; # to get the webpages
use Digest::MD5 qw(md5_hex);   # to hash the links

A word on the last line: we are going to use a hash function to shorten the links and to make sure that each page is fetched only once.

my $url = "http://www.lequipe.fr/"; #the starting point

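my @done_md5;   # short MD5 hashes of the pages already fetched
my $page = get($url);   # under "use strict" every variable has to be declared first
die "Could not fetch $url\n" unless defined $page;
$page = encode("iso-8859-1", $page);   # because the pages are not in Unicode format
push (@done_md5, substr(md5_hex($url), 0, 8 ...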
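For comparison, the deduplication idea behind the last line can be sketched in Python. This is a minimal sketch, not the author's script: the names url_key and Seen are hypothetical, and like the Perl line it keeps only the first 8 hexadecimal characters of the MD5 digest of each URL.

```python
import hashlib


def url_key(url: str, length: int = 8) -> str:
    """Return a short key for a URL: the first `length` hex digits of its MD5 digest."""
    return hashlib.md5(url.encode("utf-8")).hexdigest()[:length]


class Seen:
    """Remember which URLs have already been fetched, using short MD5 keys."""

    def __init__(self) -> None:
        self._keys: set[str] = set()

    def first_visit(self, url: str) -> bool:
        """Return True the first time a URL is seen, False on every later call."""
        key = url_key(url)
        if key in self._keys:
            return False
        self._keys.add(key)
        return True


if __name__ == "__main__":
    seen = Seen()
    for url in ["http://www.lequipe.fr/", "http://www.lequipe.fr/"]:
        print(url, seen.first_visit(url))  # True, then False
```

Eight hex digits amount to 32 bits, so the chance of two different URLs sharing a key stops being negligible once a crawl reaches tens of thousands of pages; a longer prefix or the full digest is safer for larger crawls.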

more ...

Building a basic specialized crawler

As I have been crawling again over the last few days, I thought it could be helpful to describe how I do it.

Note that this is for educational purposes only (I am not claiming to have built the fastest or most reliable crawling engine ever) and that the aim is to crawl specific pages of interest. This implies that I can tell which links I want to follow using regular expressions alone, because I have observed how the given website is organized.
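To make the idea concrete, here is a minimal Python sketch of such a filter. The URL layout and the pattern are hypothetical (a site whose articles live under /articles/<year>/<slug>.html on www.example.com); in practice you would write the pattern after looking at how the target website is organized.

```python
import re

# Hypothetical layout: article pages look like
# http://www.example.com/articles/2010/some-title.html
ARTICLE_PATTERN = re.compile(
    r"https?://www\.example\.com/articles/\d{4}/[\w-]+\.html"
)


def is_interesting(url: str) -> bool:
    """Return True only if the whole URL matches the article pattern."""
    return ARTICLE_PATTERN.fullmatch(url) is not None


def filter_links(urls):
    """Keep only the links worth following, preserving their order."""
    return [url for url in urls if is_interesting(url)]


if __name__ == "__main__":
    candidates = [
        "http://www.example.com/articles/2010/crawling-basics.html",
        "http://www.example.com/login",
        "http://www.example.com/articles/2010/crawling-basics.html#comments",
    ]
    print(filter_links(candidates))
```

Using fullmatch rather than search anchors the pattern at both ends, so URLs carrying an extra fragment or query string are rejected instead of silently accepted.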

I see two (or possibly three) steps in the process, which I will go through, giving a few hints in pseudocode.

A shell script

You might want to write a shell script to launch the two main phases automatically and/or to save your results on a regular basis (if something goes wrong after a fair number of pages have been explored, you don’t want to lose all the work, even if it is mainly CPU time and electricity).
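If you would rather keep everything in one program than in a shell script, the "save on a regular basis" idea can be sketched in Python as follows. This is a minimal sketch under assumptions: fetch is any function returning a page's text (a hypothetical stand-in for the real download code), and the results are written as JSON every `every` pages through a temporary file that then replaces the previous checkpoint, so that a crash in the middle of a write does not destroy the last good copy.

```python
import json
import os


def crawl_with_checkpoints(urls, fetch, path, every=100):
    """Fetch each URL and save all results to `path` every `every` pages.

    Returns the dictionary of results (url -> page text).
    """
    results = {}

    def save():
        tmp = path + ".tmp"
        with open(tmp, "w", encoding="utf-8") as handle:
            json.dump(results, handle, ensure_ascii=False)
        # os.replace is atomic on POSIX systems when both files are on the same filesystem
        os.replace(tmp, path)

    for count, url in enumerate(urls, start=1):
        results[url] = fetch(url)
        if count % every == 0:
            save()
    save()  # final save, whatever the number of pages
    return results
```

After a crash, the last checkpoint tells you which pages are already done, so the crawl can resume from there instead of starting over.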

A list of links

If the website has an archive, a sitemap or a general list of its contents you can spare time by picking the interesting links once and for all.

going through a shortlist of archives DO {
     fetch page
     find ...
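A runnable version of such a loop might look like the Python sketch below. It is not the author's code: fetch stands for whatever download function you use (passed in so the sketch runs without network access), links are pulled out of the HTML with a deliberately simple regular expression, and only those matching a site-specific pattern are kept. A proper HTML parser would be more robust.

```python
import re
from urllib.parse import urljoin

# Simplistic href extractor: stops at the closing quote or at a fragment marker.
HREF = re.compile(r'href="([^"#]+)')


def gather_links(archive_urls, fetch, pattern):
    """Visit each archive page once and collect the matching links, in order."""
    wanted = re.compile(pattern)
    found = []
    seen = set()
    for archive_url in archive_urls:
        html = fetch(archive_url)
        for href in HREF.findall(html):
            link = urljoin(archive_url, href)  # resolve relative links
            if wanted.search(link) and link not in seen:
                seen.add(link)
                found.append(link)
    return found
```

The list returned here is what the second phase, fetching the pages themselves, would start from.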

more ...

Workshop on Complexity in Language – Day 2 (report)

I could not follow the whole second day of the Workshop on Complexity in Language (see previous post), but here is what I heard in the morning.

Salikoko Mufwene talked about the emergence of complexity, which he sees as a self-organization process: we don’t plan the way we are going to speak.

He adopts a relativistic perspective, speaking of a multi-agent system and asking whether the agents are really agentive or whether there are triggers of particular behaviors. He likes to consider language as a technology that evolved. At the end of the talk he also tackled the notion of communal complexity and the communal patterns used by speakers (also known as norms).

Luc Steels explained his understanding of language complexity and how he simulates communication with robots. He thinks there is an alternative to the evolutionary framework: according to him, grammar is functional and not superficial, and complexity has grown step by step through cultural rather than biological evolution.

His view of self-organization is based most notably on alignment, structural coupling and linguistic selection. He builds models of this by letting robots find common words to describe a situation (for example the fact that a given ...
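Steels' robot experiments are far richer, but the basic alignment mechanism can be illustrated with the "minimal naming game", a simplified model from the literature on naming games, swapped in here purely as an illustration. In this Python sketch (all names hypothetical), agents try to agree on one word for a single object: a speaker utters a word from its inventory (inventing one if the inventory is empty); if the hearer knows it, both keep only that word, otherwise the hearer adds it.

```python
import random


def naming_game(n_agents=10, max_rounds=20000, seed=0):
    """Run a minimal naming game; return (rounds_played, final inventories)."""
    rng = random.Random(seed)
    inventories = [set() for _ in range(n_agents)]
    next_word = 0  # invented words are just integers: 0, 1, 2, ...

    for round_number in range(1, max_rounds + 1):
        speaker, hearer = rng.sample(range(n_agents), 2)
        if not inventories[speaker]:
            inventories[speaker].add(next_word)  # invent a new word
            next_word += 1
        word = rng.choice(sorted(inventories[speaker]))
        if word in inventories[hearer]:
            # Success: both agents align on the word that worked.
            inventories[speaker] = {word}
            inventories[hearer] = {word}
        else:
            # Failure: the hearer learns the speaker's word.
            inventories[hearer].add(word)
        # Consensus: every agent holds the same single word.
        if all(inv == inventories[0] and len(inv) == 1 for inv in inventories):
            return round_number, inventories
    return max_rounds, inventories


if __name__ == "__main__":
    rounds, inventories = naming_game()
    print(rounds, inventories[0])
```

No agent plans the shared vocabulary: the convention emerges from local interactions and from the selection of words that happen to succeed, which is the kind of self-organization the talk described.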

more ...

Workshop on Complexity in Language - Day 1 (report)

Yesterday I attended the first day of a workshop organized by Salikoko Mufwene and held at the ENS Lyon. This “Workshop on Complexity in Language: Developmental and Evolutionary Perspectives” lasts two days (see the HTML version of the program).

Here is my personal report on what I heard during the first day and on what I found interesting.

Complexity and complexity science

First of all, William S.-Y. Wang referred in particular to Herbert Simon and Melanie Mitchell to define complexity, two approaches that I have described on this blog.

Tom Schoenemann talked about the increasing richness, subtlety and complexity of hominin conceptual understanding, which created a need for syntax and grammar, characteristics that result from it. Over the course of history, brain areas appear less directly connected and process information more independently. What he calls “conceptual complexity” is based on the idea of “grounded cognition” developed by Lawrence W. Barsalou.

Barbara L. Davis said of complexity science that it was another paradigm. Indeed, most of the debate took place on an abstract level, with many different (and not really compatible) notions of language and complexity. William Croft for instance said the whole context of language needed to be taken into account, and ...

more ...

Halliday on complexity (1992)

Sometimes you just get lucky: I was reading the famous article by Charles J. Fillmore, “Corpus linguistics” or “Computer-aided armchair linguistics”, in the proceedings of a Nobel symposium held in 1991 (it is known for its opening descriptions of the armchair linguist and the corpus linguist, who have nothing to say to each other), when I decided to read the following article. The title did not seem promising to me, but still, it was written by Halliday:

M.A.K. Halliday, Language as system and language as instance: The corpus as a theoretical construct, pp. 61-77.

The author gives a few insights into the questions one could ask of a given text in order to find a language model. One of the points has to do with “text dynamics”. Here is how Halliday defines it:

« It is a form of dynamic in which there is (or seems to be) an increase in complexity over time: namely, the tendency for complexity to increase in the course of the text. » (p. 69)

In fact, Halliday develops a very interesting idea from the textual dimension of complexity, also called the “unfolding of the text” (p. 69), its “individuation” or the ...

more ...

Approaches to philosophy of technology

I gave a presentation last week at the Easterhegg conference in Hamburg, whose aim was to give a few insights into this topic and to introduce a few notions that could explain aspects of hacker culture.

My talk was entitled Denkansätze zur Philosophie der Technik, since it dealt with approaches to philosophy of technology.

I started with a historical description of technology as a given fact that no one questions, then I spoke about the contempt for technicians and the difficulty of considering philosophy of technology as a subfield of philosophy.

The main part of my presentation consisted of a few main themes, like the critical perspective on technology and the political dimension of technology assessment. I also suggested a typology of tools and instruments/devices based on the work of Gilbert Simondon. Then I briefly described the notion of technoscience.

Finally, I presented a broader idea of technology, including for instance government technologies through apparatuses as described by Michel Foucault and more recently Giorgio Agamben, taking the position paper of the German CSU party as an example.

A paper in German based on this talk is available online. Here are the references I used ...

more ...

Simon, Gell-Mann and Lloyd on complex systems

Definition

Herbert A. Simon was one of the first to try to formalize the notion of a complex system: H. A. Simon, “The Architecture of Complexity”, Proceedings of the American Philosophical Society, vol. 106, iss. 6, pp. 467-482, 1962.

First of all, here is how he defines it:

« Roughly, by a complex system I mean one made up of a large number of parts that interact in a nonsimple way. In such systems, the whole is more than the sum of the parts, not in an ultimate, metaphysical sense, but in the important pragmatic sense that, given the properties of the parts and the laws of their interaction, it is not a trivial matter to infer the properties of the whole. » pp. 467-468

According to Simon, the idea of hierarchy (and therefore of architecture) is paramount.

« By a hierarchic system, or hierarchy, I mean a system that is composed of interrelated subsystems, each of the latter being, in turn, hierarchic in structure until we reach some lowest level of elementary subsystem. » p. 468
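Simon's recursive definition maps naturally onto a tree. The Python sketch below only illustrates that definition, with a hypothetical System type: a system is either elementary (it has no parts) or composed of subsystems, each of which is again a system, and the depth counts the levels down to the lowest elementary subsystems.

```python
from dataclasses import dataclass, field


@dataclass
class System:
    """A system made of interrelated subsystems; no parts means elementary."""
    name: str
    parts: list["System"] = field(default_factory=list)

    def is_elementary(self) -> bool:
        return not self.parts

    def depth(self) -> int:
        """Number of levels from this system down to its deepest elementary subsystem."""
        if self.is_elementary():
            return 1
        return 1 + max(part.depth() for part in self.parts)

    def elementary(self) -> list[str]:
        """Names of the elementary subsystems at the bottom of the hierarchy."""
        if self.is_elementary():
            return [self.name]
        return [name for part in self.parts for name in part.elementary()]


if __name__ == "__main__":
    # Hypothetical example: a book made of chapters, one of them made of sections.
    book = System("book", [
        System("chapter 1", [System("section 1.1"), System("section 1.2")]),
        System("chapter 2"),
    ])
    print(book.depth(), book.elementary())
```

The tree only captures the "composed of" relation; the interactions between subsystems, which Simon's definition also insists on, would need extra links on top of it.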

Nowadays this definition can be considered a keystone of complex systems theory. To find the architecture, the dependencies between the subsystems, how they interact and interface ...

more ...

Melanie Mitchell: defining and measuring complexity

I just read with particular attention the seventh chapter of Complexity: A Guided Tour, by Melanie Mitchell (Defining and measuring complexity, pages 94 to 111). She works with the Santa Fe Institute, a major institution for research on complex systems, and gives a convincing overview of this field. Still, I found nothing on the question of language as a complex adaptive system, although some researchers focus on this topic (e.g. in Santa Fe).

According to her, there are different sciences of complexity with different notions of what complexity means; the notion of complexity is itself complex. She chooses to refer to three questions formulated by Seth Lloyd in 2001 to approach the complexity of a system:

  1. How hard is it to describe?
  2. How hard is it to create?
  3. What is its degree of organization?

Then she details a few definitions which can be seen as aspects of the problem. Beginning with a selection from a longer list by Seth Lloyd, she tries to explain whether and where these approaches are used. Thus, according to her, these are possible definitions of complexity:

  • Size
  • Entropy
  • Algorithmic information content – Murray Gell-Mann speaks of « effective complexity »
  • Logical ...
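
Two of these measures are easy to try out on a string. The Python sketch below computes the Shannon entropy of the character distribution (in bits per character) and uses the size of the zlib-compressed string as a rough, computable stand-in for algorithmic information content, which is itself uncomputable. The compression proxy is a common approximation, not a definition taken from Mitchell's chapter.

```python
import math
import zlib
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Entropy of the character distribution of `text`, in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def compressed_size(text: str) -> int:
    """Length in bytes of the zlib-compressed text: a rough proxy for description length."""
    return len(zlib.compress(text.encode("utf-8"), 9))


if __name__ == "__main__":
    import random
    import string

    rng = random.Random(0)
    periodic = "ab" * 500
    noisy = "".join(rng.choice(string.ascii_lowercase) for _ in range(1000))
    for label, text in [("periodic", periodic), ("noisy", noisy)]:
        print(label, round(shannon_entropy(text), 2), compressed_size(text))
```

Note that the pseudo-random string scores highest on both measures, although we would hardly call it complex; this is the objection to algorithmic information content that Gell-Mann's effective complexity tries to address.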
more ...

Renate Bartsch on linguistic complexity

I just found a seminal article on complexity written by Renate Bartsch in 1973 (in German). It gives a very good summary of the perspectives on this topic at the beginning of the ’70s. Generative grammar, the background of research on language at the time, is starting to be criticized, but it is still a landmark and a framework (most notably the reflection on surface and deep structure).

R. Bartsch, “Gibt es einen sinnvollen Begriff von linguistischer Komplexität?” Zeitschrift für Germanistische Linguistik, vol. 1, iss. 1, pp. 6-31, 1973.

Bartsch focuses on three main aspects of the problem to answer this question: does the idea of linguistic complexity make sense?

Sociolinguistics

The framework of transformational grammar alone cannot be trusted when it comes to measuring complexity, because surface complexity does not account for a potential underlying complexity.
Bartsch cites the interviews conducted by Labov and his conclusion that dialect differences are to be found on the surface and have nothing to do with the logic of a sentence.

Psycholinguistics

This is by far the most interesting part of the article: many criteria for linguistic complexity are analyzed, with examples (some in German).
Bartsch also writes about complexity metrics and claims ...

more ...