Bits of Language: corpus linguistics, NLP and text analytics

Building a basic specialized crawler

As I have been crawling again over the last few days, I thought it could be helpful to describe how I do it.

Note that this is for educational purposes only (I am not claiming to have built the fastest and most reliable crawling engine ever) and that the aim is to crawl specific pages of interest. This implies that I can identify the links I want to follow with regular expressions alone, because I have observed how a given website is organized.
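To make the idea of following links by regular expression concrete, here is a minimal sketch in Python using only the standard library. The URL layout (`/articles/<year>/<slug>.html`), the sample page, and the names `LinkCollector`, `links_of_interest` and `ARTICLE_PATTERN` are assumptions made up for illustration; they are not taken from any particular website or from my actual crawler.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def links_of_interest(html, base_url, pattern):
    """Return the absolute URLs found in html that match the compiled pattern."""
    collector = LinkCollector()
    collector.feed(html)
    kept = []
    for href in collector.hrefs:
        url = urljoin(base_url, href)  # resolve relative links
        # Keep only the first occurrence of each matching URL, in page order.
        if pattern.search(url) and url not in kept:
            kept.append(url)
    return kept


# Hypothetical site layout: articles live under /articles/<year>/<slug>.html
ARTICLE_PATTERN = re.compile(r"/articles/\d{4}/[\w-]+\.html$")

# Hypothetical page content, so that the sketch runs without network access.
SAMPLE_PAGE = """
<a href="/articles/2012/crawler-basics.html">post</a>
<a href="/about.html">about</a>
<a href="/articles/2012/crawler-basics.html">same post again</a>
<a href="http://example.org/articles/2011/old-post.html">older post</a>
"""

found = links_of_interest(SAMPLE_PAGE, "http://example.org/", ARTICLE_PATTERN)
print(found)
```

Only the two article URLs are kept: the "about" page does not match the pattern and the duplicate link is dropped. On a real website the pattern has to be derived from observing how its URLs are organized.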

I see two (or possibly three) steps in the process, which I will go through, giving a few hints in …

Workshop on Complexity in Language – Day 2 (report)

I could not follow the whole second day of the Workshop on Complexity in Language (see previous post), but here is what I heard in the morning.

Salikoko Mufwene talked about the emergence of complexity, which he sees as a self-organization process: we don’t plan the way we are going to speak.

He adopts a relativistic perspective, speaking of a multi-agent system and asking whether the agents are really agentive or whether there are triggers of particular behaviors. He likes to consider language as a technology that evolved. At the end of the talk he also tackled the notion …

Workshop on Complexity in Language – Day 1 (report)

Yesterday I attended the first day of a workshop organized by Salikoko Mufwene and held at the ENS Lyon. This “Workshop on Complexity in Language: Developmental and Evolutionary Perspectives” lasts two days (see the HTML version of the program).

Here is my personal report on what I heard during the first day and on what I found interesting.

Complexity and complexity science

First of all, William S.-Y. Wang referred in particular to Herbert Simon and Melanie Mitchell to define complexity, two approaches that I have described on this blog.

Tom Schoenemann talked about the increasing richness, subtlety and complexity of hominin conceptual …

Halliday on complexity (1992)

Sometimes you just feel lucky: I was reading the famous article by Charles J. Fillmore, “Corpus linguistics” or “Computer-aided armchair linguistics”, in the proceedings of a Nobel symposium which took place in 1991 (it is known for its introductory descriptions of the armchair linguist and the corpus linguist, who don’t have anything to say to each other), when I decided to read the following article as well. The title did not seem promising to me, but still, it was written by Halliday:

M.A.K. Halliday, Language as system and language as instance: The corpus as a theoretical construct, pp. 61-77 …

Approaches to philosophy of technology

Last week I gave a presentation at the Easterhegg conference in Hamburg, the aim of which was to give a few insights into this topic and to introduce a few notions that could explain aspects of hacker culture.

My talk was entitled Denkansätze zur Philosophie der Technik, as it dealt with approaches to philosophy of technology.

I started with a historical description of technology as a given fact that no one calls into question, then I spoke about the contempt for technicians and the difficulty of considering philosophy of technology as a subfield of philosophy.

The main part of my presentation consisted of …

Simon, Gell-Mann and Lloyd on complex systems

Definition

Herbert A. Simon was one of the first to try to formalize the notion of a complex system:

H. A. Simon, “The Architecture of Complexity”, Proceedings of the American Philosophical Society, vol. 106, iss. 6, pp. 467-482, 1962.

First of all, here is how he defines it:

« Roughly, by a complex system I mean one made up of a large number of parts that interact in a nonsimple way. In such systems, the whole is more than the sum of the parts, not in an ultimate, metaphysical sense, but in the important pragmatic sense that, given the properties of …

Melanie Mitchell: defining and measuring complexity

I have just read with particular attention the seventh chapter of Complexity: A Guided Tour, by Melanie Mitchell (Defining and measuring complexity, pages 94 to 111). She works with the Santa Fe Institute, which is a major institution for research on complex systems. She gives a convincing overview of this field. Still, I did not read anything on the question of language as a complex adaptive system, although there are researchers who focus on this topic (e.g. in Santa Fe).

According to her, there are different sciences of complexity with different notions of what complexity means. The notion of complexity …

Renate Bartsch on linguistic complexity

I have just found a seminal article on complexity written by Renate Bartsch in 1973 (in German). It is a very good summary of the perspective on this topic at the beginning of the ‘70s. Generative grammar as a background for research on language was starting to be criticized, but it was still a landmark and a framework (most notably the reflection on surface and deep structure).

R. Bartsch, “Gibt es einen sinnvollen Begriff von linguistischer Komplexität?”, Zeitschrift für Germanistische Linguistik, vol. 1, iss. 1, pp. 6-31, 1973.

Bartsch focuses on three main aspects of the problem to answer this question: does the idea …

Philosophy of technology, how things started: a typology

In my previous post, I presented a few references. I went on reading books and articles on this topic, and I am now able to sort them into several kinds of approaches.

This is mostly thanks to these books in French on philosophy of technology:

G. Simondon, L’invention dans les techniques : cours et conférences, Paris: Seuil, 2005.
G. Hottois, Philosophies des sciences, philosophies des techniques, Paris: Odile Jacob, 2004.
J. Goffi, La philosophie de la technique, Presses Universitaires de France, 1988.
G. Hottois, Le signe et la technique : la philosophie à l’épreuve de la technique, Paris: Aubier, 1984 …

Philosophy of technology: a few resources

As I once studied philosophy (back in the classes préparatoires), I like to keep in touch with this kind of reflection. Moreover, in this research field where everything is moving very fast, it is a way to find a few continuities and to ground the particular questions regarding the analysis of language in a more conceptual framework.

Here is a list of texts available on the Internet (some of them only in part) that seem important to me. Some are written in English, some in French or in German, as I chose the original versions.

It does not claim to …
