A short bibliography on Latent Semantic Analysis and Indexing

To go a bit further than my previous post, here are a few references that I recently found interesting.

For a definition and other short bibliographies, see Wikipedia or, for a change, Scholarpedia, which has an article “curated” by T.K. Landauer and S.T. Dumais.

U. Mortensen, Einführung in die Korrespondenzanalyse, Universität Münster, 2009.

G. Gorrell and B. Webb, “Generalized Hebbian Algorithm for Incremental Latent Semantic Analysis,” in Ninth European Conference on Speech Communication and Technology, 2005.

P. Cibois, Les méthodes d’analyse d’enquêtes, Que sais-je ?, 2004.

B. Pincombe, Comparison of Human and Latent Semantic …

Building a topic-specific corpus out of two different corpora

I have two corpora (say, I crawled two websites) which sometimes focus on the same topics. I would like to try to merge them in order to build a balanced and coherent corpus. As this is a much-discussed research topic, there are plenty of subtle ways to do it.

Still, as I am only at the beginning of my research and don’t know how far I am going to go with both corpora, I want to keep it simple.

One of the appropriate techniques (if not the best)

I could do …

