A short bibliography on Latent Semantic Analysis and Indexing

To go a bit further than my previous post, here are a few references that I have recently found interesting.

For a definition and/or other short bibliographies, see Wikipedia or, for a change, Scholarpedia, whose article is “curated” by T. K. Landauer and S. T. Dumais.

U. Mortensen, Einführung in die Korrespondenzanalyse [Introduction to Correspondence Analysis], Universität Münster, 2009.

G. Gorrell and B. Webb, “Generalized Hebbian Algorithm for Incremental Latent Semantic Analysis,” in Ninth European Conference on Speech Communication and Technology, 2005.

P. Cibois, Les méthodes d’analyse d’enquêtes [Survey Analysis Methods], Que sais-je ?, 2004.

B. Pincombe, Comparison of Human and Latent Semantic Analysis (LSA) Judgements of Pairwise Document Similarities for a News Corpus, Australian Department of Defence, 2004.

M. W. Berry, S. T. Dumais, and G. W. O’Brien, “Using Linear Algebra for Intelligent Information Retrieval,” SIAM Review, vol. 37, iss. 4, pp. 573-595, 1995.

S. Dumais, Enhancing Performance in Latent Semantic Indexing (LSI) Retrieval, Bellcore, 1992.

S. Deerwester, S. T. Dumais, G. W. Furnas, T. K. Landauer, and R. Harshman, “Indexing by Latent Semantic Analysis,” Journal of the American Society for Information Science, vol. 41, iss. 6, pp. 391-407, 1990.

G. Salton, A. Wong, and C. S. Yang, “A vector ...


Building a topic-specific corpus out of two different corpora

I have two corpora (say, I crawled two websites and got hold of their content) that sometimes focus on the same topics. I would like to merge them in order to build a balanced and coherent corpus. Since this is a much-discussed research topic, there are plenty of subtle ways to do it.

Still, as I am only at the beginning of my research and don’t know how far I am going to go with both corpora, I want to keep it simple.


One of the appropriate techniques (if not the best)

I could do it using LSA (in this particular case latent semantic analysis, not lysergic acid amide!) or, to be more precise, latent semantic indexing (LSI).

As the following technical report shows, it can perform well in this kind of case:
B. Pincombe, Comparison of Human and Latent Semantic Analysis (LSA) Judgements of Pairwise Document Similarities for a News Corpus, Australian Department of Defence, 2004 (the full text is available online through any good search engine; see my previous post).

Trying this on my corpora could be a topic for later research.
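For the record, the core of LSI fits in a few lines of linear algebra: build a term-document matrix A, take its truncated singular value decomposition A ≈ U_k S_k V_k^T, and compare documents as the rows of V_k S_k (this is how Deerwester et al. compare documents with each other). Below is a minimal sketch in Python with NumPy, assuming naive lowercase whitespace tokenization and raw term counts (real setups usually apply a weighting scheme first). The function names, the default k=2 and the 0.8 similarity threshold are hypothetical choices for illustration. This is not the approach I am working on, only an idea of what an LSI-based matching of two corpora could look like.

```python
from collections import Counter

import numpy as np


def term_document_matrix(docs):
    """Raw term counts, with terms as rows and documents as columns.

    Tokenization is a naive lowercase whitespace split (an assumption).
    """
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    index = {term: i for i, term in enumerate(vocab)}
    matrix = np.zeros((len(vocab), len(docs)))
    for j, toks in enumerate(tokenized):
        for term, count in Counter(toks).items():
            matrix[index[term], j] = count
    return matrix, vocab


def lsi_document_vectors(matrix, k):
    """Truncated SVD A ~ U_k S_k V_k^T; returns the rows of V_k S_k."""
    _, s, vt = np.linalg.svd(matrix, full_matrices=False)
    # Keep the k largest singular values and scale each latent dimension by them.
    return vt[:k, :].T * s[:k]


def cosine_similarities(vectors):
    """Pairwise cosine similarity between the rows of `vectors`."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # all-zero vectors end up with similarity 0
    unit = vectors / norms
    return unit @ unit.T


def cross_corpus_pairs(corpus_a, corpus_b, k=2, threshold=0.8):
    """Hypothetical helper: (i, j, score) for similar corpus_a[i] / corpus_b[j].

    Both corpora are indexed together so that they share one latent space.
    """
    matrix, _ = term_document_matrix(corpus_a + corpus_b)
    sims = cosine_similarities(lsi_document_vectors(matrix, k))
    offset = len(corpus_a)  # columns of corpus_b start after corpus_a
    return [
        (i, j, float(sims[i, offset + j]))
        for i in range(len(corpus_a))
        for j in range(len(corpus_b))
        if sims[i, offset + j] >= threshold
    ]


if __name__ == "__main__":
    corpus_a = ["the cat sat on the mat", "stock markets fell sharply today"]
    corpus_b = ["a cat lay on the mat", "markets fell as stocks dropped"]
    for i, j, score in cross_corpus_pairs(corpus_a, corpus_b):
        print(f"A[{i}] ~ B[{j}] (cosine {score:.2f})")
```

In this toy example the two cat sentences and the two market sentences share no vocabulary across topics, so with k=2 each topic gets its own latent dimension and the script pairs A[0] with B[0] and A[1] with B[1]. On real corpora, the choice of k and of the threshold is precisely the delicate part.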


The approach that I am working on (not quick and dirty but simpler and hopefully robust ...
