Here are a couple of papers describing latent semantic indexing (LSI).
In class we'll go through:
This paper appears to be the origin of the idea:
Here's a paper from Bellcore talking about how to improve LSI results:
Here's a short survey with somewhat more recent citations:
informationRetrievalSurveyp11-raghavan.pdf
Here's the documentation for the R package "lsa", which performs latent semantic analysis (LSA, the same thing as LSI).