We'll talk some more about latent semantic indexing (LSI), using the Deerwester paper. Here's a link:
DeerwesterJASIS90.pdf
We'll also talk about the Porter stemming algorithm:
defPorter.txt
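To give a feel for what a stemming rule looks like, here is a minimal sketch of just the first step (step 1a) of the Porter algorithm, which handles plural endings. It's written in Python rather than R so it stands on its own, and the function name `porter_step_1a` is made up for this illustration. The full algorithm has several more steps plus conditions on the "measure" of the stem, none of which appear here.

```python
def porter_step_1a(word):
    """Apply only step 1a of the Porter stemmer (plural endings).

    The rules are tried in order, longest suffix first:
      SSES -> SS,  IES -> I,  SS -> SS,  S -> (nothing)
    """
    if word.endswith("sses"):
        return word[:-2]   # caresses -> caress
    if word.endswith("ies"):
        return word[:-2]   # ponies -> poni
    if word.endswith("ss"):
        return word        # caress -> caress (unchanged)
    if word.endswith("s"):
        return word[:-1]   # cats -> cat
    return word


for w in ["caresses", "ponies", "ties", "caress", "cats"]:
    print(w, "->", porter_step_1a(w))
```

Notice that the output ("poni", "ti") need not be a dictionary word. The stemmer only has to map related word forms to the same string.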
There's a complete book on information retrieval that's available online. It covers many of the preparatory steps that we'll discuss and gives another angle on LSI and on using SVD to regularize text searching.
http://nlp.stanford.edu/IR-book/
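Here's a small sketch of the idea behind LSI: build a term-document matrix, take its SVD, keep only the largest k singular values, and compare documents in the reduced space. It's in Python with NumPy rather than R, and the five toy documents, the choice k = 2, and the helper name `cosine` are all assumptions made up for this illustration. They aren't taken from the paper or the book.

```python
import numpy as np

# Toy corpus (made up for illustration). Documents 0 and 1 share no words,
# but both share words with document 2.
docs = [
    "human interface",
    "computer user",
    "human computer interface user",
    "graph trees minors",
    "graph minors survey",
]

# Term-document matrix: one row per term, one column per document.
vocab = sorted({w for d in docs for w in d.split()})
A = np.array([[d.split().count(t) for d in docs] for t in vocab], dtype=float)

# SVD, then keep only the k largest singular values.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
doc_coords = (np.diag(s[:k]) @ Vt[:k, :]).T  # one row of k coordinates per document


def cosine(x, y):
    """Cosine similarity between two vectors."""
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))


raw_sim = cosine(A[:, 0], A[:, 1])                # raw term space, docs 0 and 1
lsi_sim = cosine(doc_coords[0], doc_coords[1])    # reduced space, docs 0 and 1
lsi_cross = cosine(doc_coords[0], doc_coords[3])  # reduced space, docs 0 and 3

print("raw cosine, docs 0 and 1:", raw_sim)
print("LSI cosine, docs 0 and 1:", lsi_sim)
print("LSI cosine, docs 0 and 3:", lsi_cross)
```

In the raw term space, documents 0 and 1 have zero similarity because they share no words. In the reduced space they end up close together, since both co-occur with the words of document 2, while the "graph" documents stay separate. That's the sense in which truncating the SVD smooths over exact word matches.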
We'll go through some code in class. Here are the .R files:
porter_Rstem.R
porter_snow.R
tmExamp.R
oNLP.R
Here's an exercise for practicing the tools and techniques that we've talked about so far:
MLText-HW1.txt
Here's the recording of the second class:
https://datamining.webex.com/datamining/ldr.php?AT=pb&SP=MC&rID=97962907&rKey=5e7e3307ca626d85