2ndLectureNotes

Last edited by mike@mbowles.com 12 years, 7 months ago

We'll talk some more about latent semantic indexing (LSI) using the Deerwester paper. Here's a link:

DeerwesterJASIS90.pdf

 

We'll also talk about the Porter stemming algorithm:

defPorter.txt
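
To give a feel for how the Porter rules work, here is a minimal sketch of Step 1a only (plural suffixes), written in base R. It is not the full algorithm and is not taken from the class files; the function name porter_step1a is made up for illustration, and the input is assumed to be a single lowercase word.

```r
# Step 1a of the Porter stemmer: the first matching rule wins.
#   SSES -> SS    (caresses -> caress)
#   IES  -> I     (ponies   -> poni)
#   SS   -> SS    (caress   -> caress)
#   S    -> ""    (cats     -> cat)
# porter_step1a is a hypothetical helper name; input is assumed lowercase.
porter_step1a <- function(word) {
  if (grepl("sses$", word)) return(sub("sses$", "ss", word))
  if (grepl("ies$", word))  return(sub("ies$", "i", word))
  if (grepl("ss$", word))   return(word)
  if (grepl("s$", word))    return(sub("s$", "", word))
  word
}

words <- c("caresses", "ponies", "caress", "cats")
stems <- vapply(words, porter_step1a, character(1), USE.NAMES = FALSE)
print(stems)
```

The later steps of the algorithm strip further suffixes based on the consonant-vowel structure of what remains, which is why in practice you would use a packaged stemmer rather than hand-rolled rules like these.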

 

A complete book on information retrieval is available online. It covers in detail many of the preparatory steps that we'll discuss, and it gives another angle on LSI and on using SVD to regularize text searching.

http://nlp.stanford.edu/IR-book/
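
As a rough illustration of the SVD idea, here is a small sketch in base R. The term-document matrix, the terms, and the choice k = 2 are all invented for the example; it is not the code from class. It truncates the SVD to the k largest singular values and compares a one-word query against the documents both in the raw term space and in the reduced space.

```r
# Toy term-document matrix (rows = terms, columns = documents).
# All counts are made up for illustration.
tdm <- matrix(
  c(1, 1, 0, 0,   # car
    1, 0, 0, 0,   # automobile
    0, 1, 0, 0,   # engine
    0, 0, 1, 1,   # flower
    0, 0, 1, 0),  # petal
  nrow = 5, byrow = TRUE,
  dimnames = list(
    c("car", "automobile", "engine", "flower", "petal"),
    c("d1", "d2", "d3", "d4")
  )
)

# SVD: tdm = U diag(d) V^T, singular values in decreasing order.
s <- svd(tdm)

# Keep the k leading left singular vectors (k = 2 is an arbitrary choice here).
k <- 2
Uk <- s$u[, 1:k, drop = FALSE]

# Project documents and the query onto the k-dimensional latent space.
doc_coords <- t(Uk) %*% tdm                       # k x n_docs
q <- as.numeric(rownames(tdm) == "automobile")    # query: the single term "automobile"
query_coords <- as.vector(t(Uk) %*% q)

cosine <- function(a, b) sum(a * b) / sqrt(sum(a^2) * sum(b^2))

raw_sim <- apply(tdm, 2, cosine, b = q)                   # similarity in term space
lsi_sim <- apply(doc_coords, 2, cosine, b = query_coords) # similarity in latent space

print(round(rbind(raw = raw_sim, lsi = lsi_sim), 3))
```

Document d2 contains "car" and "engine" but not "automobile", so its raw cosine similarity to the query is 0. In the reduced space its similarity is positive, because "car" and "automobile" co-occur in d1 and end up loading on the same latent dimension. The flower documents stay unrelated to the query in both spaces.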

 

We'll go through some code in class. Here are the .R files:

porter_Rstem.R

porter_snow.R

tmExamp.R

oNLP.R

 

Here's an exercise you can work on to practice the tools and techniques that we've talked about so far:

MLText-HW1.txt

 

Here's the recording of the second class:

https://datamining.webex.com/datamining/ldr.php?AT=pb&SP=MC&rID=97962907&rKey=5e7e3307ca626d85
