2ndLectureNotes

Page history last edited by mike@mbowles.com 11 years, 4 months ago

We'll talk some more about latent semantic indexing (LSI), using the Deerwester paper. Here's a link:

DeerwesterJASIS90.pdf

 

We'll also talk about the Porter stemming algorithms:

defPorter.txt
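
To give a feel for the kind of rule a Porter stemmer applies, here is a minimal sketch of step 1a only (plural suffix handling) from Porter's original definition. It is written in Python purely for illustration. It is not the full stemmer and not the code in the class .R files, and the function name `porter_step_1a` is made up for this example.

```python
def porter_step_1a(word):
    """Apply step 1a of the original Porter stemmer to a lowercase word.

    The rules are tried in order and only the first matching one is applied:
        SSES -> SS
        IES  -> I
        SS   -> SS   (unchanged)
        S    -> (removed)
    """
    if word.endswith("sses"):
        return word[:-2]   # caresses -> caress
    if word.endswith("ies"):
        return word[:-2]   # ponies -> poni
    if word.endswith("ss"):
        return word        # caress -> caress
    if word.endswith("s"):
        return word[:-1]   # cats -> cat
    return word


for w in ["caresses", "ponies", "ties", "caress", "cats"]:
    print(w, "->", porter_step_1a(w))
```

The later steps of the algorithm (handling -ed, -ing, -ational and so on) depend on a measure of the consonant-vowel structure of the remaining stem, which this sketch leaves out.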

 

There's a complete book on information retrieval that's available online. It covers many of the preparatory steps that we'll discuss and gives another angle on LSI and on using SVD to regularize text searching.

http://nlp.stanford.edu/IR-book/
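
As a rough illustration of the LSI idea, the sketch below builds a term-document count matrix A for a tiny made-up corpus and keeps only the top k singular directions. It is an assumption-laden toy, not the method from the paper or the class code: it uses plain Python, raw counts with no weighting, and power iteration with deflation on the Gram matrix A^T A (whose eigenvectors are the right singular vectors of A) instead of a library SVD such as R's `svd()`. All function names and the example documents are hypothetical.

```python
import math


def term_doc_matrix(docs):
    """Return (vocab, A), where A[i][j] counts term i in document j."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    A = [[0.0] * len(docs) for _ in vocab]
    for j, d in enumerate(docs):
        for w in d.lower().split():
            A[index[w]][j] += 1.0
    return vocab, A


def gram(A):
    """Return the documents-by-documents matrix A^T A."""
    n = len(A[0])
    return [[sum(row[i] * row[j] for row in A) for j in range(n)]
            for i in range(n)]


def top_eigenpairs(G, k, iters=200):
    """Top-k eigenpairs of a symmetric PSD matrix by power iteration.

    After each eigenpair is found it is subtracted out (deflation),
    so the next round converges to the next largest eigenvalue.
    """
    n = len(G)
    G = [row[:] for row in G]          # work on a copy
    pairs = []
    for _ in range(k):
        v = [1.0 / math.sqrt(n)] * n   # fixed start vector, for repeatability
        for _ in range(iters):
            w = [sum(G[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(G[i][j] * v[j] for j in range(n))
                  for i in range(n))
        pairs.append((lam, v))
        for i in range(n):
            for j in range(n):
                G[i][j] -= lam * v[i] * v[j]
    return pairs


def lsi_doc_vectors(docs, k):
    """Represent each document by k coordinates sigma_i * v_i[j]."""
    _, A = term_doc_matrix(docs)
    pairs = top_eigenpairs(gram(A), k)
    return [[math.sqrt(max(lam, 0.0)) * v[j] for lam, v in pairs]
            for j in range(len(docs))]


def cosine(a, b):
    """Cosine similarity, defined as 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


docs = ["car", "automobile", "car automobile",
        "flower petal", "flower petal garden"]
vecs = lsi_doc_vectors(docs, k=2)
print("car vs automobile:", cosine(vecs[0], vecs[1]))
print("car vs flower petal:", cosine(vecs[0], vecs[3]))
```

In this toy corpus the documents "car" and "automobile" share no words, so their raw count vectors are orthogonal. Because both words co-occur in a third document, the rank-2 representation places them in the same direction, while keeping them separate from the flower documents. For real collections a proper SVD routine is the better choice; power iteration is used here only to keep the sketch dependency-free.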

 

We'll go through some code in class. Here are the .R files:

porter_Rstem.R

porter_snow.R

tmExamp.R

oNLP.R

 

Here's an exercise for practicing the tools and techniques that we've talked about so far:

MLText-HW1.txt

 

Here's the recording of the second class:

https://datamining.webex.com/datamining/ldr.php?AT=pb&SP=MC&rID=97962907&rKey=5e7e3307ca626d85
