
2ndLectureNotes

Last edited by mike@mbowles.com 10 years, 6 months ago

We'll talk some more about latent semantic indexing (LSI) using the Deerwester paper. Here's a link:

DeerwesterJASIS90.pdf
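The core LSI step is a truncated singular value decomposition (SVD) of a term-document matrix: keep only the k largest singular values and their singular vectors. Below is a minimal sketch of that step in Python with NumPy, not a port of the course's R code. The five-term, four-document matrix is a hypothetical toy example, not data from the paper, and the function name `lsi` is illustrative.

```python
import numpy as np


def lsi(term_doc, k):
    """Truncated SVD of a terms x documents matrix, keeping k dimensions.

    U_k @ diag(s_k) @ Vt_k is the best rank-k approximation of term_doc
    in the least-squares (Frobenius norm) sense.
    """
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]


# Hypothetical toy data: rows are terms, columns are documents d1..d4.
terms = ["human", "computer", "interface", "tree", "graph"]
X = np.array([
    [1, 0, 0, 0],  # human
    [1, 1, 0, 0],  # computer
    [0, 1, 0, 0],  # interface
    [0, 0, 1, 1],  # tree
    [0, 0, 1, 1],  # graph
], dtype=float)

U_k, s_k, Vt_k = lsi(X, k=2)
X_k = U_k @ np.diag(s_k) @ Vt_k       # rank-2 reconstruction of X
doc_coords = (np.diag(s_k) @ Vt_k).T  # row j: document j in the 2-D latent space

print(np.round(s_k, 3))
print(np.round(doc_coords, 3))
```

In this toy matrix, d1 and d2 share only the term "computer", so their cosine similarity in the raw term space is 0.5. In the rank-2 space they end up with identical coordinates. This is a small illustration of how LSI groups documents through shared term co-occurrence. Whether that merging is desirable depends on the collection and on the choice of k.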

 

We'll also talk about the Porter stemming algorithms:

defPorter.txt
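The full Porter algorithm applies five steps of suffix-rewriting rules. Most of these rules are guarded by conditions on the "measure" m of the remaining stem, where a stem has the form [C](VC)^m[V]. The sketch below is partial and shows only two pieces: the measure and the first rule group (Step 1a, plural endings). The function names are hypothetical, and Steps 1b through 5 are omitted.

```python
VOWELS = "aeiou"


def is_consonant(word, i):
    """Porter's definition: a consonant is any letter other than a, e, i, o, u,
    and other than a 'y' preceded by a consonant."""
    ch = word[i]
    if ch in VOWELS:
        return False
    if ch == "y":
        return i == 0 or not is_consonant(word, i - 1)
    return True


def measure(stem):
    """Return m, where the stem has the form [C](VC)^m[V]."""
    m = 0
    prev_is_vowel = False
    for i in range(len(stem)):
        cons = is_consonant(stem, i)
        if cons and prev_is_vowel:
            m += 1  # each vowel -> consonant transition closes one VC pair
        prev_is_vowel = not cons
    return m


def step1a(word):
    """Step 1a: SSES -> SS, IES -> I, SS -> SS, S -> (removed).

    Rules are tried longest suffix first; only the first match applies.
    """
    for suffix, replacement in (("sses", "ss"), ("ies", "i"), ("ss", "ss"), ("s", "")):
        if word.endswith(suffix):
            return word[: -len(suffix)] + replacement
    return word


for w in ["caresses", "ponies", "caress", "cats"]:
    print(w, "->", step1a(w))
```

Note that "ponies" becomes "poni". Porter stems are not meant to be dictionary words. They only need to map related word forms to the same string.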

 

There's a complete book on information retrieval available online: Introduction to Information Retrieval, by Manning, Raghavan, and Schütze. It gives very good coverage of many of the preparatory steps we'll discuss, and it offers another angle on LSI and on using SVD to regularize text searching.

http://nlp.stanford.edu/IR-book/
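To search in the reduced space, a query is commonly "folded in" with the same projection as the documents, q_k = Σ_k^{-1} U_k^T q. Documents are then ranked by cosine similarity in k dimensions. The code below is a minimal NumPy sketch under these assumptions. The function `lsi_search`, the toy matrix, and the one-word query are all hypothetical, and the sketch is not taken from the course code.

```python
import numpy as np


def lsi_search(term_doc, query, k):
    """Rank documents for a query in a rank-k LSI space.

    The query is folded in as q_k = Sigma_k^{-1} U_k^T q; document j is
    represented by column j of Vt_k, which equals Sigma_k^{-1} U_k^T d_j.
    Returns (document indices, best first; cosine similarities).
    """
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]
    q_k = (U_k.T @ query) / s_k
    docs = Vt_k.T  # row j: document j in the k-dim space
    sims = docs @ q_k / (np.linalg.norm(docs, axis=1) * np.linalg.norm(q_k))
    return np.argsort(-sims), sims


# Hypothetical toy data: rows are terms, columns are documents d1..d4.
terms = ["human", "computer", "interface", "tree", "graph"]
X = np.array([
    [1, 0, 0, 0],
    [1, 1, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
], dtype=float)

query = np.array([1, 0, 0, 0, 0], dtype=float)  # the one-word query "human"
order, sims = lsi_search(X, query, k=2)
print(order, np.round(sims, 3))
```

Document d2 never contains "human", so its raw term-space cosine with the query is 0. After the rank-2 projection, however, d2 scores as high as d1 because the two documents share "computer". That is the regularizing effect of the SVD. It can also blur distinctions the user cared about, which is one reason the choice of k matters.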

 

We'll go through some code in class. Here are the .R files:

porter_Rstem.R

porter_snow.R

tmExamp.R

oNLP.R

 

Here's an exercise to practice the tools and techniques we've talked about so far:

MLText-HW1.txt

 

Here's the recording of the second class:

https://datamining.webex.com/datamining/ldr.php?AT=pb&SP=MC&rID=97962907&rKey=5e7e3307ca626d85
