FrontPage

Saved by mike@mbowles.com
on October 7, 2011 at 11:33:20 am
 

If you want to join the class email list, please fill out this form.

 

Overview of the Course

This class will cover machine learning applied to natural language text documents.  We will focus on statistical algorithms rather than more traditional approaches such as semantics and parsing.  We'll start with an introduction to the subject matter, a comparison of statistical techniques to semantic approaches, definitions of the problems in text mining, and simple text manipulations.  We'll then cover various algorithms for dealing with standard text mining problems, such as indexing, automatic classification (e.g. spam filtering), part-of-speech identification, topic modeling, and sentiment extraction.

 

 

We'll use open literature for the reading in the class and hand out those references as we go along. 

 

 

Prerequisites

We'll employ beginner-level probability, calculus and linear algebra (e.g. peruse the appendices in "Introduction to Data Mining" by Tan et al., or Linear Algebra and Probability Theory.)  We'll also assume familiarity with basic machine learning algorithms (regression, logistic regression, regularized regression, SVM, ensemble methods, clustering, etc.).  You can find coverage of these methods in Tan's book.  If you have taken the Machine Learning 101 and 102 classes, you are well prepared for this course.

 

Participants should be familiar with R or be willing to pick R up outside of class.  We will hand out R code for most of our examples, but we won't spend time going through introductory material on R.  Come to the first class with R installed on your computer:  http://cran.r-project.org/  For your review, R references are here: References for R,  Reference for R Comments,  More R references.  To integrate R with Eclipse, click here.

 

To get the most out of the class, participants will need to work through the homework assignments. 

 

General Sequence of Classes:

Machine Learning 101:   Supervised learning

Text: "Introduction to Data Mining", by Pang-Ning Tan, Michael Steinbach and Vipin Kumar

Machine Learning 102:   Unsupervised Learning and Fault Detection

Text: "Introduction to Data Mining", by Pang-Ning Tan, Michael Steinbach and Vipin Kumar

 

Machine Learning 201:    Advanced Regression Techniques, Generalized Linear Models, and Generalized Additive Models    

Text:  "The Elements of Statistical Learning - Data Mining, Inference, and Prediction"  by Trevor Hastie, Robert Tibshirani, and Jerome Friedman

 

Machine Learning 202:   Collaborative Filtering, Bayesian Belief Networks, and Advanced Trees

Text:  "The Elements of Statistical Learning - Data Mining, Inference, and Prediction"  by Trevor Hastie, Robert Tibshirani, and Jerome Friedman

 

Machine Learning Big Data:  Adaptation and execution of machine learning algorithms in the MapReduce framework.

 

Mike Bowles

 

Week        Topic                                        References
1st Week    Introduction and Basic Text Manipulations
2nd Week    Part of Speech Detection
3rd Week    Topic Modeling
4th Week    Machine Translation
5th Week    Text Classification

 

 
