Course 048716 - 2010/2011
Advanced Topics in Learning, Systems and Control 2
Machine Learning for Natural Language Processing
Instructor: Koby Crammer
Lectures: Sunday 12:30-14:30, Mayer Bldg, Rm. 351
Contact: 3274 or by email (find it here)
Office hours: Tuesday 9:00-10:00 and by appointment
Announcements
Course Plan
Processing written natural language poses many problems: from building
words, through parsing sentences, to understanding what a piece of text
is about. The course covers topics in the machine learning approach to
natural language processing, combining statistics and computational
linguistics. The course will introduce major tools for representing
language, review algorithms for syntax processing and classification,
and cover unsupervised learning methods for semantics. We
will demonstrate the usefulness of these methods on real-world data.
- Language Modeling
  - Word Statistics
  - Automata and CFGs
- Syntax
  - Sequential Models, Shallow Parsing
  - Dependency and Complete Parsing
- Classification
  - Naive Bayes, Log-linear models, Maximum entropy models
  - Regularization
- Semantics
  - Clustering
  - Latent Semantic Indexing, Latent Dirichlet Allocation
- More Topics
  - Semi-Supervised Learning
  - Student Presentations
Bibliography
There is no published textbook for the course. Topics will be
presented from the following books:
- Speech and Language Processing (2nd Edition) by Daniel Jurafsky and James H. Martin, 2008. library catalog
- Introduction to Information Retrieval by Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze, 2008. library catalog
More related books are:
- Foundations of Statistical Natural Language Processing by Christopher D. Manning and Hinrich Schütze, 1999. library catalog
- Statistical Methods for Speech Recognition by Frederick Jelinek, 1997. library catalog
- Spoken Language Processing: A Guide to Theory, Algorithm, and System Development by Xuedong Huang, Alex Acero and Hsiao-Wuen Hon, 2001. library catalog
Lectures
Lecture 1 (17/10): Introduction
Lecture 2 (24/10): Language Modeling: N-Grams
Lecture 3 (31/10): Regular Expressions and Finite State Automata
Lecture 4 (07/11): Context Free Grammars
Lecture 5 (14/11): Parsing, Dependency Parsing
Lecture 6 (21/11): Naive Bayes, Maximum Entropy, Log-Linear Models
Lecture 7 (28/11): Random Projections, L1 regularization
                  (05/12): Hanukkah break
Lecture 8 (12/12): Feature selection
Lecture 9 (19/12): Complex problems: multi-class, multi-class multi-label, structured learning
Lecture 10 (26/12): HMMs, MEMMs, CRFs
Lecture 11 (02/01): Clustering, IBN method
Lecture 12 (09/01): Co-Clustering, LSA
Lecture 13 (16/01): pLSA, LDA
Lecture 14 (23/01): Class Presentations
Exercises
Exercise 1 (Due: January 6):
New exercise. Data and sample code are here.
Grading
- Three assignments: 30% (10% each)
- Presentation: 20%
- Project: 50%