Machine Learning for Natural Language Processing

Course 048716 - 2010/2011
Advanced Topics in Learning, Systems and Control 2
Machine Learning for Natural Language Processing

Instructor: Koby Crammer
Lectures: Sunday 12:30-14:30, Mayer Bldg, Rm. 351

Contact: 3274 or by email (find it here)
Office hours: Tuesday 9:00-10:00 and by appointment

Announcements

January 13 - Check project guidelines
January 6 - Check the list of presenters and papers
December 29 - Check the presentation guidelines and information
December 23 - New exercise due January 6; data and sample code are here.
November 28 - Class is given by Prof. Mannor
October 24 - I taught another course with the same course ID last year; it's a different course this year!

Course Plan

Processing written natural language poses many problems: from building words, through parsing sentences, to understanding what a piece of text is about. The course covers topics in the machine learning approach to natural language processing, combining statistics and computational linguistics. The course will introduce major tools for representing language, review algorithms for syntax processing and classification, and cover unsupervised learning methods for semantics. We will demonstrate the usefulness of these methods on real-world data.

Language Modeling
- Word Statistics
- Automata and CFGs
Syntax
- Sequential models - Shallow Parsing
- Dependency and Complete Parsing
Classification
- Naive Bayes, Log-linear models, Maximum entropy models
- Regularizations
Semantics
- Clustering
- Latent Semantic Indexing, Latent Dirichlet Allocation
More Topics
- Semi Supervised Learning
- Student Presentation

Bibliography

There is no published textbook for the course. Topics will be presented from the following books:

Speech and Language Processing (2nd Edition) by Daniel Jurafsky and James H. Martin, 2008. library catalog
Introduction to Information Retrieval by Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze, 2008. library catalog

More related books are:

Foundations of Statistical Natural Language Processing by Christopher D. Manning and Hinrich Schütze, 1999. library catalog
Statistical Methods for Speech Recognition by Frederick Jelinek, 1997. library catalog
Spoken Language Processing: A Guide to Theory, Algorithm, and System Development by Xuedong Huang, Alex Acero and Hsiao-Wuen Hon, 2001. library catalog

Lectures

Lecture 1 (17/10): Introduction
Lecture 2 (24/10): Language Modeling: N-Grams
Lecture 3 (31/10): Regular Expressions and Finite State Automata
Lecture 4 (07/11): Context Free Grammars
Lecture 5 (14/11): Parsing, Dependency Parsing
Lecture 6 (21/11): Naive Bayes, Maximum Entropy, Log-Linear Models
Lecture 7 (28/11): Random Projections, L1 regularization
(05/12): Hanukkah break
Lecture 8 (12/12): Feature selection
Lecture 9 (19/12): Complex problems: multi-class, multi-class multi-labeled, structured learning
Lecture 10 (26/12): HMMs, MEMMs, CRFs
Lecture 11 (02/01): Clustering, IBN method
Lecture 12 (09/01): Co-Clustering, LSA
Lecture 13 (16/01): pLSA, LDA
Lecture 14 (23/01): Class Presentations

Exercises

Exercise 1 (Due: January 6): New exercise. Data and sample code are here.

Grading

Three assignments: 30% (10% each)
Presentation: 20%
Project: 50%
