Machine Learning for Natural Language Processing Data - Winter 2010
Project Guidelines
The projects are the most significant part of your grade in this class
(50%), and the best way to explore and apply the material covered.
The key to a successful project is to start early and keep a steady
pace throughout the semester. Projects can be done individually or
in teams of two.
- Important due dates:
- Proposal: February 4, 2011
- Milestone: March 4, 2011
- Report: April 15, 2011
Topics
The hardest part is picking a project topic. Ideally,
you will pick something exciting to you, possibly related to problems
you already know and care about. Typical projects fall into the
following categories:
- Application: Apply techniques you learned in class (or
other known NLP techniques) to a novel task or an area that
interests you. This often involves a large component of data
collection and analysis.
- Bake-off: Take several known algorithms/techniques and
carefully compare them experimentally and/or theoretically on several
standard (or novel) problems and datasets.
- Algorithm: Develop a new method or algorithm for NLP problems. Often this results from combining ideas
from several known algorithms in an interesting manner, or "lifting"
techniques from other areas of computer science, statistics, physics,
etc.
- Analysis: Theoretically analyze and prove interesting
properties of a known algorithm.
Take a look at some recent NLP research papers for
inspiration. The ACL Anthology, NIPS, and ICML are three of the main
venues where natural language processing research is published, and
many of the recent papers are available online. Of course, many other
conferences on natural language processing and machine learning
also have a lot of related papers.
Evaluation
- Project grade breakdown:
- Proposal: 10%
- Milestone: 30%
- Report: 60%
Project proposal
Project proposals should be emailed to me by
noon on February 4, 2011. Please send the proposal as plain ASCII text
in the body of the email, with no attachments. Include the title of the
project, the name and email of the other team member if you are working
in a pair, and a 250-500 word description of the project, including the
problems/tasks, algorithms/techniques, datasets/resources, and relevant
papers/references you plan to use or address. Also describe the
challenges you expect to arise and what you plan to accomplish by the
milestone submission. Please send one email per group.
Milestone report
Project milestone reports should be emailed
as PDF attachments to me by noon on March 4, 2011. The report
should be at most 3 pages long. Please include the names of the
team members and the title of the project. The milestone report is
designed to help you stay on track and to let me provide feedback and
advice. Describe your progress so far and your plan to finish. Keep it
brief and to the point: give enough detail for me to understand what you
are doing, assuming the material covered in class as background
knowledge. Ideally, you will be able to reuse most of the milestone
report as part of the final report.
Final report
Final project reports should be emailed as PDF
attachments to me by noon on April 15, 2011. The reports should
be at most 6 pages long.
The format of your report should resemble a conference paper, with a general outline of the form:
- Abstract
- Introduction/Motivation
- Methods/Algorithms
- Results/Experiments
- Discussion/Conclusion
- References
The projects will be evaluated according to the following four
criteria:
- Soundness: Are the claims technically correct and techniques and approaches reasonable for the problem?
- Significance: Is the problem addressed important and/or interesting?
- Novelty: Is there something new and interesting about the project (novel application, algorithm, analysis, evaluation)?
- Clarity: Is the presentation clear and concise, but complete enough for someone familiar with graphical models and machine learning?
The reports should be written in either Hebrew or English.
I plan to post all the final reports online so that you can read about other projects. Let me know if you do not want your report to be posted.