Locating a predefined set of features in an image is a core problem in computer vision, with applications such as facial expression recognition, automatic image and video editing, and recognition of individual faces. In this project, we detect a set of 15 keypoints on images of human faces.
We use a database of training and test images from a contest hosted on the Kaggle website. Each image is 96x96 pixels. The training set consists of about 7000 images, manually labeled with the landmarks, which serve as ground truth.
The learning process follows the standard supervised approach: we train classifiers on the training images and their provided labels, and then apply the resulting classifiers to the test set. This produces a list of candidates for each keypoint, from which we select the final keypoint using various methods.
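The candidate-selection step can be illustrated with a minimal sketch. The text does not specify which selection methods were used, so the two strategies below (highest classifier score, and score-weighted centroid) are assumptions chosen for illustration. The function names and the `(x, y, score)` candidate format are likewise hypothetical.

```python
def pick_best_candidate(candidates):
    """Return the (x, y) of the candidate with the highest score.

    candidates: list of (x, y, score) tuples. This format is an
    assumption for illustration, not the project's actual data layout.
    """
    if not candidates:
        return None
    best = max(candidates, key=lambda c: c[2])
    return (best[0], best[1])


def weighted_centroid(candidates):
    """Return the score-weighted mean position of all candidates.

    Returns None when there are no candidates or the scores do not
    sum to a positive value.
    """
    total = sum(score for _, _, score in candidates)
    if total <= 0:
        return None
    x = sum(cx * score for cx, _, score in candidates) / total
    y = sum(cy * score for _, cy, score in candidates) / total
    return (x, y)


# Hypothetical candidates for a single keypoint.
candidates = [(10, 20, 0.2), (30, 40, 0.9), (32, 41, 0.7)]
print(pick_best_candidate(candidates))
print(weighted_centroid(candidates))
```

Taking the top-scoring candidate is simple but sensitive to a single confident false positive. Averaging smooths over nearby detections, at the cost of being pulled toward outliers.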
We achieved an RMSE (root mean squared error) of 3.96 pixels. For comparison, the best RMSE score in the Kaggle contest is 1.96 pixels.
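For reference, the RMSE can be computed as in the sketch below. It assumes the error is averaged over individual coordinate values (each keypoint contributing an x and a y value). That averaging choice is an assumption here, and the function name `rmse` and the sample values are hypothetical.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between two equal-length sequences.

    Each sequence is a flat list of coordinate values in pixels,
    e.g. [x1, y1, x2, y2, ...]. The flat layout is an assumption
    made for this illustration.
    """
    if len(predicted) != len(actual):
        raise ValueError("sequences must have the same length")
    if not predicted:
        raise ValueError("sequences must not be empty")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical predicted and ground-truth coordinates for two keypoints.
predicted = [66.0, 39.0, 30.0, 36.5]
actual = [64.0, 38.0, 31.0, 37.0]
print(rmse(predicted, actual))
```

Because the errors are squared before averaging, a few badly misplaced keypoints raise the score more than many small deviations do.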