Learn Facial Expressions From an Image

Abstract

One motivation for representation learning is that learning algorithms can design features better and faster than humans can. To this end, we hold this challenge that does not explicitly require that entries use representation learning. Rather, introduce an entirely new dataset. The dataset for this challenge is a facial expression classification dataset that have been assembled from the internet.

This task is very easy for humans to do, for computers, not so much.

The project target is to implement and design an intelligent algorithm for the task of distinguishing between 7 expressions (Angry, Disgust, Fear, Happy, Sad, Surprise, Neutral). The training Data set is ~35000 face images and the test set is ~5000 image.

Method

The project make use of the FER-2013 dataset. The dataset consist of images and their facial expression label. Each image is cropped, centered and have a size of 48x48 grayscale pixels. The dataset's labels are estimated with 65% of accuracy.

The algorithm make use of the standart Bag of Words method with few modifications:

1. Use of Raw Patches with normalizing and whitening instead of Dense-SIFT patches.

2. Use word presence vector instead of word frequency vector.

3. Use of spatial feature pyramid.

4. Use histogram intersection kernel.

5. Use local SVM training only with similar images.

Results

Expecting near results to the dataset's accuracy, a table is spread:

	SIFT Patches Accuracy (%)	Raw Patches Accuracy (%)	Computing time (Hours
No kernel	43.1	45.7	9
Histogram Intersection Kernel	54.2	56.4	5

Surprisingly, Raw Patches achieve better results than Dense-SIFT Patches witch are invariant to scale, lightning and rotation.

Another interesting result is the computing time of using the Histogram Intersection Kernel. The kernel inflates each feature, making one to believe that using it will take more time but it appears that it takes much less time and higher accuracy. This could be explained with an hypothesis that the kernel makes the feature space separable, resulting the SVM algorithm to converge much faster with less mistakes.