How to Evaluate Foreground Maps?
Ran Margolin, Lihi Zelnik-Manor and Ayellet Tal
Technion, Haifa, Israel
Abstract
The output of many algorithms in computer vision is either a non-binary map or a binary map (e.g., salient object detection and object segmentation). Several measures have been suggested to evaluate the accuracy of these foreground maps. In this paper, we show that the most commonly used measures for evaluating both non-binary maps and binary maps do not always provide a reliable evaluation. This includes the Area-Under-the-Curve measure, the Average-Precision measure, the F-measure, and the evaluation measure of the PASCAL VOC segmentation challenge. We start by identifying three causes of inaccurate evaluation. We then propose a new measure that amends these flaws. An appealing property of our measure is that it is an intuitive generalization of the F-measure. Finally, we propose four meta-measures to compare the adequacy of evaluation measures. We show via experiments that our novel measure is preferable.
Paper
Ran Margolin, Lihi Zelnik-Manor and Ayellet Tal, “How to Evaluate Foreground Maps?”, CVPR 2014 [pdf]
"Funny behaviour" of common measures
Previous measures, such as Area-Under-the-Curve (AUC), Average-Precision (AP), the F-measure and the PASCAL VOC segmentation measure, exhibit "funny behaviours".
Our Measure
No "Funny behaviour"
Agrees with application: Context-based image retrieval
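For reference, the two binary-map measures named above can be sketched in a few lines. This is a minimal illustration and not the released MATLAB code: the function names `soft_counts`, `f_measure` and `voc_overlap` are hypothetical, maps are assumed to be equally sized 2-D lists with values in [0, 1], and the ground truth is assumed to be binary. Computing the counts as sums of products lets the same F-measure formula accept non-binary maps, but the additional weighting that defines the proposed Weighted F-measure is not reproduced here.

```python
def soft_counts(pred, gt):
    """Return (tp, fp, fn) for a foreground map against a binary ground truth.

    pred values may be anywhere in [0, 1]; for a binary pred the results
    are the usual integer counts of true/false positives and false negatives.
    """
    tp = fp = fn = 0.0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            tp += p * g              # predicted foreground on true foreground
            fp += p * (1 - g)        # predicted foreground on true background
            fn += (1 - p) * g        # missed true foreground
    return tp, fp, fn


def f_measure(pred, gt, beta=1.0):
    """F-beta: (1 + beta^2) * P * R / (beta^2 * P + R)."""
    tp, fp, fn = soft_counts(pred, gt)
    if tp == 0:
        return 0.0  # no overlap at all; also avoids division by zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def voc_overlap(pred, gt):
    """PASCAL VOC style overlap: TP / (TP + FP + FN)."""
    tp, fp, fn = soft_counts(pred, gt)
    denom = tp + fp + fn
    return tp / denom if denom else 0.0


if __name__ == "__main__":
    gt = [[1, 0], [1, 0]]
    pred = [[1, 1], [0, 0]]  # one hit, one false alarm, one miss
    print(f_measure(pred, gt), voc_overlap(pred, gt))
```

Both functions look only at per-pixel agreement and ignore where in the map the errors fall.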
Code
Our proposed measure - Weighted F-measure
Foreground evaluation code (MATLAB) [download] - Tested on 64-bit Windows with MATLAB 2012b and 2013a.
The code is for academic purposes only. Please cite this paper if you make use of it:
@conference{margoinEval14,
  title = {How to Evaluate Foreground maps?},
  author = {Margolin, R. and Zelnik-Manor, L. and Tal, A.},
  year = {2014},
  booktitle = {CVPR}
}
In case of any problems, please contact us at margolin (at) tx.technion.ac.il or hovav (at) ee.technion.ac.il