Comparison of Semantic Image Annotation Algorithms
The adoption of a standard annotation database (Corel5k) has helped further research in semantic annotation by providing a benchmark for comparing algorithms.
This page contains comparisons of different annotation algorithms on the standard databases.
If you have new results that you would like to add to these tables, contact Nuno Vasconcelos with the reference.
Evaluation Metrics
For evaluating semantic annotation, the automatic annotation of each image is defined as the top five semantic classes assigned by the algorithm, and the recall and
precision of every word in the test set are computed.
For a given semantic
descriptor, assuming that there are wH human-annotated
images in the test set and that the system annotates wauto images, of which
wC are correct, the per-word recall and precision are given by recall = wC / wH
and precision = wC / wauto, respectively.
Finally, the values of recall and precision are averaged
over the set of words that appear in the test set to get the average per-word precision (P) and average per-word recall (R).
We also consider the number of words with non-zero recall (NZR), i.e., words
with wC > 0, which provides an indication of how many
words the system has effectively learned.
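As a concrete illustration, the per-word statistics above can be computed with a short script along the following lines. This is a minimal sketch: the dictionary-based inputs, the function name, and the convention of treating the precision of a word the system never predicts as zero are assumptions, not part of the benchmark protocol.

```python
from collections import defaultdict

def per_word_metrics(ground_truth, predictions):
    """Average per-word precision (P), recall (R), and number of words
    with non-zero recall (NZR).

    ground_truth: image id -> set of human-annotated words
    predictions:  image id -> set of words assigned by the system
                  (e.g., the top five semantic classes per image)
    """
    w_H = defaultdict(int)     # images annotated with the word by humans
    w_auto = defaultdict(int)  # images annotated with the word by the system
    w_C = defaultdict(int)     # correct system annotations of the word

    for img, truth in ground_truth.items():
        pred = predictions.get(img, set())
        for w in truth:
            w_H[w] += 1
        for w in pred:
            w_auto[w] += 1
            if w in truth:
                w_C[w] += 1

    # average over the set of words that appear in the test set
    words = list(w_H)
    recall = sum(w_C[w] / w_H[w] for w in words) / len(words)
    precision = sum(w_C[w] / w_auto[w] if w_auto[w] else 0.0 for w in words) / len(words)
    nzr = sum(1 for w in words if w_C[w] > 0)  # words with wC > 0
    return precision, recall, nzr
```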
The performance of semantic retrieval is also evaluated
by measuring precision and recall. Given a query term and
the top n image matches retrieved from the database, recall
is the percentage of all relevant images contained in the
retrieved set, and precision is the percentage of the n retrieved
images that are relevant (relevant meaning that the ground-truth
annotation of the image contains the query term).
Retrieval performance is evaluated with
the mean-average precision (MAP), defined in [3], which
is the average precision, over all queries, at the ranks
where recall changes (i.e., where relevant items occur).
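The retrieval metric can be sketched in the same style: per query, precision is averaged over the ranks where recall changes, and the result is averaged over all queries. The function names and data layout are assumptions; a common alternative convention normalizes each query's average precision by the total number of relevant images rather than by the number actually retrieved.

```python
def average_precision(ranked_ids, relevant_ids):
    """Precision averaged over the ranks where recall changes,
    i.e., over the positions of the relevant images in the ranked list."""
    relevant = set(relevant_ids)
    hits, precisions = 0, []
    for rank, img in enumerate(ranked_ids, start=1):
        if img in relevant:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0

def mean_average_precision(rankings, relevant_sets):
    """rankings:      query word -> ranked list of image ids
    relevant_sets: query word -> set of images whose ground-truth
                   annotation contains the query word"""
    aps = [average_precision(rankings[q], relevant_sets[q]) for q in rankings]
    return sum(aps) / len(aps)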
On the larger database of [7], where per-image annotations are not available and automatic annotation is based on image categorization, two other metrics are used for evaluation.
The first is "image categorization accuracy", where an image is considered correctly categorized if any of the top r categories is the true category.
Second, annotation performance is evaluated with the "mean coverage", which is defined as the percentage of ground-truth annotations that match the
computer annotations.
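A minimal sketch of the two categorization-based metrics is given below; the per-image averaging used for mean coverage and all names here are assumptions, and the precise definitions are given in [7].

```python
def categorization_accuracy(true_category, top_r_categories):
    """Percentage of images whose true category appears among the top r
    categories returned by the system.
    true_category:    image id -> ground-truth category
    top_r_categories: image id -> list of the top r predicted categories"""
    correct = sum(1 for img, cat in true_category.items()
                  if cat in top_r_categories[img])
    return 100.0 * correct / len(true_category)

def mean_coverage(ground_truth, annotations):
    """Average, over images, of the percentage of ground-truth annotations
    that also appear in the computer annotations."""
    coverages = [100.0 * len(set(truth) & set(annotations.get(img, ()))) / len(truth)
                 for img, truth in ground_truth.items()]
    return sum(coverages) / len(coverages)
```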
For more details, please see the references. The evaluation metrics are summarized in the following table:

| Metric | Definition |
| P | average per-word precision over the words in the test set |
| R | average per-word recall over the words in the test set |
| NZR | number of words with non-zero recall (wC > 0) |
| MAP | mean average precision over all retrieval queries |
| Categorization accuracy | percentage of images whose true category is among the top r predicted categories |
| Mean coverage | percentage of ground-truth annotations that match the computer annotations |
Corel5k
Corel30k
PSU
Performance on the PSU database can also be evaluated using supervised category-based learning (SCBL), which is based on classifiers for image categories. For each image, the annotations are selected from the words associated with the top 5 image categories, according to a statistical test (see [7,1] for more details). The algorithms are evaluated on the accuracy of image categorization and the "mean coverage" of the annotations.
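A hedged sketch of the SCBL annotation step is shown below. Pooling the words of the top five categories follows the description above, but the statistical test of [7,1] used to select words from this pool is not reproduced, and all names here are illustrative.

```python
def scbl_candidate_annotations(category_scores, category_words, top_k=5):
    """Rank the image categories by classifier score, keep the top_k, and pool
    the words associated with those categories as candidate annotations.
    (The statistical test of [7,1] that prunes this pool is omitted.)
    category_scores: category -> classifier score for the image
    category_words:  category -> set of words associated with the category"""
    top = sorted(category_scores, key=category_scores.get, reverse=True)[:top_k]
    candidates = set()
    for c in top:
        candidates |= set(category_words[c])
    return candidates
```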
References
© SVCL