Home People Research Publications Demos About

JPEG coding using semantically salient Regions of Interest

This page presents a collection of results obtained with region-of-interest (ROI) coding, using ROI masks produced by object based ROI detection.
The basic idea is to enable image coders to become experts on the visual concepts that are most important to their users. This allows
the allocation of more bits to the image areas where these concepts appear, and can be useful for a number of applications. The process
of training the coder consists of simply collecting (e.g. on Google image search) a number of examples (about 40 in the examples below)
of the concept of interest.

Examples of ROI-based compression of cluttered scenes

Consider the following image, composed of a number of different objects. A semantic coder was trained for each of the five major visual classes in the image:
faces, cars, the Capitol, trees, and street lamps.

Once the coder has been trained to recognize these classes, the user can specify a class of interest. Image areas containing that class are
deemed salient by the coder and allocated more bits. This could be interesting for photo- or video-sharing over wireless networks. The user can quickly
transmit a number of pictures that emphasize the components of the scene of interest (e.g. the user and a monument in the background).
The receiver of the pictures can then quickly select one among those made available and request a high-quality replica of that photo alone.

The following are examples of ROI-coded images, using a JPEG2000 encoder and a saliency detector developed in this project. These examples
present a magnification of the image above, after it has been coded with regular JPEG2000 (left) and ROI-based JPEG2000 (right). The same number
of bits was assigned to the entire image, and training was based on 40 examples collected from the web for each object class. The user specified
the "faces" (row 1 and 2), "cars" (row 3), "Capitol" (row 4), "trees" (row 5), and "street lamps" as the classes of interest. Note the highest fidelity
of the regions containing these visual concepts.

Examples of ROI-based compression of images containing street signs

We next show examples of regular JPEG coding (left) vs ROI coding (right) based on the "street sign" class. In this trial the street-sign
coder was learned from 4
0 examples collected from the web. This coder could be useful for a guidance system: the user simply points a cell
phone at the street sign, an image is compressed and transmitted to a base station, where an OCR system is used to determine the user
location and transmit back a number of suggestions (places to visit, nearby stores, etc.). The allocation of a large percentage of the bits to
the area covered by the street sign enables accurate performance of the OCR system, at an overall reduced bit-rate. In the examples below
this overall rate was made low enough to make the non-ROI coded street signs (or some of their details) imperceptible to a human viewer.
Note how the details of the ROI-coded signs are perceptible at the same overall bit rate.

Examples on the Caltech object database

The next set of examples was produced by a ROI coder trained on human faces. The obvious application would be video-telephony over narrow-band networks.
Note how the coder assigns higher quality to the face area, at the cost of loss of (irrelevant) detail in the background.