
Human Behavior Studies Linking Tracking and Saliency

We performed several experiments investigating the connections between the psychophysics of tracking and saliency.

Psychophysics Experiment 1: Saliency affects Tracking Performance

Subjects viewed displays containing a green target disk surrounded by 70 red distractor disks, identical in shape to the target, and a static fixation square. At the start of each trial, the target disk was cued with a bounding box (first frame of the stimulus). Subjects were asked to track the target covertly, without moving their eyes from the fixation point. On a keystroke from the subject, all disks moved independently, with random motion, for 7 seconds. The disks then stopped moving, and the colors of three disks were switched to three new colors: cyan, magenta and blue. Of these, one was the target and the other two were the spatially closest distractors. The subjects were asked to identify the target among the three highlighted disks. Participants performed 4 trials each, divided into 2 versions of 2 conditions.

The first version tested how tracking is affected by target saliency. In the first condition, denoted salient, the target remained green throughout the presentation, changing randomly to one of the three highlight colors at the end of the 7 seconds. In the second, denoted non-salient, the target remained green for the first half of this period, switched to red for the remaining time, and finally turned to a highlight color. While in the first condition the target is salient throughout the presentation, the second makes the target non-salient throughout the second half of the trial. To eliminate potential effects of any other variables (e.g. target-distractor distances and motion patterns), a non-salient display was created by rotating each frame of a salient display by 90° (and changing the green disk to red in the second half of the presentation).
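The reason a 90° rotation works as a control is that a rigid rotation leaves every inter-disk distance unchanged, so the matched displays differ only in target color. The following is a minimal sketch of that idea, not the code used to generate the stimuli. The function names and the disk coordinates are hypothetical, and the rotation is assumed to be about the display center.

```python
import math

def rotate_90(points, center=(0.0, 0.0)):
    """Rotate (x, y) points by 90 degrees counter-clockwise about `center`."""
    cx, cy = center
    return [(cx - (y - cy), cy + (x - cx)) for x, y in points]

def pairwise_distances(points):
    """Distances between all unordered pairs of points, in a fixed order."""
    return [
        math.hypot(p[0] - q[0], p[1] - q[1])
        for i, p in enumerate(points)
        for q in points[i + 1:]
    ]

# Hypothetical disk centers for one frame (target first, then two distractors).
frame = [(10.0, 0.0), (3.0, 4.0), (-2.0, 7.5)]
rotated = rotate_90(frame)

# Target-distractor and distractor-distractor distances are preserved.
distances_match = all(
    abs(a - b) < 1e-9
    for a, b in zip(pairwise_distances(frame), pairwise_distances(rotated))
)
```

Applying the same rotation to every frame also preserves the motion patterns up to orientation, which is why the two conditions can be compared directly.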


[stimulus] [model output]


[stimulus] [model output]


[stimulus] [model output]


[stimulus] [model output]


The figures below present the rate of successful tracking in the two experiment versions. In both cases, this rate was much higher in the salient than in the non-salient condition. In the latter, tracking performance was almost at the chance level of 1/3, suggesting complete tracking failure. Overall, tracking performance was vastly improved for salient targets even when they did not pop out. In fact, the similarity of the detection rates in the two versions suggests that pop-out plays no role in tracking performance. It suffices for the target to be locally salient.

Globally Salient
Locally Salient

Psychophysics Experiment 2: Tracking vs. saliency as a function of feature contrast

The results of the first experiment show that tracking is related to saliency. While a salient target is tracked reliably, non-salient targets are close to non-trackable. Experiment 2 aimed to investigate the connection between the two phenomena in greater detail, namely to quantify how tracking reliability depends on target saliency. Since saliency is not an independent variable, it can only be controlled indirectly. This is usually done by manipulating feature contrast between the target and distractors. It is well known that when the target differs from distractors in terms of color, luminance, orientation or texture, it can be perceived as salient (Nothdurft, 1991). In particular, Nothdurft quantified the dependence of saliency on orientation contrast in static displays. His work has shown that perceived target saliency increases with the orientation contrast between target and neighboring distractors. This increase is quite non-linear, exhibiting the threshold and saturation effects shown in the figure below, where we present curves of saliency as a function of orientation contrast between target and distractors for three levels of distractor homogeneity.

At the start of a trial, one of the ellipses was designated as target (cued with a white bounding box). Subjects were asked to track the target covertly, while fixating on a white square at the center of the screen. At the end of the trial, all ellipses were completely occluded by larger white disks and the subjects were asked to click on the disk corresponding to the target. Each subject performed 30 trials under each of 7 conditions, for a total of 210 trials. The seven conditions corresponded to different levels of orientation contrast between target and distractor ellipses. Distractor orientation, defined by the major axis of the distractor ellipses, was always 0°. Target orientation, determined by the major axis of the target ellipse, was selected from 7 values: 0°, 10°, 20°, 30°, 40°, 60° or 80°. This made orientation contrast equal to the target orientation. To keep all other variables (e.g. distance between items, motion patterns, distance from target to fixation square) identical, a trial was first created for one condition (target orientation 0°). The trials of all other conditions were obtained by applying a transformation to each frame of this video clip. This consisted of an affine transformation of the grid of ellipse centers, followed by the desired change in target orientation.

To study the effect of distractor heterogeneity (Nothdurft, 1993), three versions of the experiment were conducted with different numbers of ellipses in the "target orientation". In the first version, only one ellipse (the actual target) was in "target orientation". In this case, there was no distractor heterogeneity. In the second version, 18 of the 23 ellipses were in distractor orientation, and the remaining 5 in "target orientation". One of the latter was the actual target. Finally, in the third version, 13 ellipses were in distractor and 10 in target orientation, for the largest degree of distractor heterogeneity.

0 distractors identical to target


5 distractors identical to target


10 distractors identical to target



As shown in the figure below, the curves of tracking reliability vs. orientation contrast, obtained in all three versions of the experiment, were remarkably similar to the saliency vs. orientation contrast curves of Nothdurft. As is the case for saliency, 1) distinct threshold and saturation effects were observed for tracking, with tracking reliability saturating when orientation contrast increases beyond 40°, and 2) increased distractor heterogeneity caused a decrease in tracking accuracy.

human tracking success rate vs. orientation contrast

The near perfect correlation (r=0.97) between tracking reliability and saliency is evident from the scatter plot shown below. Each point in this plot corresponds to a different combination of heterogeneity and orientation contrast. In summary, tracking has a dependence on orientation contrast remarkably similar to that of saliency. This is strong evidence for the hypothesis that tracking performance is determined by the saliency of the target, and that tracking and saliency share common neural mechanisms.
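For readers who want to reproduce this kind of analysis, a correlation coefficient such as the one quoted above is typically the Pearson r computed over the points of the scatter plot. The sketch below assumes that is the statistic intended; the function name is ours and the two lists are placeholder values, not the measured saliency or tracking data.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Placeholder values only: one entry per (heterogeneity, orientation contrast)
# combination. These are NOT the data from the experiment.
saliency = [0.1, 0.2, 0.5, 0.9, 1.0]
tracking = [0.35, 0.40, 0.60, 0.85, 0.90]
r = pearson_r(saliency, tracking)
```

In the experiment, each pair would hold the saliency value and the tracking success rate for one combination of heterogeneity and orientation contrast.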

scatter plot of saliency values vs tracking accuracy

tracking accuracy vs. orientation contrast for model

Psychophysics Experiment 3: Effect of background on tracking performance

The results of Experiments 1 and 2 establish a strong connection between saliency and tracking. In relating saliency and tracking, the saliency hypothesis proposes that tracking uses center-surround mechanisms to identify salient features that make the target distinct from its background. The involvement of a center-surround mechanism in tracking is consistent with the results of Experiment 2, where tracking performance is seen to depend on distractor heterogeneity: if the surround were not involved in the tracking process, the performance would not depend on the number of distractors similar to the target in the surround.

To test the involvement of a center-surround mechanism in tracking further, we designed another experiment. In this experiment the distance between the target and the closest similar distractor (i.e. one with the same orientation as the target) is controlled so that a region of fixed radius around the target is devoid of any similar distractors. By varying this target-similar distractor distance, t_tsd, and observing the tracking performance, three possible scenarios can be evaluated:

  1. a localized surround region is involved in the tracking process: in this case, when t_tsd is varied, there should be a distance, which we shall denote as t_critical, beyond which all similar distractors are outside the surround region relevant for tracking. So for large enough values of t_tsd, i.e. t_tsd > t_critical, distractor heterogeneity should not affect tracking performance.

  2. the entire visual field is involved: if the entire visual field is involved, no such distance, t_critical, should exist and distractor heterogeneity should affect tracking performance for all values of t_tsd.

  3. no surround region is included in the tracking process: in this case, the success rate of tracking should be identical in all versions regardless of the distractor heterogeneity.

The results of Experiment 2 already showed that conjecture (3) does not hold. Experiment 3 was designed to determine which among conjectures (1) and (2) holds.

The experimental setting, stimuli and procedure were identical to those in Experiment 2. The target orientation for all stimuli was fixed at 40°. Two versions of the experiment were conducted with different numbers of ellipses in the target orientation, corresponding to two values of distractor heterogeneity. As in Experiment 2, in the first version, 18 of the 23 ellipses were in distractor orientation, and the remaining 5 in target orientation, one of the latter being the actual target. In the second version, 13 ellipses were in distractor and 10 in target orientation. In each version, the stimulus sequence could be in one of four conditions depending on the average value of t_tsd, i.e. the average, over all frames in the sequence, of the distance between the target and the nearest similar distractor. In each condition, the sequences were designed such that this quantity was in the range 1.67° to 5.01° (about 45 pixels to 135 pixels).
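The quantity that defines each condition, the per-sequence average of the distance from the target to its nearest similar distractor, can be sketched as follows. This is an illustration under stated assumptions rather than the stimulus-generation code: the function names and frame coordinates are hypothetical, and the pixel-to-degree factor is simply derived from the approximate correspondence given above (45 pixels for 1.67°).

```python
import math

# Assumed conversion, taken from "1.67 degrees is about 45 pixels" in the text.
PIXELS_PER_DEGREE = 45 / 1.67

def nearest_similar_distance(target, similar_distractors):
    """Distance (pixels) from the target to the closest similar distractor."""
    tx, ty = target
    return min(math.hypot(tx - x, ty - y) for x, y in similar_distractors)

def mean_nearest_similar_distance(frames):
    """Average, over all frames, of the nearest similar distractor distance.

    `frames` is a list of (target_xy, [similar_distractor_xy, ...]) pairs.
    """
    distances = [nearest_similar_distance(t, s) for t, s in frames]
    return sum(distances) / len(distances)

# Hypothetical two-frame sequence, coordinates in pixels.
frames = [
    ((0.0, 0.0), [(30.0, 40.0), (100.0, 0.0)]),  # nearest is 50 px away
    ((0.0, 0.0), [(0.0, 70.0), (60.0, 80.0)]),   # nearest is 70 px away
]
mean_px = mean_nearest_similar_distance(frames)
mean_deg = mean_px / PIXELS_PER_DEGREE
```

With this conversion the stated endpoints agree with each other: 135 pixels divided by the same factor gives 5.01°.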

The figure on the left below presents the rate of successful tracking in the two versions as a function of the average distance to the nearest similar distractor. Also shown in the figure is the tracking accuracy for the version with no similar distractors at a target orientation of 40° from Experiment 2. As there are no distractors similar to the target in this case, a flat line is used to denote the tracking accuracy over all values of the abscissa.

tracking accuracy as a function of the average target-similar distractor distance

model prediction for the same data using the saliency based model

The results show that tracking performance improves as the average distance to the nearest similar distractor increases under both versions with non-zero distractor heterogeneity. Further, for large enough values of the distance, tracking accuracy in the two versions is nearly the same as in the version with no distractor heterogeneity. This shows that conjecture (1) holds, i.e. a localized surround region of limited size is involved in the tracking task, and t_critical is about 4°. When the identical distractors are kept out of this region, adding more such distractors does not impact tracking performance.

The prediction for the same data using the saliency model is also shown in the figure on the right above. The results clearly show that the model can predict the trend seen in the psychophysics experiment.