Natural Images

Figure 1. Formulating local cues to figure–ground assignment. Three cues are defined locally inside an analysis window centered at a contour point p. The Size cue describes the relative size of the neighboring regions. LowerRegion compares the relative locations of the centers of mass of the two regions. The Convexity cue captures the relative convexity of the two neighboring regions. Convexity is defined as the probability that a line segment connecting two points in a region lies completely within the region. The six panels at the right demonstrate the information captured by each cue at two different scales. The base of each colored line segment along the boundary marks the point on the contour at which the cue was computed and points towards the predicted ground region. The length of the line indicates the relative magnitude of the cue. The cues of size, lower region, and convexity are indicated with red, blue, and green, respectively.

Figure 2. Acquiring figure–ground labels. Human subjects labeled each contour in an image, indicating to which region it “belongs.” Starting from a segmentation of the original image (left), subjects were presented with a sequence of highlighted contours corresponding to each pair of neighboring regions (center). The subject indicated which of the two regions was the figural element. The reported figural region is displayed here with a red tint, ground with a blue tint. Subjects also had the option of attributing a boundary to a change in surface albedo or a discontinuity in the surface normal. Such a boundary, exemplified by the corner between the building and earth, marked in green, was seen as belonging to both segments. Once all the contours had been labeled, the subject was presented with the final labeling (right) and given the opportunity to fix any mistakes.

Figure 3a, 3b, 3c. The statistics of local figure–ground cues in natural scenes. Each histogram shows the empirical distributions of Size(p), LowerRegion(p), and Convexity(p) for 50,000 points sampled from human-labeled contours in 200 natural images computed over a window with radius r = 5% contour length.

Figure 4. Quantifying the relative power of local figure–ground cues in natural scenes. The power of individual cues and cue combinations is quantified by measuring the correct classification rate, plotted here as a function of window radius. Multiple cues are combined using logistic regression fit to training data. The error bars show 1 SD measured over held-out data during 10-fold cross-validation. The legend gives the highest classification rate achieved for each combination of cues. The analysis window radius is measured relative to the length of the contour being analyzed to make it (approximately) invariant to an object’s distance from the camera.

Figure 5. Subjects made figure–ground judgments for local stimuli, like those shown, consisting of a cropped disc depicting either region shape (configuration) or image luminance (configuration + content). In the luminance condition, the two regions on either side of the contour were distinguished by red and blue tints. The color assignments were randomized over trials, but in this figure, the white/red tinted segments indicate which region was figural according to the ground-truth labels. Numbers indicate the window radius for each patch as a percentage of the contour length.

Figure 6. Quantifying the importance of context and content. The correct classification rate and standard deviation across subjects (n = 4 subjects in each condition) are plotted as a function of context. We also plot the classification performance of our computational model (S, L, and C) on the same set of local windows, with whiskers marking 1 SD of the sample proportion. The grid line at 0.96 indicates the level of global labeling consistency in the ground truth figure–ground assignments.
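
The three cue definitions in the Figure 1 caption can be illustrated with a small sketch. The captions do not give the exact functional forms, so everything below is an assumption made for illustration: the window is a hypothetical grid of region labels (1 and 2), Size is taken as a log ratio of areas, LowerRegion as the cosine between the downward direction and the vector joining the two centers of mass, and the convexity probability is estimated by Monte Carlo sampling of pixel pairs. The function names and sign conventions are hypothetical and are not taken from the paper.

```python
import math
import random

def region_pixels(window, label):
    """Return the (row, col) coordinates in `window` that carry `label`."""
    return [(r, c) for r, row in enumerate(window)
            for c, v in enumerate(row) if v == label]

def size_cue(window, a=1, b=2):
    """Assumed form: log area ratio, negative when region `a` is smaller."""
    return math.log(len(region_pixels(window, a)) / len(region_pixels(window, b)))

def centroid(pixels):
    """Center of mass of a list of (row, col) pixels."""
    n = len(pixels)
    return (sum(r for r, _ in pixels) / n, sum(c for _, c in pixels) / n)

def lower_region_cue(window, a=1, b=2):
    """Assumed form: cosine between 'down' and the vector from the center
    of mass of `b` to that of `a`; positive when `a` lies below `b`."""
    ra, ca = centroid(region_pixels(window, a))
    rb, cb = centroid(region_pixels(window, b))
    dr, dc = ra - rb, ca - cb          # row index grows downward
    norm = math.hypot(dr, dc)
    return dr / norm if norm else 0.0

def convexity(window, label, trials=2000, seed=0):
    """Monte Carlo estimate of the probability that the segment joining
    two random pixels of a region stays entirely inside that region."""
    rng = random.Random(seed)
    pixels = region_pixels(window, label)
    inside = set(pixels)
    hits = 0
    for _ in range(trials):
        (r0, c0), (r1, c1) = rng.choice(pixels), rng.choice(pixels)
        steps = max(abs(r1 - r0), abs(c1 - c0), 1)
        # Walk along the segment and check each rasterized point.
        if all((round(r0 + (r1 - r0) * t / steps),
                round(c0 + (c1 - c0) * t / steps)) in inside
               for t in range(steps + 1)):
            hits += 1
    return hits / trials

# Toy window (illustrative only): a small block (1) sitting at the bottom
# of a larger surround (2) that wraps around it.
window = [
    [2, 2, 2, 2, 2, 2],
    [2, 2, 2, 2, 2, 2],
    [2, 2, 2, 2, 2, 2],
    [2, 2, 1, 1, 2, 2],
    [2, 2, 1, 1, 2, 2],
    [2, 2, 1, 1, 2, 2],
]
print(size_cue(window), lower_region_cue(window),
      convexity(window, 1), convexity(window, 2))
```

In this toy window, region 1 is smaller, lower, and more convex than region 2, so under the assumed conventions all three cues favor region 1 as the figure.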

Abstract

Figure–ground organization refers to the visual perception that a contour separating two regions belongs to one of the regions. Recent studies have found neural correlates of figure–ground assignment in V2 as early as 10–25 ms after response onset, providing strong support for the role of local bottom–up processing. How much information about figure–ground assignment is available from locally computed cues? Using a large collection of natural images, in which neighboring regions were assigned a figure–ground relation by human observers, we quantified the extent to which figural regions locally tend to be smaller, more convex, and lie below ground regions. Our results suggest that these Gestalt cues are ecologically valid, and we quantify their relative power. We have also developed a simple bottom–up computational model of figure–ground assignment that takes image contours as input. Using parameters fit to natural image statistics, the model is capable of matching human-level performance when scene context is limited.
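
The Figure 4 caption states that multiple cues are combined using logistic regression fit to training data. A minimal sketch of that kind of combination follows. It is not the authors' implementation: the training points are synthetic toy values generated inside the script (not data or results from the study), the gradient-descent fitting routine and its learning rate are assumptions, and the names `fit_logistic` and `predict` are hypothetical.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(features, labels, lr=0.5, epochs=500):
    """Fit weights (bias stored last) by batch gradient descent on log loss."""
    dim = len(features[0])
    w = [0.0] * (dim + 1)
    n = len(features)
    for _ in range(epochs):
        grad = [0.0] * (dim + 1)
        for x, y in zip(features, labels):
            # zip stops at len(x), so the bias w[-1] is added separately.
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + w[-1]) - y
            for i, xi in enumerate(x):
                grad[i] += err * xi
            grad[-1] += err
        w = [wi - lr * g / n for wi, g in zip(w, grad)]
    return w

def predict(w, x):
    """Return 1 if the model favors the first region as figure, else 0."""
    return int(sum(wi * xi for wi, xi in zip(w, x)) + w[-1] > 0)

# Synthetic toy data (assumption, for illustration only). Each sample is
# (size, lower_region, convexity); label 1 means the first region is figure.
# The cue means are shifted so that figures tend to be smaller, lower, and
# more convex, mirroring the direction of the tendencies in the abstract.
rng = random.Random(0)
features, labels = [], []
for _ in range(200):
    y = int(rng.random() < 0.5)
    sign = 1.0 if y else -1.0
    features.append([rng.gauss(-0.5 * sign, 1.0),
                     rng.gauss(1.0 * sign, 1.0),
                     rng.gauss(0.8 * sign, 1.0)])
    labels.append(y)

weights = fit_logistic(features, labels)
accuracy = sum(predict(weights, x) == y
               for x, y in zip(features, labels)) / len(labels)
print(weights, accuracy)
```

The accuracy printed here is a training-set figure on toy data and says nothing about the rates reported in the paper, which were measured on held-out data with 10-fold cross-validation.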