Published Online: https://doi.org/10.1152/jn.01074.2006

Abstract

Binocular disparity, the slight differences between the images registered by our two eyes, provides an important cue when estimating the three-dimensional (3D) structure of the complex environment we inhabit. Sensitivity to binocular disparity is evident at multiple levels of the visual hierarchy in the primate brain, from early visual cortex to parietal and temporal areas. However, the relationship between activity in these areas and key perceptual functions that exploit disparity information for 3D shape perception remains an important open question. Here we investigate the link between human cortical activity and the perception of disparity-defined shape, measuring fMRI responses concurrently with psychophysical shape judgments. We parametrically degraded the coherence of shapes by shuffling the spatial position of dots whose disparity defined the 3D structure and investigated the effect of this stimulus manipulation on both cortical activity and shape discrimination. We report significant relationships between shape coherence and fMRI response in both dorsal (V3, hMT+/V5) and ventral (LOC) visual areas that correspond to the observers' discrimination performance. In contrast to previous suggestions of a dichotomy of disparity-related processes in the ventral and dorsal streams, these findings are consistent with proposed interactions between these pathways that may mediate a continuum of processes important in perceiving 3D shape from coarse contour segmentation to fine curvature estimation.

INTRODUCTION

Everyday human behavior depends on the brain estimating the depth structure of the nearby environment so that we can avoid dangers and exploit opportunities. A powerful source of information about depth structure is provided by the slight differences between the retinal images registered by our two eyes (binocular disparity). The brain's use of binocular disparity has been studied extensively (Cumming and DeAngelis 2001; Howard and Rogers 2002), largely because of the quantitative relation between disparity and the depth structure of the environment (Longuet-Higgins 1982), the exquisite sensitivity of the brain to disparity (Westheimer and McKee 1978), and the observation that horizontal binocular disparity provides a powerful impression of depth even under impoverished viewing conditions (Julesz 1971).

Neurophysiological and imaging studies have revealed selectivity for binocular disparity at multiple levels of the visual hierarchy in the monkey and the human brain, from early visual areas to object- and motion-selective areas and the parietal cortex (for reviews see Cumming and DeAngelis 2001; Neri 2005; Orban et al. 2006a,b; Parker 2004). However, understanding how activity in these multiple disparity-selective regions relates to key visual functions that guide human behavior remains an important challenge. Specifically, how do different cortical networks support key computational functions that exploit depth estimates to perceive the shape of three-dimensional (3D) objects, interpret their material properties, recognize 3D objects in complex scenes, and control the movement of body parts when interacting with objects? Previous work (for reviews see Neri 2005; Orban et al. 2006b; Parker 2004; Tyler 1990) has proposed a functional dichotomy between the dorsal visual pathway ascending into parietal cortex, which processes disparity signals relating to spatial position and object-related actions, and the ventral visual pathway leading to temporal cortex, which supports shape discrimination and object recognition. Our study aimed to investigate which of the cortical areas in the human ventral or dorsal visual pathway that are known to be involved in the processing of disparity information contribute to the integration and perception of global shape from disparity.

To address this question, we examined the relationship between cortical activity and coherent shape perception from horizontal disparity signals by measuring functional magnetic resonance imaging (fMRI) activity concurrently with observers' judgments of shape. We used random-dot displays in which 3D convex shapes were apparent only after global binocular correspondence had been established (Fig. 1). To investigate the cortical areas whose activity related to processing of disparity signals for global shape perception, we degraded the visual stimulus to measure effects on both perceptual discrimination and fMRI response. We parametrically varied shape coherence by shuffling the position of dots whose disparity defined the 3D shape. The advantage of this approach is that it allowed us to degrade the perception of a global 3D shape while keeping the image disparity content constant. Our findings provide novel evidence that activity in both dorsal (V3, hMT+/V5) and ventral (LOC) visual areas varies in correspondence with an observer's perception of shape defined by disparity. These findings are consistent with interactions, rather than a dichotomy, between ventral and dorsal areas that mediate the perception of coherent disparity-defined shapes possibly by supporting a continuum of visual integration and recognition processes from coarse contour segmentation to fine curvature discrimination.

FIG. 1. Examples of stimuli used in the study. A: intensity depth maps illustrate the disparity-defined depth structure of the stimuli. Midgray indicates the plane of the screen; brighter pixels illustrate relative position in front of the background plane. Left-hand figure is horizontally symmetric; right-hand figure is vertically symmetric. B and C: example anaglyphic random-dot stereograms at 90% stimulus coherence (B) and 10% stimulus coherence (C).


METHODS

Observers

Ten students from the University of Tübingen participated in the experiment. All had normal or corrected-to-normal (contact lenses) visual acuity, were able to perceive disparity-specified depth (Stereo Fly test), and were screened for color deficiencies (Ishihara plates). All observers gave informed written consent.

Stimuli

Ten convex shapes defined by horizontal disparity were used as stimuli. Shapes were defined by an average of 200 dots and were embedded in background noise (average number of dots: 850) positioned in the plane of the screen (two exemplars are shown in Fig. 1). Stimuli were presented on a gray background and were 14.42 × 14.42° in size. Individual dots subtended 0.2° and were either black (1.55 cd/m²) or white (415 cd/m²). Disparity was maximal (0.21°) at the center of the shape and fell off smoothly to zero toward the boundaries along both the X and Y directions. The 3D shapes covered 9.64 × 9.64° in the center of the stimulus display. The position of the shapes was jittered across trials to control for local position cues that could facilitate extraction of the shape contour. Half of the shapes were symmetrical along the vertical axis and half along the horizontal axis. Observers indicated (by a button press) whether presented stimuli were symmetrical along the vertical or horizontal axis while maintaining fixation on a central target. Nonius fixation targets were provided to promote maintained eye vergence to the fixation target (in the plane of the screen). The symmetry task ensured that the observers attended to the whole shape, rather than the local elements.

We manipulated the coherence of the 3D shapes by randomly swapping the position of disparity-defined stimulus dots with background dots. This shuffling manipulation was used to create eight conditions in which different percentages of stimulus dots were exchanged to vary shape coherence. Shapes were most coherent (“90% coherence”) when only 10% of dots were swapped and 90% of stimulus dots were in their original positions. We examined conditions of varying shape coherence (either 10, 20, 30, 40, 50, 70, 80, or 90% of stimulus dots were in their original positions). Our pilot experiments showed that these conditions were sufficient to estimate observers' threshold performance; 0, 60, and 100% coherence levels were not tested because our pilot data showed that observers' performance at these levels was very similar to nearby coherence levels. Shuffling dot positions across the whole stimulus rather than only within the shape boundary ensured that observers could not perform the shape discrimination based on a disparity-defined region whose contours would be constant across coherence levels.
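The swapping manipulation can be illustrated with a minimal sketch. It assumes dots are stored as lists of (x, y) positions and that each swap exchanges the position of one disparity-carrying dot with that of one background dot; the function name, the data layout, and the use of a seeded random generator are illustrative assumptions, not details taken from the study.

```python
import random

def shuffle_coherence(shape_dots, background_dots, coherence, rng=None):
    """Swap positions between a fraction of shape dots and background dots.

    shape_dots, background_dots: lists of (x, y) positions (hypothetical layout).
    coherence: fraction of shape dots that stay in their original positions.
    Returns new (shape_dots, background_dots) lists; the inputs are not modified.
    """
    if rng is None:
        rng = random.Random()
    shape = list(shape_dots)
    background = list(background_dots)
    n_swap = round((1.0 - coherence) * len(shape))
    if n_swap > len(background):
        raise ValueError("not enough background dots to swap with")
    shape_idx = rng.sample(range(len(shape)), n_swap)
    back_idx = rng.sample(range(len(background)), n_swap)
    for i, j in zip(shape_idx, back_idx):
        # The disparity-carrying dot moves to a background location and
        # the zero-disparity background dot takes its former position.
        shape[i], background[j] = background[j], shape[i]
    return shape, background
```

With 200 shape dots and a coherence of 0.9, this sketch exchanges 20 dot positions, so the total set of dot locations in the display is unchanged and only the assignment of disparity to locations differs.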

Stimuli were rear-projected onto a screen inside the scanner bore and viewed through a mirror 10 cm above the eyes (viewing distance: 78 cm). Dichoptic presentation was effected by red–green anaglyphs (cross talk only 0.6%), encouraging a natural vergence posture. Photometric measures of the red and green signals from the NEC GT950 video projector were used for gamma correction and to equate the luminance of the images presented to each eye.

Design

Psychophysical and fMRI data were collected concurrently from each observer for each condition in eight event-related scans (coherence levels were intermixed on individual scans). Each scan consisted of one experimental trial epoch and two 8-s fixation epochs (one at start; one at end). The experimental trial epoch in each scan consisted of 125 trials in total: 25 trials from each of four of the eight conditions and 25 fixation trials (providing a measure of baseline activity). Different coherence levels were presented across scans: stimuli of 10, 30, 50, and 80% coherence were presented in half of the scans and stimuli of 20, 40, 70, and 90% coherence were presented in the remaining scans. This grouping of coherence levels across scans ensured that both low and high coherence levels were presented in each scan and resulted in scans that were both fully balanced for trial history (two trials back) and not too long for the participants (6.25 min in duration). The 60% stimulus coherence level was not tested because pilot testing showed that this level was distant from any of the individual observer's threshold performance, consistent with the behavioral results during scanning (Fig. 3A). Presentation order was counterbalanced so that trials from each condition (including fixation) were preceded equally often by trials from other conditions. A new trial began every 3 s and consisted of one stimulus (duration = 300 ms) followed by a blank (2,700 ms).

In addition, we collected data for 10 blocked-design localizer scans used to map regions of interest (ROIs) in each individual observer, consistent with previous studies: two scans for LOC (Welchman et al. 2005), two for hMT+/V5 retinotopy (Huk et al. 2002), one for V3B/KO (Van Oostende et al. 1997), two for the retinotopic areas (Welchman et al. 2005), and four for 3D shape-related parietal areas (Orban et al. 1999, 2006a). The order of the experimental and localizer scans was counterbalanced across observers. For all localizer scans, observers performed a dimming task (i.e., change in luminance) on the fixation point that ensured similar attention across conditions.

Imaging

We used a 3T Siemens scanner (University Clinic, Tübingen, Germany) with gradient echo pulse sequence (TR = 2 s, TE = 40 ms localizers; TR = 1.5 s, TE = 40 ms event-related scans) and an eight-channel head coil. We collected 18 near-coronal slices (5 mm thick; 3 × 3 mm in-plane resolution) covering occipitotemporal, parietal, and part of frontal cortex.

Data analysis

PSYCHOPHYSICAL DATA.

A standard logistic function was used to fit each observer's behavioral responses and obtain estimates of threshold performance

(1)

where γ is the baseline, λ is the offset, β is the intercept, and α is the slope of the psychometric function.
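As an illustration of how a threshold can be read off such a fit, the sketch below uses one common logistic parameterization built from the four parameters named above. The exact functional form is an assumption made for this example (a linear predictor αc + β inside the logistic, with γ as the lower asymptote and λ as the offset from the upper asymptote), and the parameter values in the usage note are invented.

```python
import math

def psychometric(c, gamma, lam, beta, alpha):
    """Proportion correct at coherence c under an assumed logistic form."""
    return gamma + (1.0 - gamma - lam) / (1.0 + math.exp(-(alpha * c + beta)))

def threshold(p, gamma, lam, beta, alpha):
    """Coherence at which the assumed function reaches proportion correct p."""
    q = (p - gamma) / (1.0 - gamma - lam)
    if not 0.0 < q < 1.0:
        raise ValueError("p lies outside the range spanned by the function")
    # Invert the logistic: alpha * c + beta = log(q / (1 - q)).
    return (math.log(q / (1.0 - q)) - beta) / alpha
```

For a two-alternative task with γ = 0.5 and λ = 0, the 75% correct point is the midpoint of the function, so hypothetical values α = 0.1 and β = −3.0 would place the threshold at 30% coherence.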

FMRI DATA.

fMRI data were processed using BrainVoyager (Brain Innovation, Maastricht, The Netherlands) and MATLAB (The MathWorks, Natick, MA). Preprocessing of functional data included head-movement correction, high-frequency temporal filtering, and linear trend removal. Two-dimensional functional images were aligned to 3D anatomical data, transformed to standardized Talairach coordinates, inflated, unfolded, and flattened.

For each observer, we identified ROIs using standard mapping techniques (Fig. 2; Table S1 for Talairach coordinates). This approach allowed us to localize ROIs based on data independent of those collected during the experimental trials. 3D statistical maps for each ROI were obtained by correlating the fMRI time courses with a reference time course based on hemodynamic response properties. The borders of early retinotopic regions (V1, V2, V3, V3A, VP/V3, V4) were localized using rotating wedge stimuli and eccentricity mapping was achieved using concentric rings, as described in our previous studies (Welchman et al. 2005). The lateral occipital complex (LOC) constituted the voxels in the ventral occipitotemporal cortex showing significantly stronger activation (P < 10⁻⁴, corrected) to intact than to scrambled images of objects. Two separate LOC subregions were identified: a posterior lateral occipital (LO) region and an anterior region in the posterior fusiform gyrus (pFs), as described in previous studies (Welchman et al. 2005). The human middle temporal area (hMT+/V5) constituted the contiguous voxels in the ascending limb of the inferior temporal sulcus showing significantly stronger activation (P < 10⁻⁴, corrected) to moving than to static low-contrast concentric rings (Tootell et al. 1995; Zeki et al. 1991). We distinguished between middle temporal (MT) and medial superior temporal (MST) subregions using rotating triangular wedge (45°) stimuli rendered by moving dots (Huk et al. 2002). V3B/KO was defined as the set of contiguous voxels anterior to V3A and posterior to hMT+/V5 that showed significantly stronger activation (P < 10⁻⁴, corrected) to kinetic boundaries versus transparent motion (Van Oostende et al. 1997). Finally, areas along the intraparietal sulcus (IPS: VIPS, POIPS, DIPSM, DIPSA) and the frontal eye field (FEF) region showed significantly stronger activation (P < 10⁻⁴, corrected) to intact than to scrambled 3D shapes (the same 10 shapes used in the experiment) defined by disparity. These areas were identified as 3D shape-related ROIs, in accordance with previous studies (Orban et al. 1999, 2006a).

FIG. 2. Regions of interest (ROIs). Functional activation maps for one subject showing regions of interest in the retinotopic cortex (V1, V2, VP/V3, V4, V3, V3A), hMT+/V5, lateral occipital complex (LOC: LO, pFs subregions), V3B/KO, and 3D shape-related areas in the parietal cortex (VIPS, POIPS, DIPSM, DIPSA) and frontal eye field (FEF). These regions were defined by the overlap of functional activations based on independent localizer data and anatomical landmarks (see methods; Table S1) and superimposed on flattened cortical surfaces of the right and left hemispheres. Outlines of the regions, rather than a separate color map for each localizer scan, are presented here to illustrate more clearly the locus of these functionally defined regions of interest. Sulci are coded in darker gray than the gyri. Major sulci are labeled: STS, superior temporal sulcus; ITS, inferior temporal sulcus; OTS, occipitotemporal sulcus; CoS, collateral sulcus; IPS, intraparietal sulcus.


As described previously (Welchman et al. 2005), for each ROI and observer we averaged the signal intensity across trials in each condition at each time point and converted these to percentage signal change relative to fixation baseline. fMRI responses for early visual areas were extracted for voxels within the cortical area stimulated by the experimental stimuli, as defined by eccentricity mapping. Example time course data of the average fMRI response across subjects for representative areas (V1, V3, LOC, and hMT+/V5) are shown in Fig. S1a; similar time courses were observed for the remaining areas. Fitting the time course data obtained in each ROI with the hemodynamic response function indicated that peak fMRI responses occurred between 3 and 5 s after trial onset (time-to-peak for every ROI is shown in Fig. S1b). fMRI responses were normalized to control for overall variability across subjects and conditions. This involved subtracting the mean fMRI activity in a scan from the activity in each condition and dividing by the maximum activation within a scanning session. The normalized response between 3 and 5 s after trial onset was averaged and used in further statistical analyses (e.g., regression analysis).
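The response measure described above can be sketched in a few lines. This is a literal reading of the description (percentage signal change from fixation, subtraction of the scan mean, division by the session maximum, and averaging over the 3 to 5 s window); the function names and the list-based data layout are hypothetical, and the authors' exact implementation may differ in detail.

```python
def percent_signal_change(signal, baseline):
    """Express raw signal values as % change from the fixation baseline."""
    return [100.0 * (s - baseline) / baseline for s in signal]

def normalize(responses, scan_mean, session_max):
    """Subtract the scan mean and divide by the session maximum."""
    return [(r - scan_mean) / session_max for r in responses]

def window_mean(timecourse, times, t_start=3.0, t_end=5.0):
    """Average the samples whose time after trial onset lies in the window."""
    selected = [v for v, t in zip(timecourse, times) if t_start <= t <= t_end]
    if not selected:
        raise ValueError("no samples fall inside the averaging window")
    return sum(selected) / len(selected)
```

With the 1.5-s TR used for the event-related scans, samples at 3.0 and 4.5 s after trial onset fall inside the default window.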

RESULTS

Psychophysical data

During scanning, observers judged whether the viewed disparity-defined shape was symmetrical along the vertical or horizontal axis. We manipulated the coherence of the disparity-defined shape by randomly repositioning dots carrying disparity information about the shape across the display. As the proportion of repositioned dots increased, the shape appeared less coherent and the task became increasingly difficult (Fig. 3A). This was quantified by fitting the behavioral data with a logistic function to estimate the 75% correct discrimination threshold. On average, observers required 32.21 ± 1.68% stimulus coherence to determine reliably whether the shapes were symmetrical along the horizontal or vertical axis (see Table S2 for individual observer thresholds).

FIG. 3. Psychophysical and functional magnetic resonance imaging (fMRI) data. A: psychophysical data: accuracy (% correct) data across stimulus coherence levels averaged across observers and fitted with a logistic function to estimate the 75% correct discrimination threshold. Dotted line indicates the average threshold across observers (32.21% stimulus coherence). Error bars indicate SE accuracy across observers. Thresholds for each observer are reported in Table S2. B–D: fMRI responses in visual ROIs: average fMRI responses across observers (normalized % signal change from fixation baseline across subjects) for each stimulus coherence level. Error bars indicate SE.


fMRI responses

We examined the average fMRI response evoked by stimuli with different levels of disparity-defined shape coherence in early visual areas (V1, V2, V3, V3A, V3B/KO, VP/V3, V4), higher ventral areas (LOC: LO, pFs), and dorsal visual areas (hMT+/V5: MT, MST) previously implicated in the processing of disparity information in the human brain (i.e., disparity-defined planes: Backus et al. 2001; Gulyas and Roland 1994; Neri et al. 2004; Rutschmann and Greenlee 2004; Tyler et al. 2006; shape contours: Gilaie-Dotan et al. 2002; Kourtzi et al. 2002, 2003; Mendola et al. 1999; or gradients: Brouwer et al. 2005; Welchman et al. 2005).

Figure 3, B–D shows the effect of increasing disparity-defined shape coherence on the fMRI response across cortical areas (V1, V2, ventral, dorsal areas). Responses in higher visual areas hMT+/V5 and LOC (and their subregions) increased as a function of shape coherence, confirmed by significant regressions of fMRI response on coherence in these areas (hMT+/V5: P = 0.001; LOC: P < 0.01). In contrast, responses in the early visual areas did not change significantly as a function of coherence (Table 1), with the exception of area V3 (P < 0.05) and a marginal effect in V3A (P = 0.059). An additional analysis considered fMRI responses across areas based only on trials on which observers' shape-discrimination judgments were correct. This analysis showed a similar pattern of results (Fig. S2, Table S3), suggesting that the lack of significant activations in the remaining early visual areas (V1, V2, VP/V3, V4) did not arise from weak responses for incorrect trials. These results suggest a role for V3 and higher dorsal (hMT+/V5) and ventral (LOC) visual areas in processing disparity signals that underlie the integration and perception of shape from disparity. In contrast, early visual areas, known to be involved in processing disparity, responded similarly at different levels of shape coherence, suggesting local processing of disparity signals that is unaffected by the disparity shuffling manipulation.

TABLE 1. Statistics (regression analysis) relating fMRI responses and shape coherence

ROI        R       F(1,78)   P       Slope     Intercept
V1         0.131    1.352    0.248    0.0004   0.650
V2         0.096    0.728    0.396    0.0003   0.661
V3         0.268    6.017    0.016    0.0012   0.521
V3A        0.212    3.669    0.059    0.0006   0.594
V3B/KO     0.045    0.127    0.722   −0.0001   0.639
VP/V3      0.181    2.652    0.107    0.0005   0.644
V4         0.161    2.079    0.153    0.0004   0.627
LOC        0.318    8.748    0.004    0.0009   0.579
pFs        0.232    4.438    0.038    0.0009   0.545
LO         0.327    9.355    0.003    0.0011   0.567
hMT+/V5    0.366   12.034    0.001    0.0018   0.362
MST        0.322    8.091    0.006    0.0018   0.326
MT         0.390   12.528    0.001    0.0023   0.314
VIPS       0.653   52.161    0.000   −0.0025   0.729
POIPS      0.615   42.550    0.000   −0.0031   0.728
DIPSA      0.596   34.191    0.000   −0.0032   0.734
DIPSM      0.492   22.323    0.000   −0.0023   0.704
FEF        0.566   21.645    0.000   −0.0032   0.754

Note responses for LOC and hMT+/V5 reflect the aggregate of their separately listed subregions (pFs, LO and MST, MT, respectively).
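The quantities tabulated above (R, F, slope, intercept) are those of a simple linear regression, which can be sketched as follows. This is a generic ordinary least-squares implementation and not the authors' analysis code. The F statistic with (1, 78) degrees of freedom is consistent with 80 data points (e.g., 10 observers × 8 coherence levels), although that pairing is an inference rather than something the text states.

```python
import math

def linear_regression(x, y):
    """Ordinary least-squares fit of y on x; returns slope, intercept, r, F."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r, f_from_r(r, n)

def f_from_r(r, n):
    """F statistic with (1, n - 2) degrees of freedom for a correlation r."""
    return r * r * (n - 2) / (1.0 - r * r)
```

As a check on the relationship between the columns, `f_from_r(0.268, 80)` gives approximately 6.04, in line with the tabulated F of 6.017 for V3 once rounding of R is taken into account.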

To ensure that these findings could not be attributed to differences in attentional demands or eye movements between conditions we conducted control experiments and additional analyses. First, it is unlikely that observers chose to selectively attend to particular conditions because trials were presented in quick succession and were randomly interleaved. Second, an attentional load explanation would predict higher fMRI responses when the discrimination task was hardest, requiring prolonged, focused attention that results in higher fMRI responses. This contrasts with the effects we observed: fMRI responses were highest for the most discriminable stimuli (Fig. 3) that evoked the fastest responses from observers [Fig. 7A; shortest response times: F(7,53) = 15.61, P < 0.001]. Third, it is possible that higher coherence stimuli proved more “interesting” to observers producing increased fMRI responses across the visual areas, consistent with previous studies showing attentional modulation of responses as early as in V1 resulting from target detection (e.g., Kastner et al. 1999; Ress et al. 2000). However, such an explanation is unlikely because we observed differential effects across the visual areas with increasing shape coherence (i.e., increased responses in V3 and higher visual areas but not in V1), rather than nonspecific increases associated with increased attentional allocation. Finally, we conducted a control experiment in which observers (n = 5) performed a demanding fixation task (detection of luminance changes) rather than the shape task to ensure a similar attentional state across experimental conditions. This experiment (Fig. S3) confirmed increased fMRI responses with stimulus coherence in higher visual areas.

To examine possible differences in eye movements across conditions we measured eye movements in five observers while they performed the experiment. Measurements of eye position indicated that eye movements were very small and not systematically different in their number and amplitude across experimental conditions (Fig. S4). The resolution of our MR-compatible eye tracker did not allow us to estimate vergence eye movements. However, nonius fixation targets were presented to promote continued eye alignment. Further, it is unlikely that conditions of high shape coherence would encourage increased vergence fluctuations that could account for increased fMRI responses in these conditions. Finally, previous reports (e.g., Tsao et al. 2003; Welchman et al. 2005) using disparities of magnitude similar to those used in our study showed activity for eye vergence changes in cortical regions (anterior intraparietal sulcus, superior temporal gyrus) beyond the visual areas for which increased responses with shape coherence were observed in our experiment.

Relationship between psychophysical and fMRI responses

Our data show that increasing disparity coherence results in enhanced performance in shape discrimination (psychophysical results) and higher levels of activity in some of the visual areas (fMRI results). Based on the relationships between stimulus coherence and our two dependent measures (psychophysics and fMRI) using between-subjects data, we would expect a relationship between psychophysical and fMRI responses. To test this expectation, we performed regression analyses across visual areas based on the data from individual subjects (Fig. 4). This analysis provided evidence for a relationship between perceived shape from disparity and fMRI activation in V3 and higher (hMT+/V5, LOC) visual areas (Table 2). Specifically, there was evidence of a significant positive relationship between psychophysical and fMRI responses in hMT+/V5 (P < 0.01) and LOC (P = 0.01) but not in the early visual areas with the exception of area V3 (P < 0.05).

FIG. 4. Relationship between psychophysical and fMRI data. Linear regression of fMRI response (normalized % signal change from fixation baseline across subjects) on accuracy (% correct) for each ROI. Statistics are reported in Table 2.


TABLE 2. Regression statistics on the correlation of fMRI responses with psychophysics

ROI        R       F(1,78)   P       Slope     Intercept
V1         0.095    0.642    0.426    0.0005   0.631
V2         0.122    1.052    0.309    0.0005   0.631
V3         0.238    4.205    0.044    0.0018   0.437
V3A        0.137    1.335    0.252    0.0006   0.573
V3B/KO     0.163    1.700    0.197   −0.0006   0.685
V4         0.189    2.607    0.111    0.0007   0.587
VP/V3      0.165    1.962    0.166    0.0007   0.609
LOC        0.300    6.924    0.010    0.0015   0.508
pFs        0.231    3.948    0.050    0.0015   0.470
LO         0.260    5.089    0.027    0.0014   0.504
hMT+/V5    0.322    8.112    0.006    0.0027   0.237
MST        0.289    6.360    0.014    0.0026   0.167
MT         0.387   12.365    0.001    0.0036   0.094
VIPS       0.607   40.885    0.000   −0.0036   0.834
POIPS      0.591   37.579    0.000   −0.0044   0.878
DIPSA      0.505   23.988    0.000   −0.0038   0.768
DIPSM      0.483   21.253    0.000   −0.0035   0.810
FEF        0.474   20.341    0.000   −0.0035   0.639

An iterative least-squares method for calculating regression coefficients was used because the data were skewed toward higher performance scores in the 2AFC task.

The preceding analysis relies on using a linear model to relate psychophysics and fMRI responses. However, as is evident from the psychometric function (Fig. 3A), there is a nonlinear relationship between stimulus coherence and an observer's ability to discriminate shape defined by disparity. We thus asked whether it was possible to describe the fMRI response using the model obtained from the behavioral data. To this end we scaled the logistic model obtained from the psychophysical data to fit the fMRI data

(2)

where β is the intercept, and α is the slope of the psychometric function. We did this by adjusting the baseline and scaling parameters using a nonlinear least-squares method while constraining the slope (α) and intercept (β) parameters of the model to be those obtained from the psychophysical data. This approach is consistent with previous studies (Buracas et al. 2005; Zenger-Landolt and Heeger 2003) that evaluated the relationship between fMRI responses and contrast or speed discrimination performance in V1 and hMT+/V5. Figure 5 shows the fMRI data across the visual areas fit with the logistic model obtained from the psychophysical data. A Pearson correlation coefficient was used to estimate the goodness of fit of the psychophysical model to the fMRI data. There was good agreement between the psychophysical model and the fMRI response in hMT+/V5 (P < 0.05) and LOC (P = 0.05), but not in the early visual areas (Table 3) with the exception of V3 (P < 0.05). This evidence is confirmatory in suggesting a relationship between processing in V3 and higher visual areas (hMT+/V5, LOC) and the observers' ability to discriminate coherent shapes defined by horizontal disparity rather than processing of local disparity signals in these areas.
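The scaling step can be sketched under stated assumptions. With the slope (α) and intercept (β) held fixed, a model of the form baseline + scale × logistic(αc + β) is linear in its two free parameters, so the sketch below solves for them in closed form by ordinary least squares rather than with the nonlinear least-squares routine reported above. The logistic form, the function names, and the example values in the test are assumptions made for illustration.

```python
import math

def logistic(c, alpha, beta):
    """Logistic shape with slope and intercept fixed from the behavioral fit."""
    return 1.0 / (1.0 + math.exp(-(alpha * c + beta)))

def fit_scale_baseline(coherence, response, alpha, beta):
    """Least-squares scale and baseline for response ~ baseline + scale * logistic."""
    basis = [logistic(c, alpha, beta) for c in coherence]
    n = len(basis)
    mean_b = sum(basis) / n
    mean_r = sum(response) / n
    sbb = sum((b - mean_b) ** 2 for b in basis)
    sbr = sum((b - mean_b) * (r - mean_r) for b, r in zip(basis, response))
    scale = sbr / sbb
    baseline = mean_r - scale * mean_b
    return scale, baseline

def pearson_r(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    saa = sum((x - mean_a) ** 2 for x in a)
    sbb = sum((y - mean_b) ** 2 for y in b)
    sab = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    return sab / math.sqrt(saa * sbb)
```

Goodness of fit in this sketch is the Pearson correlation between the measured responses and the values predicted by the scaled model, so a negative scale simply mirrors the logistic shape while leaving the quality of the fit unchanged.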

FIG. 5. Using the psychophysical model to describe the fMRI data. Fits (dotted lines) of the average fMRI response across subjects based on a scaled logistic model obtained from the psychophysical data. Different plots and symbols correspond to different ROIs. Error bars indicate SE. Statistics on the relationship between the model and the data are reported in Table 3.


TABLE 3. Pearson correlation statistics on the prediction of fMRI responses from the model fit to the psychophysical data

ROI        R       P       Scale    Baseline
V1         0.394   0.335    0.043   0.631
V2         0.390   0.340    0.024   0.659
V3         0.715   0.046    0.121   0.475
V3A        0.523   0.183    0.043   0.589
V3B/KO     0.603   0.113   −0.033   0.656
VP/V3      0.617   0.103    0.030   0.657
V4         0.538   0.169    0.022   0.630
LOC        0.694   0.056    0.078   0.570
pFs        0.660   0.075    0.087   0.512
LO         0.764   0.028    0.084   0.569
hMT+/V5    0.777   0.023    0.133   0.357
MST        0.885   0.003    0.134   0.313
MT         0.943   0.000    0.166   0.324
VIPS       0.859   0.006   −0.206   0.747
POIPS      0.904   0.002   −0.267   0.754
DIPSM      0.813   0.014   −0.183   0.695
DIPSA      0.847   0.008   −0.235   0.731
FEF        0.864   0.006   −0.253   0.773

Relationship between behavioral and fMRI responses beyond the visual cortex

Recent imaging studies have revealed a network of parietal and frontal regions beyond the occipitotemporal cortex that respond to 3D shape information (Orban et al. 1999, 2006a; Peuskens et al. 2004; Sakata et al. 2005; Sereno et al. 2002; Shikata et al. 2001; Taira et al. 2001; Tsao et al. 2003; Vanduffel et al. 2002). We identified these areas in individual subjects by comparing activations to 3D shapes defined by disparity (100% coherence) with random-dot displays (0% coherence) based on an independent data set collected in a localizer scan (Fig. 2; see methods). Analysis of the fMRI responses for the shape coherence experiment provided evidence for a relationship between shape coherence and fMRI activity in these areas (VIPS, POIPS, DIPSM, DIPSA, FEF). However, the relationship was opposite in sign to that observed in areas V3, hMT+/V5, and LOC. Specifically, activity in parietal areas and FEF decreased as the coherence of the stimulus increased (Fig. 6A, Table 1). Regression analyses confirmed significant negative relationships between the observers' ability to discriminate shape and fMRI responses in these areas (Fig. 6B; Table 2).

FIG. 6. fMRI responses in 3D shape-related areas beyond the visual cortex. A: average fMRI responses across observers (normalized % signal change from fixation baseline) for each stimulus coherence level and each region of interest in the parietal cortex and FEF. Error bars indicate SE. B: linear correlation between fMRI responses and accuracy (% correct) for each ROI. Regression statistics are reported in Table 1.


Why might activity in parietal areas and FEF differ from responses in areas earlier in the processing hierarchy known to be involved in disparity processing? One possibility is that the representation of object shape within these higher areas is sparse, relying on the activity of a small number of highly selective neurons. At high levels of stimulus coherence, the average fMRI response across voxels might be low because the presented shape would be nonoptimal for the majority of cells across the population, exciting only a small number of selective cells. However, degrading stimulus coherence and thereby increasing the ambiguity of the presented shape could result in an increase in the population response because many more neurons could entertain the possibility that the presented stimulus reflects the class of shape to which they are tuned. This increase in nonselective firing activity across the population would thus mask the shape selectivity of a small number of neurons within the population (Scannell and Young 1999) whose activity may be behaviorally important. Because the fMRI response reflects population activity, single-unit electrophysiology combined with behavioral assessments of shape would be required to evaluate this possibility.

A further possibility is that activity in these areas relates to the demanding nature of the task performed. An analysis of the time taken for observers to make their perceptual decision indicated that observers responded faster when shape coherence was higher and the shape's symmetry more readily appreciated (Fig. 7A). We observed a significant positive relationship between response time and activity in parietal areas and FEF (Fig. 7B; Table 4), suggesting that activity in these areas reflects the difficulty of the subject's task. This finding appears intriguing in that these regions showed significantly stronger activation to coherent disparity-defined shapes than random disparity fields when the observers performed a fixation-dimming task during a localizer scan. However, activity in these areas during the visual discrimination of shapes at different coherence levels appears more closely matched to the perceptual decision made by the observers and the task demands. A further possibility is that this task-dependent effect results from increases in peripheral clusters of dots carrying disparities that become more salient with shuffling. This could enhance activation in these cortical regions that are known to be engaged in the processing of salient targets (Claeys et al. 2003). In contrast to these task-related effects in parietal areas and the FEF, we observed a significant negative relationship between fMRI responses and observers' response times in V3 and hMT+/V5 and no relationship in the LOC subregions (Fig. S5, Table 4). These results provide additional evidence that the activity in these areas relates to disparity processing for coherent shape perception, rather than nonspecific attentional modulation of activity arising from task difficulty.

FIG. 7. Relationship between response time and fMRI responses. A: average time taken by observers to press the response button according to their perceptual decision for each stimulus coherence level. Error bars show SE across observers. B: linear correlation between fMRI response (normalized % signal change from fixation baseline across subjects) and observers' response times (seconds) in 3D shape-related areas beyond the visual cortex (VIPS, POIPS, DIPSM, DIPSA, FEF). Statistics are reported in Table 4.


TABLE 4. Regression statistics on the correlation of fMRI responses with response time

ROI       R       F(1,78)  P       Slope    Intercept
V1        0.1254   1.2452  0.2679  −0.0669  0.7281
V2        0.0744   0.4347  0.5117  −0.0341  0.7033
V3        0.2229   4.0796  0.0468  −0.1768  0.7342
V3A       0.0988   0.7683  0.3834  −0.0467  0.6624
V3B/KO    0.0021   0.1294  0.7203   0.0195  0.6164
VP/V3     0.2245   4.1395  0.0453  −0.0942  0.7273
V4        0.2302   4.3634  0.0400  −0.1050  0.7591
LOC       0.1676   2.2541  0.1373  −0.1096  0.6851
pFs       0.1587   2.0157  0.1597  −0.0860  0.7000
LO        0.1586   2.0138  0.1599  −0.0918  0.7003
hMT+/V5   0.2967   7.5307  0.0075  −0.2258  0.6151
MST       0.2262   4.2075  0.0436  −0.2634  0.6474
MT        0.2683   6.0503  0.0161  −0.2540  0.6683
VIPS      0.5296  27.2820  0.0000   0.3475  0.3004
POIPS     0.4537  18.1480  0.0001   0.3905  0.2323
DIPSA     0.5713  30.0360  0.0000   0.5331  0.0970
DIPSM     0.3989  13.2490  0.0005   0.3274  0.3000
FEF       0.5653  21.6050  0.0000   0.5735  0.0581
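
As a rough illustration of how statistics of the kind listed in Table 4 are conventionally obtained from a simple linear fit, the following Python sketch computes R, F(1, n − 2), slope, and intercept for paired response times and fMRI responses. It is not the analysis code used in the study: the function name `regression_summary` and the toy numbers are hypothetical, and R is returned as a magnitude on the assumption that the table reports unsigned values.

```python
import math


def regression_summary(response_time, fmri_response):
    """Least-squares fit of fMRI response on response time (illustrative only)."""
    x, y = list(response_time), list(fmri_response)
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products about the means.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    # With a single predictor, F has (1, n - 2) degrees of freedom.
    f_stat = (n - 2) * r ** 2 / (1.0 - r ** 2)
    return {
        "R": abs(r),  # assumption: unsigned, the sign is carried by the slope
        "F": f_stat,
        "df": (1, n - 2),
        "slope": slope,
        "intercept": intercept,
    }


if __name__ == "__main__":
    # Hypothetical toy values, not data from the study.
    times = [0.9, 1.0, 1.1, 1.2, 1.3, 1.4]
    signal = [0.42, 0.47, 0.45, 0.55, 0.58, 0.61]
    print(regression_summary(times, signal))
```

With n = 80 paired observations the degrees of freedom would be (1, 78), as in the table header; the P value would then follow from the upper tail of the corresponding F distribution.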

Estimating the presented stimulus from fMRI activity

Further analysis of the psychophysical data revealed that observers were better at discriminating disparity-defined shapes with a vertical rather than horizontal symmetry axis (Fig. 8A), consistent with previous psychophysical studies (Wagemans 1997). We examined whether these differences in the observers' performance were reflected by differences in fMRI responses. In particular, we asked whether it is possible to determine whether a disparity-defined shape with a horizontal or vertical symmetry axis was presented to observers based only on the fMRI response. To answer this question we calculated a receiver operating characteristic (ROC) curve (cf. Britten et al. 1992) based on distributions of the fMRI responses evoked by disparity-defined shapes with a vertical, compared with a horizontal, axis of symmetry. For each observer we computed the average fMRI response (% signal change) for each trial from voxels (n = 50) in each ROI that showed the most significant activation to the stimulus compared with the fixation baseline. We compared the distributions of fMRI responses across trials evoked by vertically and horizontally symmetrical shapes in these stimulus-selective voxels (the same 50 voxels were used to calculate the response to horizontal and vertical stimuli). To construct the ROC curve, we calculated the probability of obtaining an fMRI response at or above some criterion level of activity for each distribution and then plotted these probabilities against each other for a range of criterion values. The area under the resulting curve provides an estimate of the relative selectivity of the region for a 3D shape with a horizontal or vertical symmetry axis, where 0.5 represents chance performance (total overlap of the distributions) and a value of 1.0, perfect performance (total separation of the distributions). To assess the statistical significance of the calculated ROC values (95% confidence intervals), we used a permutation test based on the distribution of ROC values obtained when the data were repeatedly shuffled (4,000 iterations) to disrupt the correlation between the presented stimulus and the fMRI response.
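
The ROC construction and permutation test described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the analysis code used in the study: the function names (`roc_area`, `permutation_null`, `central_interval`, `two_sided_p`), the percentile-based interval, the two-sided P value, and the synthetic trial responses are all hypothetical choices made for the example.

```python
import bisect
import random


def roc_area(vertical, horizontal):
    """Area under the ROC curve built by sweeping a response criterion."""
    v_sorted, h_sorted = sorted(vertical), sorted(horizontal)
    # Criteria from high to low trace the curve from (0, 0) to (1, 1).
    criteria = sorted(set(v_sorted + h_sorted), reverse=True)
    points = [(0.0, 0.0)]
    for c in criteria:
        # Probability of a response at or above the criterion, per distribution.
        p_h = (len(h_sorted) - bisect.bisect_left(h_sorted, c)) / len(h_sorted)
        p_v = (len(v_sorted) - bisect.bisect_left(v_sorted, c)) / len(v_sorted)
        points.append((p_h, p_v))
    # Trapezoidal integration of p_v (ordinate) against p_h (abscissa).
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def permutation_null(vertical, horizontal, n_iter=4000, seed=0):
    """Sorted ROC areas after shuffling stimulus labels across trials."""
    rng = random.Random(seed)
    pooled = list(vertical) + list(horizontal)
    n_v = len(vertical)
    null_areas = []
    for _ in range(n_iter):
        rng.shuffle(pooled)  # breaks the stimulus-response pairing
        null_areas.append(roc_area(pooled[:n_v], pooled[n_v:]))
    return sorted(null_areas)


def central_interval(sorted_values, coverage=0.95):
    """Central range of a sorted sample (assumed percentile method)."""
    tail = (1.0 - coverage) / 2.0
    last = len(sorted_values) - 1
    return sorted_values[int(tail * last)], sorted_values[int((1.0 - tail) * last)]


def two_sided_p(observed, null_areas):
    """Fraction of shuffled areas at least as far from 0.5 as the observed one."""
    extreme = sum(abs(a - 0.5) >= abs(observed - 0.5) for a in null_areas)
    return extreme / len(null_areas)


if __name__ == "__main__":
    # Synthetic single-trial responses (% signal change); not data from the study.
    rng = random.Random(1)
    vertical_trials = [rng.gauss(0.6, 1.0) for _ in range(60)]
    horizontal_trials = [rng.gauss(0.3, 1.0) for _ in range(60)]
    observed = roc_area(vertical_trials, horizontal_trials)
    null = permutation_null(vertical_trials, horizontal_trials)
    print(observed, central_interval(null), two_sided_p(observed, null))
```

In this sketch an area above 0.5 corresponds to stronger responses for vertically symmetric shapes, and an observed area falling outside the central 95% range of the shuffled values would be treated as differing from chance.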

FIG. 8. Receiver operating characteristic (ROC) analysis. A: performance (% correct) for discrimination of disparity-defined shapes with a horizontal or vertical axis of symmetry. Stimulus coherence at the 75% threshold was 30.25 ± 1.43% for vertically symmetric 3D shapes and 36.75 ± 2.20% for horizontally symmetric 3D shapes. Error bars indicate SE across observers. B: response times (seconds) for shape judgments separated by the axis of symmetry of the stimulus. Error bars indicate SE across observers. C: ROC values for V1, V2, ventral (VP/V3, V4, LOC), and dorsal (V3, V3A, V3B/KO, hMT+/V5) visual areas that provide an estimate of the relative selectivity of each region for 3D shapes of vertical vs. horizontal symmetry. An ROC value of 0.5 represents chance performance; ROC values >0.5 indicate stronger fMRI responses for vertically symmetric disparity-defined shapes; values <0.5 indicate stronger fMRI responses for horizontally symmetric stimuli. Response in each cortical area corresponds to the mean response of the 50 voxels that showed the most significant response to the stimulus, as revealed by the contrast between fMRI activity at all levels of stimulus coherence compared with the fixation condition (P < 0.001). Distributions used to calculate the ROC value consisted of fMRI responses from individual trials from all subjects. When aggregating data across subjects we first normalized the magnitude of the fMRI response to remove variability associated with between-subjects effects and multiple scanning sessions. Error bars indicate 95% confidence intervals for the ROC value based on bootstrap estimates provided by a permutation test (4,000 iterations).


Based on the observers' psychophysical responses, we were interested in two comparisons where a difference between the fMRI response distributions for disparity-defined shapes of horizontal or vertical symmetry might be expected. Considering observers' accuracy (% correct) above chance suggested a difference between responses to disparity-defined shapes of vertical and horizontal symmetries at the 40% stimulus coherence level (Fig. 8A). Considering observers' performance based on response time suggested a difference between the two symmetries at the 70% stimulus coherence level (Fig. 8B). Calculated ROC values were consistent with performance differences between disparity-defined shapes of horizontal and vertical symmetries based on percentage correct, but not on the basis of response times (Fig. 8C). Specifically, we observed ROC values significantly different from those expected by chance in higher dorsal (V3B/KO: P = 0.013; hMT+/V5: P = 0.019) and ventral (VP/V3: P = 0.023; LOC: P = 0.012) visual areas at the 40% stimulus coherence level. It is unlikely that this result reflects a nonstimulus-specific response effect (e.g., attention) because ROC values were not significantly different from chance in the 70% coherence condition where response times, but not percentage correct, differed between horizontally and vertically symmetric shapes. Finally, it is unlikely that the difference in fMRI response to horizontally and vertically symmetric shapes was the result of low-level stimulus features: an analysis of dot density in our stimuli showed no systematic difference between the distribution of dots across stimuli with different axes of symmetry or variation in dot density across coherence levels (Fig. S6).

A possible limitation of this analysis is that the magnitude of the observed ROC values is considerably lower than that typically seen with electrophysiology (e.g., Britten et al. 1992; Dodd et al. 2001; Krug et al. 2004; Uka and DeAngelis 2004). This potentially reflects the spatial resolution of fMRI, where voxel responses reflect activity in large neural populations that may respond heterogeneously to different stimulus types. Despite this limitation, the conservative statistical validation of the data suggests reliably higher fMRI responses for disparity-defined shapes of vertical than of horizontal symmetry at the 40% coherence level, consistent with behavioral performance and previous imaging studies (Sasaki et al. 2005). Further, a recent physiological study (Liu and Newsome 2006) calculated ROC values based on measurements of local field potential (LFP) signals, thought to contribute to the fMRI response (Logothetis et al. 2001). These ROC values were of similar magnitude to those observed in our study and reliably reflected trial-to-trial associations between behavioral judgments and neural activity. Our ROC analysis provides additional evidence for a link between coherent shape perception from disparity and activity in higher visual areas (hMT+/V5, LOC): activity in these areas predicts differences in shape symmetry consistent with performance differences. Interestingly, the ROC analysis showed significant values in V3B/KO and VP/V3 (marginal effect in V4: P = 0.061), suggesting that differences in stimulus symmetry can be predicted by activity in these areas. This is consistent with previous studies implicating these areas in symmetry processing (Sasaki et al. 2005).

DISCUSSION

Our findings provide novel evidence that cortical activity in both ventral and dorsal visual areas relates to the visual discrimination of disparity-defined shapes. In particular, linear correlations and modeling fMRI responses using psychophysical data showed that fMRI activity in area V3 and higher visual areas (hMT+/V5, LOC) related to performance in shape judgments. These correlational analyses were corroborated by an ROC analysis that showed fMRI activity differences that were consistent with differences in the observers' ability to identify vertically, as opposed to horizontally, symmetric 3D shapes.

These findings advance our understanding of the neural correlates of 3D shape perception in the human brain by extending beyond the localization of regions involved in 3D shape processing to investigate the relationship between activity in these areas and the perception of coherent disparity-defined shape. In particular, previous imaging studies have identified multiple areas in the visual, temporal, and parietal cortex that show stronger activations for stimuli defined by binocular or monocular depth cues than those of two-dimensional versions of these stimuli (for reviews see Neri 2005; Orban et al. 2006a,b; Parker 2004). Further, previous studies have used parametric manipulations to investigate the neural correlates of surface depth judgments (Backus et al. 2001; Gilaie-Dotan et al. 2002). Our study is the first to use a stimulus coherence manipulation to investigate whether disparity-defined shape judgments correlate with cortical activity in the human brain. Similar stimulus manipulations were used in psychophysical (e.g., Harris and Parker 1992) and physiological studies (e.g., Uka and DeAngelis 2003) investigating the estimation of surface depth position (i.e., near vs. far) as well as numerous studies on motion coherence (e.g., Britten et al. 1992; Rees et al. 2000). Parametrically varying the coherence of 3D shapes allowed us to investigate the effects of stimulus degradation on both shape judgments and cortical activity as measured by fMRI.

Disparity processing and 3D shape perception

Estimating an object's shape based on disparity signals requires a cascade of processing that starts with the local registration of corresponding visual features and proceeds through global constraints on feature matching, the extraction of information about changes in local disparity signals (gradients), and more regional registration of the rate of change of disparity gradient (curvature). Early neurophysiological recordings revealed selectivity for binocular disparity at multiple levels of the visual hierarchy: V1 (Barlow et al. 1967; Poggio and Fischer 1977), V2 (Thomas et al. 2002; von der Heydt et al. 2000), V3 (Adams and Zeki 2001; Felleman and Van Essen 1987; Poggio et al. 1988), V4 (Hegde and Van Essen 2005; Hinkle and Connor 2002, 2005; Tanabe et al. 2004; Watanabe et al. 2002), MT/V5 (DeAngelis and Newsome 1999; Krug et al. 2004; Palanca and DeAngelis 2003; Uka and DeAngelis 2004, 2006), MST (Eifuku and Wurtz 1999; Takemura et al. 2002), IT (Janssen et al. 2000, 2001, 2003; Liu et al. 2004; Tanaka et al. 2001; Uka et al. 2000, 2005), and in regions of the parietal (caudal intraparietal sulcus) cortex (Taira et al. 2000; Tsutsui et al. 2002). Complementary evidence from human brain imaging implicated several areas across the visual, object-related, motion-related, and parietal cortex in the processing of disparity information (for reviews see Neri 2005; Orban et al. 2006a,b). Further, several recent studies suggest that areas involved in disparity processing primarily in the temporal and parietal cortex are also engaged in the processing of monocular cues to depth (e.g., texture, motion, shading) (James et al. 2002; Kourtzi et al. 2003; Liu et al. 2004; Murray et al. 2003; Orban et al. 2006a; Peuskens et al. 2004; Sakata et al. 2005; Sereno et al. 2002; Shikata et al. 2001; Taira et al. 2001; Tsutsui et al. 2002; Vanduffel et al. 2002).

How is activity at these multiple stages of visual processing related to the perception of shape derived from disparity signals? Neurophysiological evidence suggests that primary visual cortex does not respond to global constraints on correspondence (Cumming and Parker 1997, 2000), whereas this computation appears partially solved in V4 (Tanabe et al. 2004) and predominantly so in IT (Janssen et al. 2003). Selectivity for disparity-defined gradients is evident in MT/V5 (Nguyenkim and DeAngelis 2003), IT (Liu et al. 2004), and CIP (Tsutsui et al. 2002) and responses to shapes defined by curvature are found in IT (Janssen et al. 2000, 2001). Reviews of these studies point to the anatomy-based dichotomy of the visual pathways suggesting that the ventral visual pathway computes 3D shape information for fine discriminations (e.g., object curvature) important for the recognition of objects and their parts whereas the dorsal visual pathway computes 3D scene layout important for spatial positioning and interactions with objects (for reviews see Neri 2005; Orban et al. 2006b; Parker 2004; Tyler 1990).

Our findings provide evidence for a link between disparity processing and coherent shape perception not only in ventral areas (LOC) implicated in shape analysis but also in dorsal areas (V3, hMT+/V5) thought to be involved in the analysis of spatial structure. This multisite activity is consistent with the shape task we used relying on multiple stages of visual analysis that include integrating local disparity signals to extract extended surface patches, segmenting the bounding contours of the object, and registering shape curvature. The shuffling manipulation we used disrupted the continuity of disparity across the scene and likely interfered with these processes, potentially accounting for the reduced fMRI activity we observed as shape coherence was reduced. The advantage of this manipulation is that it disrupted both contours and curvature to ensure that observers could not perform the shape discrimination based on the contour of a disparity-defined region constant across all stimulus coherence levels.

Recent neurophysiological evidence points to a distinction between the processing of coarse and fine disparity information in the cortical hierarchy. Specifically, MT neurons have a sensitivity to coarse disparity information (surfaces located near vs. far) that is sufficient to account for behavioral performance (Uka and DeAngelis 2003, 2004) and microstimulation of MT during near–far disparity discrimination biases behavioral choice (DeAngelis et al. 1998). In contrast, microstimulation of neurons in MT has little effect on behavioral performance when monkeys are engaged in fine judgments of relative depth position (Uka and DeAngelis 2006). The symmetry task we used relies on information about global structure, rather than fine depth position. In line with neurophysiological evidence we find evidence for a relationship between behavioral measures and cortical activity in hMT+/V5. Our current data do not allow us to determine whether the same relationship would hold for fine depth discriminations.

Responses in early visual areas were not found to vary significantly with stimulus coherence. This is compatible with a local analysis of disparity signals that is unaffected by the disruption to surface structure caused by the shuffling manipulation or with neural responses that are not related to the perceptual task (cf. Cumming and Parker 1997, 2000). Is it possible that our stimuli were not optimal for neurons in the early visual areas leading to a null effect? Specifically, the population sensitivity to disparity is known to be slightly biased toward crossed disparity in monkey MT/V5 (DeAngelis and Uka 2003) and V4 (Hinkle and Connor 2005), whereas it is less biased and closer to a peak sensitivity at zero binocular disparity in V1 (Prince et al. 2002). If the presented stimulus fell outside the signaling capacities of neurons in early visual cortices, we would not expect to see an effect of the coherence manipulation even though this might be observed in these areas for a different set of stimuli. We view this possibility as unlikely. First, the population sensitivity to disparity in monkey V1 is sufficiently broad that disparities in the range 0 to −0.21° should evoke a robust response close to the central tendency of the population response (Prince et al. 2002). Second, robust human fMRI responses selective to binocular disparity have been observed in areas V1 and V2 for stimuli containing disparities exceeding those used in our study (Backus et al. 2001). Finally, it has been reported that the population sensitivity to disparity declines slightly with eccentricity (Prince et al. 2002). Were it the case that the near disparities in our stimuli did not evoke optimal responses in V1, we might expect that the fMRI response would increase as stimulus coherence decreased as the result of a larger proportion of zero-disparity dots in the center of the visual field. In contrast, fMRI responses in V1 did not vary with stimulus coherence.

Interestingly, area V3 rather than earlier visual areas (V1, V2) appeared sensitive to the coherence manipulation, suggesting sensitivity to more global properties of the stimulus for the integration of shape contours or extended surface regions. This suggestion is consistent with the disparity columnar organization of V3 (Adams and Zeki 2001) similar to that observed in MT (DeAngelis and Newsome 1999), its input from both the magno- and the parvo-cellular pathways (Callaway 1998), and the strong anatomical connectivity between V3 and MT (Van Essen et al. 1986). It is possible that the activity we observed in dorsal areas (V3, hMT+/V5) reflects the role of these areas in pooling disparity signals and extracting 3D shape contours and regions important for the coarse segmentation and detection of 3D targets in complex scenes. Further, finer analysis of 3D shape curvature may proceed in temporal areas to support identification of complex 3D objects and their parts.

In contrast, previous studies implicated areas V3A (Backus et al. 2001; Gulyas and Roland 1994; Mendola et al. 1999; Tsao et al. 2003) and V3B/KO (Brouwer et al. 2005; Tyler et al. 2006; Van Oostende et al. 1997; Zeki et al. 2003) in the analysis of disparity-defined surfaces and boundaries. Further, caudal intraparietal (CIP) regions were previously implicated in processing 3D object orientation and surface slant (James et al. 2002; Naganuma et al. 2005; Sakata et al. 2005; Taira et al. 2000) rather than being involved in form discrimination (Shikata et al. 2001). The anatomical connectivity (Adams and Zeki 2001; Nakamura et al. 2001) established between areas V3A, V3B/KO, CIP, and anterior intraparietal areas involved in object grasping (Orban et al. 2006a; Sakata et al. 2005) suggests a role of disparity-selective mechanisms in object-related actions rather than visual recognition tasks. Consistent with these studies, activations in areas V3A and V3B/KO observed in our study did not correlate significantly with the observers' performance in discriminating disparity-defined shapes, while activations in parietal areas were consistent with task demands. Taken together, these findings suggest a coarse to fine analysis of spatial surface properties from dorsal visual areas V3A, V3B/KO to the parietal cortex important for computing distances in complex layouts and preparing for actions toward objects.

In conclusion, our results suggest that the visual discrimination of disparity-defined shapes recruits both ventral and dorsal visual areas that mediate a continuum of shape extraction and recognition processes from coarse contour segmentation to fine curvature discrimination. Although our study was not designed to discern these processes (i.e., 3D shape contour extraction vs. 3D curvature analysis) and their contribution to the visual discrimination of 3D shapes, our findings are consistent with proposed interactions (Orban et al. 2006b; Peuskens et al. 2004) rather than a strict anatomical dichotomy of function between the visual pathways for 3D shape processing. Our study did not set out to investigate functional interactions between the pathways. Nevertheless, our findings are in agreement with accumulating evidence for form processing in higher dorsal visual areas (i.e., hMT+/V5) (e.g., Kourtzi et al. 2002; Krekelberg et al. 2003; Malonek et al. 1994; Peuskens et al. 2004). Further imaging and neurophysiological studies are necessary to elucidate the cortical dynamics including feedforward and recurrent interactions between areas in this disparity-selective neural circuit that determine its functionality and critical importance for successful interactions in the complex three-dimensional environment we inhabit.

GRANTS

This work was supported by the Max Planck Society; the Graduate School of Neural and Behavioral Sciences, Tübingen; and a Biotechnology and Biological Sciences Research Council grant (BB/C520620) to A. E. Welchman.

FOOTNOTES

  • The costs of publication of this article were defrayed in part by the payment of page charges. The article must therefore be hereby marked “advertisement” in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

We thank H. Bridge, G. Brouwer, G. DeAngelis, G. Orban, A. Parker, R. van Ee, and J. Wagemans for helpful comments and suggestions on the manuscript and A. Vatakis and C. Altmann for help with the stimulus design.

AUTHOR NOTES

  • Address for reprint requests and other correspondence: A. Welchman, School of Psychology, University of Birmingham, Edgbaston, Birmingham, B15 2TT, UK
