Primate Extrastriate Cortical Area MST: A Gateway between Sensation and Cognition.

Primate visual cortex consists of dozens of distinct brain areas, each providing a highly specialized component to the sophisticated task of encoding the incoming sensory information and creating a representation of our visual environment that underlies our perception and action. One such area is the medial superior temporal cortex (MST), a motion-sensitive, direction-selective part of the primate visual cortex. It receives most of its input from the middle temporal (MT) area, but MST cells have larger receptive fields and respond to more complex motion patterns. The finding that MST cells are tuned for optic flow patterns has led to the suggestion that the area plays an important role in the perception of self-motion. This hypothesis has received further support from studies showing that some MST cells also respond selectively to vestibular cues. Furthermore, the area is part of a network that controls the planning and execution of smooth pursuit eye movements and its activity is modulated by cognitive factors, such as attention and working memory. This review of more than 90 studies focuses on bringing clarity to the heterogeneous findings on MST in the macaque cortex and its putative homolog in the human cortex. From this analysis of its unique anatomical and functional position in the hierarchy of areas and processing steps in primate visual cortex, MST emerges as a gateway between perception, cognition, and action planning. Given this pivotal role, this area represents an ideal model system for the transition from sensation to cognition.


INTRODUCTION
Primate cortex consists of well over 100 different areas that can be differentiated on anatomical as well as physiological grounds (1)(2)(3)(4)(5). Around one-third of these areas in the human cortex and as much as half of them in the macaque cortex contribute to the processing of visual sensory information (6,7). These visual areas are highly interconnected and form a rich network of feedforward and feedback connections between "lower" and "higher" areas. Once visual information coming from the lateral geniculate nucleus (LGN) has arrived in layer 4C of the six-layered primary visual cortex (V1), a hierarchy of brain areas can be determined based on the cortical layers from which projections originate and in which they terminate. Feedforward projections start in superficial layers of the lower area and terminate in layer 4 of the higher area, whereas feedback connections project from deep and superficial layers of the higher area to layers outside of layer 4 in the lower area (1,8). As one ascends this visual hierarchy, what is represented by the activity within different areas shifts from a representation of low-level features of the two-dimensional (2-D) retinal image ("sensation") to a high-level interpretation of the multidimensional environment and the organism's relationship to it ("cognition") (9).
Areas at the intersection of sensation and cognition are at the heart of a fundamental question of neuroscience: how can the brain extract information about the environment to create an internal representation and subsequently guide behavior? One such area is the medial superior temporal area (MST) of the macaque cortex (see ANATOMY OF THE MEDIAL SUPERIOR TEMPORAL AREA for anatomical location and connection to other areas). It is predominantly a visual area that processes information about complex motion patterns, which is typically described in terms of its cells' tuning for features of visual motion, such as direction and speed (see VISUAL RESPONSE PROPERTIES OF MST CELLS). But it also uses this information to determine the direction in which the organism is currently moving (THE ROLE OF MST IN SELF-MOTION PERCEPTION BASED ON OPTIC FLOW). Furthermore, it integrates vestibular cues with visual information to improve this representation of self-motion and to tell it apart from the motion of objects in the environment (VESTIBULAR TUNING AND MULTISENSORY INTEGRATION). However, activity in MST neurons reflects not only the integration of sensory input but is also modulated by oculomotor information (MODULATION OF MST ACTIVITY BY EYE MOVEMENTS) and cognitive processes such as attention or working memory (MODULATION OF MST ACTIVITY BY COGNITIVE PROCESSES). Thus, MST is an ideal model system to study the selectivity of individual neurons for complex stimuli, multisensory integration, and sensory-motor transformations and how these processes are modulated by behavioral signals. Although this information comes from studies of nonhuman primates, there is overwhelming evidence that the human brain contains an anatomically and functionally homologous area (HUMAN HOMOLOGS OF MST).

ANATOMY OF THE MEDIAL SUPERIOR TEMPORAL AREA
In the macaque cortex, the MST is located "medial to [the middle temporal area (MT)], along the fundus of the superior temporal sulcus and in places extending several millimeters onto the anterior bank" (8) (Fig. 1). An anatomical landmark is a densely myelinated zone (DMZ) that serves to mark the area's border on the upper bank of the superior temporal sulcus (STS) (10,11). Using anterograde and retrograde tracers, Maunsell and Van Essen (8) showed that connections from MT ended primarily in layer IV of MST, which constitutes a feedforward projection (see Fig. 2 for a visualization of MST connectivity). The reciprocal connection originated mostly in layers V and VI of MST, which is consistent with a typical feedback projection and is in line with the connectivity patterns of other areas in the hierarchy of extrastriate visual cortex. Boussaoud and colleagues (10) established many additional cortical connections of MST and the adjacent and highly interconnected fundus of the superior temporal sulcus (FST): MST receives input from hierarchically lower areas V1, V2, V3, the parieto-occipital area (PO), the dorsal prelunate area (DP), and MT. MST has reciprocal connections with area V6 (12), which overlaps with PO (13), and with the dorsal and ventral subdivisions of area V6A (14,15). Further reciprocal connections with areas that rank on the same or a similar hierarchical level as MST itself include the ventral and lateral intraparietal areas (VIP and LIP), and it also receives feedback from hierarchically higher areas such as the frontal eye field (FEF; 16) and parts of the inferior parietal lobule (IPL) that have traditionally been referred to as visual area 7a (17) and more recently been specified as the "Opt" field and, to a lesser degree, the "PG" field of the IPL (18). 
All of these connections are reciprocal: MST sends feedback to the areas from which it receives input and it forwards its output to the areas from which it receives feedback [see Felleman and Van Essen (1), Tables 3, 5, and 7 for an overview of all connections, hierarchical constraints, and lists of references].
Investigating the subcortical connections of MST, Boussaoud et al. (19) found reciprocal connections with the pulvinar, the reticular nucleus of the thalamus, and the claustrum. In addition, they found nonreciprocal connections from MST to the striatum and the pontine nuclei. Curiously, the same study reported that injection of anterograde tracers in MST did not show any label in the superior colliculus (SC), which is known to play an important role in eye movements and attention (20,21) and is well connected to MT [e.g., Maunsell and Van Essen (8)].
Apart from the DMZ that marks the area's border on the upper bank of the STS, MST's boundaries are not sharply defined, which has led to different ways of segmenting the area into subsections. Boussaoud et al. (10,19) differentiate between a more posterior part where cells have receptive fields (RFs) close to the center of the visual field (MSTc) and a more anterior part with cells whose RFs cover the periphery (MSTp). An alternative scheme for dividing MST has been proposed by Wurtz and colleagues, referring to a dorsal-medial subsection on the anterior bank of the STS as MSTd and a lateral-anterior part on the floor of the posterior bank of the STS as MSTl (23)(24)(25)(26). The counterpart to MSTd has sometimes been labeled the ventral part of MST (MSTv) (27)(28)(29). Using a variety of different stains, Lewis and Van Essen (30) differentiate three zones within MST: a dorsal anterior zone (MSTda) that corresponds to what Desimone and Ungerleider (31) called the DMZ, a dorsal posterior zone (MSTdp) that is located posterior and medial to MT, and a medial zone (MSTm). As of this writing in 2021, a consensus has not yet been reached on how to name the subsections of MST; however, the majority of the literature focuses on MSTd. For this review, we will use the naming convention of the respective original publication.
In summary: 1) Of the 94 empirical research studies reviewed in the first six sections, 69 used rhesus monkeys (Macaca mulatta), 18 were conducted with cynomolgus monkeys (M. fascicularis), 6 with Japanese monkeys (M. fuscata), and 4 with Southern pig-tailed macaques (M. nemestrina); note that some studies used multiple species (e.g., one rhesus and one cynomolgus) and some do not specify the species but only state "monkey" or "macaque." 2) MST is well connected to many other cortical and subcortical areas, but receives most of its input from MT. 3) MST can be divided into at least two subareas; although the nomenclature differs across laboratories, the most common one is a division into MSTd and MSTl (dorsal and lateral parts of the area).

VISUAL RESPONSE PROPERTIES OF MST CELLS

Comparison to MT
Given that MST was originally defined as the main projection target of the middle temporal area (MT), it is not surprising that the two areas share many similarities. MT is a small region located at the posterior bank of the STS and it contains a large number of direction- and disparity-selective neurons that are retinotopically organized and have RFs which are ∼10 times larger than those of V1 neurons but, like V1 neurons' RFs, increase in size with eccentricity [22, 32; see Born and Bradley (33) for a review].
Early studies confirm that MST neurons are also direction-selective, albeit with much larger RFs than MT neurons, often covering substantial parts of the contralateral visual field (31,34). Some respond only to movements of individual luminance bars and not to the movement of wide dot patterns covering a large part of the screen ("figure type cells"). Others show the opposite pattern of responses ("field type cells") or respond equally well to both types of stimuli (34). Presumably, figure-type cells play a role in detecting the difference between the movements of an object and its background or even in perceiving the boundaries of the object with respect to its environment. Field-type cells, on the other hand, which are absent in MT, are involved in the perception of motion across a large part of the visual field, irrespective of individual, smaller objects within that field, such as the motion of the background as one moves through the environment (34).

Distinct Cell Types for Different Motion Patterns
However, the most striking difference from MT is that MST contains neurons that respond selectively to much more complex motion patterns than movement along a straight line. Saito and colleagues (35) were the first to report three distinct types of cells in MSTd: cells that are similar to MT cells and respond preferentially to unidirectional straight movement ("D cells," around 50% of MSTd neurons); cells that respond selectively to radial motion, that is, an expanding or contracting stimulus ("S cells" for "size change," around 16% of MSTd neurons); and cells that respond selectively to clockwise or counterclockwise rotation in the frontoparallel plane or in depth ("R cells," around 14% of MSTd neurons), leaving around 20% of cells unclassified. S and R cells are almost exclusively found in the dorsal part of MST, whereas the ventral part contains neurons that prefer linear motion of a smaller stimulus and are presumably more relevant for the perception of object motion (29). This has led Tanaka and colleagues to suggest that the ventral and dorsal parts are functionally distinct subregions of MST.
The selectivity for radial motion in MSTd is particularly noteworthy, since these types of motion patterns are also experienced as one moves through the environment, a phenomenon known as "optic flow" (36). Importantly, MT cells do not show selectivity for optic flow stimuli (37), suggesting that this property is generated de novo in MST. How MST is involved in the perception of one's own translational movement through the world will be reviewed in detail in the following two sections. For the remainder of this section we focus on general visual response properties, i.e., the neural responses to passively viewed visual stimuli.

Tuning in Spiral Space Instead of Distinct Cell Types
The idea of distinct types of cells (35,38) responding selectively to radial, rotational, or translational motion was challenged by Graziano and colleagues (39). Inspired by the finding that MSTd cells often respond not only to radial, rotational, or translational motion, but also to two or all three types (23), they hypothesized that these cells might in fact prefer an intermediate form of motion. They defined a continuous circular spiral motion space in which expansion, contraction, clockwise rotation, and counterclockwise rotation can be thought of as the cardinal directions. Intermediate spiral motion patterns can be thought of as a combination of rotational and radial components (e.g., adding a clockwise rotational component to an expanding motion pattern creates an outward clockwise spiral motion pattern, see x-axis in Fig. 3 for 8 evenly spaced directions in spiral space). They found that indeed a large majority of neurons had Gaussian tuning curves in spiral space (see Fig. 3 for an example), similar to the direction and orientation tuning curves typically found for MT and V1 neurons. A similar study confirmed the selective responses to radial, rotational, and spiral motion and additionally found that almost no MST neurons were selective for deformation (37), thus providing further evidence for a tuning in spiral space. The preferred directions of all tested neurons in Graziano et al. (39) covered the whole range of directions in spiral space but with a clear bias for stimuli containing an expanding component. Again, this speaks for a role of MSTd in the perception of self-motion, as a forward movement through the environment results in an optic flow pattern that is dominated by an expanding component (explored more in depth in VISUAL RESPONSE PROPERTIES OF MST CELLS and THE ROLE OF MST IN SELF-MOTION PERCEPTION BASED ON OPTIC FLOW). 
This tuning is independent of the exact shape of the stimulus, that is, the preferred direction in spiral space is the same for random dot patterns (RDPs) and filled or empty squares (40).
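The spiral-space construction described above, mixing radial and rotational flow components according to an angle θ, can be sketched in a few lines (a minimal sketch; the function and variable names are ours, and the sign convention for rotation is an assumption):

```python
import numpy as np

def spiral_flow(points, theta):
    """Velocity field for a direction theta in spiral space.

    theta = 0     -> pure expansion (radial outward)
    theta = pi/2  -> pure counterclockwise rotation
    theta = pi    -> pure contraction
    Intermediate angles mix radial and rotational components,
    producing inward or outward spiral motion patterns.
    """
    x, y = points[:, 0], points[:, 1]
    radial = np.stack([x, y], axis=1)        # away from the center
    rotational = np.stack([-y, x], axis=1)   # 90 deg counterclockwise of radial
    return np.cos(theta) * radial + np.sin(theta) * rotational
```

For θ = 0 every dot moves away from the center; adding a rotational component (0 < θ < π/2) turns the expansion into an outward spiral, matching the cardinal and intermediate directions along the circular axis of spiral space.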
These results suggest that MSTd contains a population of cells tuned to spiral motion directions with their respective preferred directions distributed in spiral space, similar to linear motion preferences in earlier visual areas. There is no evidence for the alternative hypothesis of three distinct subpopulations of cells that decompose complex stimuli into radial, rotational, and translational components. This raises the question whether cells in MSTd are topographically organized according to their preferred direction, similar to the orientation columns in V1 (41) and direction columns in MT (42). Indeed, both electrophysiological recordings (37,43) and 2-deoxyglucose labeling (44) indicate that neurons tuned to similar directions in spiral space are clustered in columns in MSTd.

Receptive Fields: Size, Shape, and Structure

A recurring theme of this review is that MST neurons generally show more variability and less structure compared with lower areas, such as V1 and MT. In particular, whereas cells in the lateral geniculate nucleus (LGN) and primary visual cortex (V1) are often described as filters that perform relatively simple operations on the visual input (45), this does not hold true for MST, as will be discussed in more detail in the following sections.
An antagonistic center-surround structure, which is typically observed in RFs of earlier areas and still present to some degree in MT (46,47), does not seem to be present in MST. Instead, the observation that large stimuli (∼40° in diameter) are necessary to evoke an MSTd neuron's maximal response suggests that most cells simply sum spatially across their entire RFs (29,37,38,39). Some studies do report a decrease in firing rate for stimuli that exceed a certain size in some MST cells (29,37,48) and others describe excitatory and inhibitory "zones" of MST neuronal RFs (49). Komatsu and Wurtz (25) found a reversal in preferred direction in some MST neurons once the stimulus exceeded a critical size, but the preferred direction of a small stimulus did not reverse across different locations within the RF. Instead, the reversal seems to depend on spatial summation over the total RF area that is stimulated. In conclusion, the evidence suggests that any inhibitory mechanisms in MST RFs could provide some form of gain control but do not follow the classical antagonistic center-surround structure.
Whereas the size and structure of RFs in early visual areas is well described by simple models, such as a difference-of-Gaussians for LGN cells (50), RFs of MST neurons are not only larger, but also more variable in their shape. Fitting receptive fields to a two-dimensional Gaussian showed MST RFs to be more elliptical or at least less regular than those of MT neurons (51). Note that forcing RFs into a predefined shape (such as a 2-D Gaussian) means that some irregularities in the RF shape get automatically smoothed out. Thus, more elliptical fits could also be a sign that the RF shapes are generally more irregular in MST than in MT.
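For reference, the difference-of-Gaussians description mentioned above for LGN receptive fields has a simple closed form; a minimal sketch (the parameter values are illustrative, not fitted to data):

```python
import numpy as np

def dog_rf(r, a_c=1.0, sigma_c=1.0, a_s=0.5, sigma_s=2.0):
    """Difference-of-Gaussians receptive field profile at radius r:
    a narrow excitatory center minus a broader inhibitory surround."""
    center = a_c * np.exp(-r**2 / (2 * sigma_c**2))
    surround = a_s * np.exp(-r**2 / (2 * sigma_s**2))
    return center - surround
```

The profile is positive at the center and dips below zero at intermediate radii — the classical center-surround antagonism that, as discussed above, MST RFs do not show.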
The relation between RF eccentricity and RF size in the MST is weaker (24,31,34,38,51,52) than that found in the neurons of lower areas [see, e.g., Fig. 1 of Freeman and Simoncelli (53)]. Tanaka et al. (29) even found a negative relation between size and eccentricity for MSTd and a positive one for MSTv. What is generally agreed upon, however, is that RF sizes in MSTd are much bigger than in MSTl/MSTv (54).
Lastly, electrophysiological recordings of individual cells have not confirmed a well-structured retinotopic organization in MST, as is documented for V1 and MT. A number of studies report at least crude visual topography (24,29,31) whereas others specifically mention that they found no topography at all (34,35). More recent functional magnetic resonance imaging (fMRI) studies, however, do provide evidence for a cluster of retinotopic visual field maps in the posterior section of the STS, one of which can be attributed to MSTv, based on its anatomical location (27). Acute single cell recordings and functional imaging both have advantages and disadvantages when it comes to determining the structure of brain regions in visual cortex: the former offers fine-grained information about individual units but samples randomly from the area with high variability between individual recording sessions. The latter can measure the activity across the entire brain within a single recording session, but the spatial resolution is limited as each fMRI voxel represents the blood-oxygen-level-dependent (BOLD) response that is associated with the activity of thousands of neurons, potentially averaging out the variability within this population. The groups of Tsao and Freiwald have had remarkable success in functionally dissecting the inferotemporal cortex by combining fMRI with electrophysiological recordings (55)(56)(57) and similar methods may be necessary to get a better understanding of the exact structure of MST. Of course, when the receptive fields of single neurons cover as much as a quarter of the visual field, one cannot expect a tessellation of the visual field that is as apparent as it is in earlier areas. At least for those neurons in MSTd with large RFs, the question of whether there is a retinotopic organization seems moot.
In summary, the size, shape, arrangement, and structure of MST neurons' receptive fields cannot be adequately described by simple models or linear relationships. However, recent work looking at more complex models of nonlinear integration of the input that MST receives from MT has shown promising results (58; see Position Invariance and Receptive Field Organization for details) and work along those lines may be helpful in the future.

Speed Selectivity
Speed is an integral parameter of motion and how MST neurons respond to different speeds provides important insights into the spatial integration of inputs and how complex motion pattern selectivities arise. Specifically, a major question is whether MST neurons simply integrate over local speed and direction patterns or respond selectively to the overall, global motion patterns inside their RFs. Most cells increase their firing rate with speed until a maximum response is reached and then saturate. A few cells, however, are truly tuned for speed variation; they decrease their firing rate once the stimulus exceeds their preferred speed (38,(59)(60)(61). As speed increases, response latency typically decreases in MST and is a bit lower than in MT (37). There is evidence for spatial integration of speed distribution: rotational stimuli normally have a speed gradient, as points on the outer edge of a stimulus need to cover a larger distance to make one full rotation than points close to the center. One study reports that removing this gradient has little effect on the neuron's selectivity, suggesting that it is the average speed across the RF, rather than the exact distribution of speeds, that determines its selectivity (61). Such a purely spatial speed integration across the RF might be too simplistic, though, as another study did find substantial changes in the responses of up to two thirds of their recorded neurons when removing the gradient (60). The fact that MT neurons, which provide the main input to MST, are tuned for speed gradients (62,63), also makes it likely that MST makes use of the additional information about structure that is provided by such gradients. This is one example of the more general question of whether MST neurons analyze motion by parceling complex motion patterns out into smaller, more elementary units, or whether they process them as a unified whole.
The contradictory results show that this question has not been fully resolved yet and we consider it in more detail in Position Invariance and Receptive Field Organization in the context of models of receptive field organization.
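The gradient-removal manipulation used in these studies can be sketched as follows (a minimal sketch; the names are ours): a rotating dot field normally has local speed proportional to radius, and "removing the gradient" equalizes the local speeds while preserving the local directions.

```python
import numpy as np

def rotational_flow(points, omega=1.0, remove_gradient=False):
    """Rotational flow field. Local speed grows with distance from the
    center (the natural speed gradient); with remove_gradient=True all
    local speeds are set to 1 while local directions are preserved."""
    x, y = points[:, 0], points[:, 1]
    v = omega * np.stack([-y, x], axis=1)    # counterclockwise rotation
    if remove_gradient:
        speed = np.linalg.norm(v, axis=1, keepdims=True)
        v = v / np.where(speed == 0, 1.0, speed)
    return v
```

A neuron that only averages speed across its RF should respond similarly to both versions; a neuron sensitive to the speed gradient itself should not, which is the distinction the two studies cited above disagree on.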

Disparity Selectivity
Similar to MT (64), a large proportion of neurons in MST is disparity selective (65). In other words, their response to a stimulus moving in the preferred direction with the preferred speed (as determined for 0 disparity) varies depending on whether the stimulus is closer (negative disparity, "near cells"), farther (positive disparity, "far cells"), or at the same distance (zero disparity) as the fixation point (the "horopter"). Interestingly, Roy and colleagues (65) report that most cells prefer a nonzero disparity, that is, only a minority prefers stimuli along the horopter, with the numbers of "near" and "far" cells being approximately equal. In around 40% of the investigated cells, the preferred direction switched to the opposite when disparity switched from positive to negative and vice versa ("disparity-dependent direction-selective" or DDD cells). This feature contributes to determining one's own direction from the motion of stationary objects on the retina caused by the viewer's self-motion (66).

Position Invariance and Receptive Field Organization
A compelling feature of MSTd cells, first described by Saito et al. (35), is that of "position invariance" (37,39,49): the observation that a neuron's preferred complex motion pattern does not invert, even when local linear motions are inverted (see Fig. 4A for an illustration). To test this, Graziano and colleagues (39) presented stimuli at up to five carefully selected locations within the RF, arranged in an overlapping cloverleaf pattern to create reversals in local direction (Fig. 4A), and checked for preference inversions at these locations. Figure 4 shows two cells that preferred clockwise over counterclockwise rotation (Fig. 4B) or expansion over contraction (Fig. 4C) and retained this preference polarity in all five locations. Graziano and colleagues (39) found that most responses recorded from MST neurons were position invariant. Using a similar approach, Lagae et al. (37) found only 40% of their recorded MST neurons to be position invariant; however, they report that nearly all their position invariant cells were located in MSTd, where Graziano and colleagues had also recorded most of their cells, suggesting that position invariance is a dominant (and possibly unique) feature of the dorsal part of MST.
Position invariance is highly informative concerning one, if not the, most intriguing issue about MST neurons: how can their specific selectivities be generated from their input?
The general question of how selectivity for particular features arises has been at the heart of visual neuroscience at least since Hubel and Wiesel (41) proposed their model of how orientation selectivity in V1 can arise from LGN input. Applying this approach to MT and MST would suggest that several MT cells with properly aligned receptive field locations and preferred directions project onto a single MST neuron, which is then selective for that particular arrangement of linear directions (Fig. 5). This would be in line with what Duffy and Wurtz (49) dubbed the "direction mosaic hypothesis," suggesting that an MST neuron's RF consists of properly aligned subfields with translational direction preferences (presumably these subfields would be identical with the RFs of MT neurons projecting onto the MST neuron). And indeed it has been shown that the spatial arrangement of direction components is the most important factor in determining MST selectivity (67). However, this idea is incompatible with the local direction reversals that come with position invariance (Fig. 4A): it would mean that for any part of the RF, input needs to be provided by multiple MT neurons with different preferred directions. If all of them have excitatory projections to the same MST neuron, the MST neuron would no longer be selective for one particular direction in spiral space. Several suggestions have been made to address this issue. A "compartment model" divides an MST neuron's RF into overlapping compartments, each of which is constructed from similarly organized MT inputs independently of the other compartments. MT cells whose input creates one such compartment all project onto one branch of a dendrite of the MST cell so that each dendritic branch can be described as a subunit whose activity represents one compartment of the cell's RF (35).
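The Hubel-and-Wiesel-style feedforward scheme of Fig. 5 can be made concrete: MT-like subunits with tangentially aligned preferred directions converge excitatorily onto one MST unit, which then prefers clockwise rotation (a minimal sketch; the names and the cosine-tuning and rectification choices are our assumptions, not part of the original model description):

```python
import numpy as np

# 8 MT-like subunits on a ring; preferred directions are the clockwise tangents
angles = np.linspace(0, 2 * np.pi, 8, endpoint=False)
pref_dirs = np.stack([np.sin(angles), -np.cos(angles)], axis=1)

def mst_unit(local_flow):
    """Rectified sum of cosine-tuned MT subunit responses (purely
    excitatory convergence, as in the simple model of Fig. 5)."""
    drive = (local_flow * pref_dirs).sum(axis=1)   # cosine of angle to preference
    return np.maximum(drive, 0.0).sum()

clockwise = pref_dirs          # local motion matches every subunit's preference
counterclockwise = -pref_dirs  # every local direction reversed
```

The unit responds strongly to the clockwise pattern and not at all to the counterclockwise one — but precisely because each location carries only one preferred direction, reversing the local motions reverses the unit's preference, which is what the position-invariance data rule out.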
The alternative "overlapping gradient hypothesis" posits that the RF consists of excitatory and inhibitory response gradients. The particular arrangement of excitatory and inhibitory gradients with different preferred directions is claimed to account for the selectivities for complex motion patterns (49). With a similar idea in mind, Mineault et al. (58) devised a promising model that can account for the response patterns of a heterogeneous population of MST neurons. They developed a continuous optic flow stimulus that consisted of randomly evolving combinations of translational, spiral, and deformational motion and fitted a neuron's response to this pattern to a number of different models, which they then used to predict the neuron's response to new stimuli. Not surprisingly, a simple linear receptive field model that compares the stimulus to an internal template was not successful in accounting for the more complex response properties of MST neurons. Instead, a hierarchical model in which an MST neuron linearly integrates the input of several subunits with properties similar to those of MT neurons was able to describe some cells very well, but across the population the predicted responses often deviated substantially from recorded responses. A third model, finally, where the input of the subunits was transformed by a static nonlinear operation with just one free parameter before integration (Fig. 6), resulted in remarkably good fits to the data. Such a nonlinear operation could be implemented biologically through inhibitory interactions among MT neurons or synaptic depression between MT and MST. The model found between 2 and 45 subunits for each MST neuron, which were mostly excitatory and often had overlapping RFs. It is, of course, likely that each MST neuron receives many more projections from MT than that, but the model convincingly shows an architecture that can explain many of the features of MST neurons.
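The key ingredient of the third model — a static single-parameter nonlinearity applied to each MT-like subunit output before linear integration — can be sketched like this (a minimal sketch; we use a power law as the single-parameter stand-in for the compressive nonlinearity, and all names are ours):

```python
import numpy as np

def mst_response(subunit_drive, weights, p=0.5):
    """Hierarchical two-stage model: rectified MT-like subunit outputs
    pass through a static nonlinearity x**p (one free parameter p)
    before a weighted linear sum at the MST stage."""
    rectified = np.maximum(np.asarray(subunit_drive, float), 0.0)
    return float(np.asarray(weights, float) @ rectified**p)
```

With p = 1 the model collapses to the purely linear subunit model; a compressive p < 1 lets many weakly driven subunits outweigh a single strongly driven one, which is one way such a nonlinearity reshapes the population's selectivity.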
We would like to reemphasize that the focus of our review lies on studies that include physiological recordings. As is apparent from the examples introduced in this section, there is also a multitude of pure modeling studies that explore this issue (68)(69)(70)(71)(72), which exceed the scope of this review.

Relation of MST Activity to Perception and Behavior
Figure 5. A simple architecture to create a clockwise rotation-selective receptive field: the receptive fields of MT cells (8 of which are shown in blue with their preferred directions) are spatially arranged so that their preferred directions line up to form a clockwise rotational pattern. If they all project to a single MST neuron (shown in orange) with excitatory synapses, the receptive field of that MST neuron (orange circle) will be selective for clockwise rotation. This model cannot, however, explain position-invariant response properties (Fig. 4 and text). MST, medial superior temporal area; MT, middle temporal area.

The last question we want to address in this first section is how the activity of MST neurons is related to motion perception on the behavioral level. Two measures based on Signal Detection Theory have been developed to compare behavioral performance in simple discrimination tasks to the responses of single neurons: first, constructing a "neurometric" curve (73) allows one to compute a discrimination threshold for each neuron, which can be compared with the psychophysical threshold of the monkey. Second, choice probability (CP) [74; see Crapse and Basso (75) for a review, but also Cumming and Nienborg (76) and Zaidel et al. (77) for limitations of CP] is a measure of how the activity of individual neurons is related to the monkey's decision on a trial-by-trial basis. For this measure, one compares the distribution of responses from trials where the monkey chose the neuron's preferred feature (here: direction) to the distribution of responses from trials where it chose the antipreferred feature (here: the opposite or null direction) and calculates the probability that a randomly chosen value from the first distribution is higher than a random value from the second one. Thus, CP values indicate the accuracy with which a neuron's response predicts the monkey's choice, with values around 0.5 representing chance performance and values close to 1 representing nearly perfect prediction accuracy. Both of these measures were originally developed to investigate the role of MT in a simple two-alternative forced-choice (2AFC) task where the monkey has to report whether a low-coherence RDP is moving to the left or to the right (78). This experiment was later repeated to investigate the role of MST in this kind of task (79). The results were typically quite similar for MT and MST: for most cells, neuronal thresholds were very similar to behavioral thresholds, suggesting that an observer could rely on only a very small group of neurons to make its decision. Similarly, CP values were significantly above 0.5 in both areas, but did not differ significantly between areas. Furthermore, microstimulation of MST biased a monkey's choice behavior toward the preferred direction of the cells around the stimulation site (80), again showing a similar pattern of results as MT (81). This suggests that MST does not contribute more or less to this type of simple behavioral task than MT.
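The CP computation described above is equivalent to the area under an ROC curve (a Mann-Whitney U statistic scaled by the number of trial pairs); a minimal sketch (the names are ours):

```python
import numpy as np

def choice_probability(pref_choice_rates, null_choice_rates):
    """P(a random response from preferred-choice trials exceeds a random
    response from null-choice trials); ties count as one half.
    0.5 = chance, 1.0 = the response perfectly predicts the choice."""
    a = np.asarray(pref_choice_rates, float)[:, None]
    b = np.asarray(null_choice_rates, float)[None, :]
    n_pairs = a.size * b.size
    return ((a > b).sum() + 0.5 * (a == b).sum()) / n_pairs
```

Identical response distributions for the two choices give CP = 0.5; completely separated distributions give CP = 1.0 (or 0.0 if the neuron's response predicts the opposite choice).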
Interestingly, CP values were not significantly above 0.5 when the monkey had to choose between two opposing directions in spiral space, suggesting that the relationship between spiral motion perception and MST activity is weaker than that between linear motion perception and MST activity (82). In contrast, Williams and colleagues (83) did find differences among MT, MST, and the lateral intraparietal area (LIP) in how neural activity relates to perception: monkeys reported the direction of an apparent motion stimulus, which, on some trials, was constructed so that it could be perceived to move either in a neuron's preferred or its antipreferred direction. Almost half of LIP neurons and 22% of MST neurons, but no MT neurons, showed a difference in activity depending on whether the monkey reported the neuron's preferred or antipreferred direction. The authors concluded that neuronal activity becomes more aligned with the subjective perception of apparent motion as one ascends through the three hierarchically organized areas.

To investigate whether the contributions of MT and MST are necessary for motion perception, Rudolph and Pasternak (84) lesioned both areas by injecting ibotenic acid. This caused pronounced deficits in motion perception. For stimuli not masked by noise, the monkeys were eventually able to recover some of the impaired perceptual abilities, suggesting that other areas can compensate for the lost function. However, the ability to extract motion signals from noise remained impaired even after extensive training, indicating that MT and MST play an irreplaceable role in challenging motion perception. Lesioning of MT and MST also impaired speed discrimination (85). However, a later study showed that very precise lesions of the STS affecting MT, while leaving MST intact, have behavioral effects similar to those of larger STS lesions that also affect MST. This suggests that the relation of MST activity to behavior is largely inherited from MT, at least for direction discrimination tasks (86).
Yet another approach to comparing perception with neuronal activity is to examine neural responses during visual illusions, where perception diverges from physical stimulus properties. Neural populations whose activity parallels the illusory percept rather than the physical stimulus can be considered more closely related to behavior than populations whose activity is only related to the stimulus but not to perception. An example of such an illusion is the "apparent motion" of an RDP whose dots are displaced by a spatial and a temporal separation between successive flashes of each dot. Small temporal separations create "smooth motion," but with increasing temporal (and spatial) separation between flashes (keeping the speed constant), the quality of motion is degraded. Interestingly, observers experience an illusory increase in speed with increasing temporal and spatial separation, but single MT neurons do not parallel this illusion; rather, they decrease their firing rates as a function of increasing temporal separation (87). However, averaging the responses of a subset of MST neurons allowed the speed to be estimated in a way that maps onto the illusion (59), suggesting that MST activity is more closely related to perception in this particular setting than MT activity.
In a similar vein, a recent study provided further support for a direct role of MST in motion perception by showing that a subgroup of MSTd neurons respond to illusory rotational or radial motion (52). This suggests that MSTd might contribute directly to the perception of these illusions.
The exact role that MST plays in motion perception is still not fully understood. The idea that a small, localized set of neurons forms the immediate substrate, or "bridge locus" (88), for perception is oversimplified, and the neural correlate of any sort of perceptual experience is more likely to be distributed across a number of brain areas (89). In the case of motion perception, this probably includes areas such as V1, V3, MT, and LIP. Existing detailed models of motion processing typically focus on V1 and MT (90,91), proposing that MT extracts motion information from V1 and represents velocity in a way that is invariant to other stimulus features, such as spatial frequency or orientation. MST plays an important role within such a multistage model and might perform additional computations that go beyond MT's focus on simple linear motion and extract the corresponding feature-invariant velocity information (92). Thus, future work on the neural underpinnings of motion perception should embrace the idea of a network of areas, rather than a unidirectional processing pipeline.
This first section focused on MST with regard to features that are typically discussed in other visual areas, such as receptive field size, tuning for direction and speed, or relation to behavior in simple discrimination tasks. Many of these points will be revisited in later sections. MST cells are not just a slightly more complex step in a one-directional processing pipeline and differ from cells in earlier visual areas in a number of important features, namely, 1) highly complex stimulus preferences, 2) no clear retinotopic organization, 3) lack of a suppressive surround structure, and 4) questionable relation of receptive field size to eccentricity.
MST seems to be an interface where physical properties of the environment processed by earlier areas are represented in a way that is more directly linked to perception and subsequent action. In that sense, MST can be thought of as the dorsal pathway's analog to the inferior temporal (IT) cortex in the ventral pathway (39). In the next section, we discuss how MST's unique combination of response properties makes it a central player in the neural processing of self-motion information.

Takeaway
1) The receptive fields of MST neurons are larger than those of its main input area MT and can cover as much as half of the entire visual field, predominantly on the contralateral side. 2) MSTd neurons are tuned to motion in "spiral space," that is, they respond preferentially to motion patterns composed of radial and/or rotational directions. 3) Interestingly, this preference for spiral motion is often position invariant, that is, a neuron's preferred complex motion pattern remains the same in different regions of its receptive field, even if the change in stimulus position causes local motion directions to reverse. 4) How MST neurons integrate the input they receive from MT neurons to create such a position-invariant selectivity is not well understood. Evidence so far points to a nonlinear integration of MT responses that are tuned to translational motion, but this is an important area for future research, as it could offer fundamental insights into the neural computations underlying complex stimulus preferences. 5) Elucidating the role that MST plays in motion perception remains a central focus of current research. Understanding how MST works in conjunction with other areas of the dorsal pathway, while taking on a special role as the neural underpinning of at least some motion percepts, would help to decipher the role of areas at the interface of sensation and cognition.

THE ROLE OF MST IN SELF-MOTION PERCEPTION BASED ON OPTIC FLOW
In both the ventral and the dorsal visual pathways, the neural representation of the environment gradually shifts from one that 1) primarily reflects low-level stimulus attributes (V1), to 2) a highly specialized representation of selected stimulus features (ventral: V4; dorsal: MT), to 3) a representation that selectively focuses on complex stimulus descriptions of high ecological, social, or behavioral relevance (ventral: IT; dorsal: MST). For IT in the ventral stream, prominent examples in species interacting with a complex environment are object recognition (93,94) and, given the rich social life of primates, facial processing (56,95-97). What, then, is the equivalent to object representation and face selectivity in the motion domain that leverages the high-level representation in MST?
It must be the kind of motion patterns that primates routinely encounter and that require immediate decision making and complex responses. This is the case for "optic flow," that is, the radial patterns that are projected onto the retina during translation of an observer through the environment (36) and that are a major contributor to self-motion perception. Thus, self-motion perception based on optic flow (and separating this self-motion from object motion, which is discussed in VESTIBULAR TUNING AND MULTISENSORY INTEGRATION) can be considered MST's core task, much as object and facial recognition are IT's core tasks. Correspondingly, just as lesions in IT create specific effects on the perception of faces (98,99) and stimulation of face-selective sites biases categorization of noisy images toward a face category (100), lesions and stimulation of MST would be expected to impair or bias self-motion perception.
The first studies to describe MST neurons' selectivity for radial and rotational motion already discussed the possibility that these neurons are involved in the processing of optic flow and the analysis of self-motion (23,35,38). Early psychophysical studies showed that humans are very good at determining the direction of self-motion from such optic flow patterns, even when the pattern is confounded by eye movements (101,102). As general mechanisms of self-motion perception have been reviewed comprehensively (103,104), we will focus here on how MST's response properties make it a key area within the brain for solving this problem.

Tuning for Heading Direction
To specifically test whether MSTd neurons represent the current heading direction, Duffy and Wurtz (105) presented monkeys with radial and rotational stimuli that differed in the location of their center of motion. They found that in most neurons the response varied with the location of the center of motion (Fig. 7) and that the preferred centers of motion were topographically distributed across the visual field [see also Gu et al.]. This provides strong evidence that the population of MSTd neurons as a whole can encode the position of the center of motion (also called the "focus of expansion," FOE, in the case of expanding stimuli). Thus, presumably, MSTd represents the current direction of heading very well, even if we are not looking where we are heading. Note that variations in firing rate with the location of the center of motion do not contradict the position invariance described in the previous section: position invariance merely states that a neuron's preferred direction stays the same across different locations "within" the RF. In other words, a position-invariant neuron that prefers expansion to contraction will do so in every part of the RF. The absolute firing rate, however, can still vary across locations and thus allow a neuronal population to represent different centers of motion.

Population Coding of Heading Instead of Decomposition of Optic Flow Patterns
It is mathematically possible to decompose any optic flow field into so-called "elementary flow components" (EFC), such as rotation, divergence (i.e., expansion and contraction), or deformation (110,111). Thus, an early hypothesis was that MSTd performs such a decomposition into EFCs to compute heading direction. However, two findings speak against the decomposition hypothesis. First, neurons that are selective for one EFC decrease their response when their preferred EFC is mixed with another EFC (e.g., clockwise rotation mixed with expansion, forming outward clockwise spiral motion) (112). Furthermore, as described in the previous section, many MSTd neurons are tuned along a single dimension of spiral motion patterns, rather than representing EFCs (39). If neurons were representing the presence of an EFC, they should respond strongly as long as this EFC is present in the stimulus, even when mixed with other EFCs (112). As an alternative, Lappe and Rauschecker (113,114) suggest a model consisting of two layers of neurons (such as MT and MST) that can represent heading direction through the population response of the output layer. More specifically, their MST-like output layer represents each possible heading direction with a population of neurons, the summed activity of which provides the likelihood that the respective direction is in fact the current heading direction. This model has received strong support from physiological data (107). In conclusion, the available evidence clearly favors such a population encoding of heading direction, rather than individual MSTd neurons computing heading based on a decomposition.
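This style of population readout can be illustrated with a toy sketch: a bank of pools, one per candidate heading, whose summed activity (here noise-free and Gaussian) is normalized into a relative likelihood and read out at its peak. All tuning parameters below are invented for illustration and are not taken from the model papers:

```python
import numpy as np

# Hypothetical output-layer pools, one per candidate heading direction
candidate_headings = np.arange(-40, 41, 5.0)   # degrees from straight ahead

def pool_activity(true_heading, sigma=15.0):
    """Summed activity of each pool for a given true heading
    (noise-free Gaussian sketch; real responses would be noisy)."""
    return np.exp(-0.5 * ((candidate_headings - true_heading) / sigma) ** 2)

def decode_heading(activity):
    """Treat normalized pool activity as a relative likelihood over
    candidate headings and return the most likely one."""
    likelihood = activity / activity.sum()
    return candidate_headings[np.argmax(likelihood)]

est = decode_heading(pool_activity(10.0))   # recovers 10.0 on the 5-degree grid
```

The key point is that no single pool needs to compute heading explicitly; the estimate emerges from comparing summed activities across the population.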

Effects of Microstimulation and Inactivation on Heading Perception
The gold standard for linking neural activity with cognition is to show a causal relationship. Stimulation and inactivation are the methods of choice to document that neural activity is sufficient or necessary for perception or behavior. To test whether altering the activity of MST neurons is sufficient to modulate heading perception, Britten and Van Wezel (115,116) electrically stimulated the area in monkeys performing a visual heading discrimination task. The monkeys were presented with an optic flow pattern consisting of random dots moving away from a FOE and had to report whether the FOE, which is considered the direction of heading, was to the left or to the right of straight ahead. In a large proportion of experimental sessions, stimulating MST significantly biased the monkeys' reports of their heading percepts, in some cases by more than 5°. Gu et al. (117) confirmed that microstimulation of MSTd neurons biased behavior in a heading discrimination task and additionally showed that reversible inactivation of MSTd led to strong increases in discrimination thresholds.

Effect of Pursuit Eye Movements on Heading Representation
So far, we have discussed the highly artificial scenario where a monkey keeps its eyes still by fixating one particular point on the screen for an extended period of time. Only in those cases does the FOE correspond to the direction of self-motion.
Eye movements add linear components to the optic flow field, thus shifting the FOE. For example, moving one's eyes to the right (Fig. 8E) shifts the retinal image to the left and if this is combined with the expanding optic flow that is associated with forward movement (Fig. 8B), it results in an expanding optic flow pattern whose FOE is shifted to the right (Fig. 8, C vs. A).
Psychophysical experiments have shown that humans can account for these eye movement-induced shifts, but require extraretinal information about eye position to do so (118). Since MST also receives and encodes information about eye movements (MODULATION OF MST ACTIVITY BY EYE MOVEMENTS), it is well suited to solve the problem of estimating heading direction from shifted optic flow patterns. It is possible that the tuning of MSTd neurons for heading in fixating monkeys [e.g., Duffy and Wurtz (105)] could simply represent the position of the FOE on the retina, rather than actual heading. To test this, Bradley and colleagues (119) had monkeys either fixate or perform smooth pursuit eye movements while they were presented with expanding RDPs whose FOE position varied along an axis parallel to the neuron's preferred smooth pursuit direction. The authors found different types of cells: "heading cells" showed the same tuning during fixation and pursuit eye movements, whereas "retinal cells" seemed to respond primarily to the pattern of retinal image motion. Simulating eye movements by adding a linear shift to the RDP while the monkey was fixating made heading cells react like retinal cells: their tuning curves shifted along with the simulated eye movement. This suggests that "heading cells" have access to information about the eye movement and can use this information to adjust for shifts in the retinal image that are caused by the eye movements. A subsequent study (108) tested heading tuning during pursuit eye movements in eight different directions, not just along each neuron's preferred pursuit direction. They found that for most neurons, selectivity for a particular heading direction (as simulated by the location of the FOE) was affected by pursuit eye movements (Fig. 9).
From that, Page and Duffy (108) concluded that individual neurons cannot account for heading detection during eye movements and found instead that a population vector across 196 recorded neurons represents heading well, both during fixation and pursuit.
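A population vector of the kind used here can be sketched as a rate-weighted vector average of preferred headings. The neuron count and tuning below are invented for illustration, not taken from the recorded data:

```python
import numpy as np

def population_vector(preferred_headings_deg, rates):
    """Vector average: each neuron votes for its preferred heading with a
    weight proportional to its firing rate; the angle of the summed vector
    is the decoded heading."""
    theta = np.deg2rad(preferred_headings_deg)
    x = np.dot(rates, np.cos(theta))
    y = np.dot(rates, np.sin(theta))
    return np.rad2deg(np.arctan2(y, x))

# Illustrative: neurons whose preferred headings tile the frontal hemifield
prefs = np.linspace(-90, 90, 19)
rates = np.exp(-0.5 * ((prefs - 20.0) / 30.0) ** 2)   # response to a 20-deg heading
decoded = population_vector(prefs, rates)             # close to 20 deg
```

Because the readout pools over many neurons, it can remain accurate even when individual tuning curves are distorted, for example by pursuit eye movements.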
Importantly, the FOE position in head-centered coordinates can be decoded from the population activity even at the single-trial level at an accuracy close to behavioral discrimination thresholds in humans and monkeys (120). This further supports the idea that MST plays a key role in heading perception. However, Bremmer and colleagues (121) reported slightly different findings: they found that about half of their recorded neurons preserved their selectivity for one heading direction across fixation, simulated eye movements, and real eye movements, similar to Bradley and colleagues' "heading cells" (119).

A recent study attempted to elucidate the relative importance of purely visual, retinal signals and extraretinal efference copy signals for the brain's ability to discount the distortions caused by eye movements. Manning and Britten (122) compared tuning for heading direction across three conditions: "normal pursuit," "simulated pursuit" (during which the monkey fixated, but the stimulus was shifted as if the monkey had made a pursuit eye movement), and a newly developed "stabilized pursuit" (eye movements were compensated by stabilizing the stimulus on the retina based on instantaneous eye velocity). The simulated pursuit condition isolates the effects of retinal signals, as there is no efference copy from the eyes, whereas the stabilized pursuit condition isolates effects of extraretinal signals, as the eyes are moving but the retinal image does not change. They found that tuning curves shifted very little during stabilized pursuit compared with a fixation condition, which supports the hypothesis that retinal mechanisms alone can explain response stability, as these two conditions produce the same retinal image. Furthermore, tuning curves during real and simulated pursuit, which lead to the same, shifted retinal image, were both displaced in a similar manner. All of this suggests that the relative importance of efference copies is rather small compared with the retinal contributions.
The differences between these four studies (108,119,121,122) show that heading selectivity during eye movements depends on a complex interaction of the type and direction of the eye movement, the exact stimulus configuration, and the task at hand. There is strong evidence that a population of MSTd neurons can represent the current direction of self-motion even as the visual input is disturbed by eye movements. The exact computational mechanisms that render this possible, as well as their neural implementation, remain an active field of research.

Effect of Saccadic Eye Movements on Heading Representation
In addition to pursuit eye movements, everyday vision is characterized by ballistic eye movements, so-called saccades, which occur multiple times per second during natural behavior. They pose a challenge for heading representation because motion perception is suppressed around the time of saccades (123,124). Bremmer and colleagues (125) showed that a linear decoder that can accurately determine heading direction from a population of MST and VIP neurons makes systematic errors when analyzing activity during saccades. The decoded heading direction was compressed toward straight ahead when analyzing the population activity in the time period from just before saccade onset to around 160 ms after. The authors conducted a psychophysical experiment with human subjects who were presented with a short optic flow stimulus and had to perform a heading discrimination task while making an upward saccade. Just like the decoder, the human observers' judgments were biased toward straight ahead when the optic flow stimulus was presented perisaccadically. This provides strong evidence that the saccade-induced bias in the decoder is not a peculiarity of the decoding approach, but that the information represented in the population activity is truly impaired.
In conclusion, it is clear that MST, together with other, adjacent areas (such as VIP; 126), plays an essential role in the perception of the direction of self-motion. In fact, this can be considered MST's core task, akin to object recognition in IT, without discounting the possibility that MST may have other core tasks. If it is in fact true that MST computes and represents one's movement through the environment, it should be able to integrate nonvisual sensory information about self-motion. In the following section, we review evidence that this is indeed the case.

VESTIBULAR TUNING AND MULTISENSORY INTEGRATION
As outlined in the previous section, the response properties of MST make it an ideal candidate for the neural substrate of self-motion perception. A big question remains: does MST encode purely visual signals about self-motion and pass this information on to downstream areas, which integrate it with information from other modalities to represent heading direction, or does it hold the final, integrated representation of heading direction? If the latter is the case, these neurons should be able to encode heading information based on nonvisual, for example, vestibular, input. It is by no means obvious that MST might respond to vestibular input, as it had been described as a purely visual area for the first 15 years after its discovery. The literature reviewed in this section shows that MST neurons do integrate vestibular information with the visual input. This is further evidence that these neurons reflect an internal representation of the environment that can guide behavior, rather than isolated features of the physical stimulus.
Tuning for Vestibular Input

Duffy (127) was the first to test the responses of rhesus monkey MST neurons to optic flow stimuli simulating self-motion and to real motion, both in darkness and in combination with visual stimulation. The real movement was achieved by means of a motorized sled on which the monkeys could be moved in any direction on a horizontal plane. The study confirmed that MST neurons were selective for heading direction in response to visual optic flow stimuli. More importantly, around one-quarter of the neurons studied were also selective for the direction of a translational movement in darkness, although the responses were typically smaller in magnitude and less selective than responses to optic flow stimuli. Most surprising, however, was that MST neurons varied widely in how they responded to simultaneously presented visual and vestibular input: some neurons were strongly tuned for one modality (visual or vestibular) and only responded weakly to the other modality; in that case the response to bimodal stimulation was typically similar to that of the preferred modality alone. In other neurons, the response to bimodal stimulation was an additive combination of responses to either modality alone. In a third group of neurons, adding vestibular input to visual input suppressed the response to the optic flow. Notably, the preferred directions for translational movement and optic flow were not related, and a substantial proportion of neurons altered their directionality in response to noncongruent bimodal stimulation, that is, translational movement in one direction and optic flow simulating movement in another direction.
Later work confirmed Duffy's (127) findings, including the fact that the preferred directions for visual and vestibular stimulation often differed, and found an even higher proportion of MSTd neurons responding to vestibular stimulation (64%, compared with 98% responding to visual stimulation). The reason for this higher proportion of neurons responding to vestibular stimuli is probably the use of 3-D motion, compared with the 2-D motion on a horizontal plane in Duffy's (127) study. Neurons that were tuned for vestibular stimulation were usually also tuned for visual stimulation, so that there were almost no exclusively vestibular neurons, and visual responses tended to dominate over vestibular responses in most bimodal neurons (106,131). Similar to previous findings that MSTd neurons are clustered according to their preferred direction in the visual domain (37,43,44,117), it was found that cells with similar translational or rotational directional preference in the vestibular domain also tended to cluster together (117,132). Preferred directions in both the visual and the vestibular domain are not uniformly distributed, but bimodal, with peaks at 90° to the left and to the right of straight ahead (106). Because a neuron is most sensitive at the steepest point of its tuning curve, these neurons with preferred directions to the left or to the right are hypothesized to be better able to encode heading differences that deviate very little from straight ahead (133).

Multisensory Integration on the Neuronal and the Behavioral Level
How is this peculiar set of response properties related to heading behavior? Gu et al. (131,134) trained monkeys to perform a heading discrimination task, similar to the one Britten and Van Wezel (115) used in their stimulation studies: the animals experienced either real or visually simulated forward motion that had a small rightward or leftward component, or a combination of both (Fig. 11), and reported the perceived direction by means of a saccade to the right or to the left. Performance was quantified by constructing psychometric functions (proportion of "rightward" responses as a function of heading direction) and calculating a discrimination threshold, defined as the deviation from straight ahead that was necessary for the animal to provide reliably correct responses. They found that perceptual thresholds were similar for visual and vestibular stimulation alone (between 1.2° and 4.0°) and improved significantly in the combined condition (134).
Neuronal thresholds, determined as described in the first section, were generally worse than the psychophysical threshold for all three conditions (visual, vestibular, combined), suggesting that the animal relies either on the pooled responses of a large population of neurons or gives more weight to the most sensitive neurons (135).
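The threshold-estimation procedure described above, fitting a cumulative Gaussian to the proportion of "rightward" choices and reporting its spread, can be sketched as follows. The grid-search fit and example data are illustrative only (a real analysis would use a maximum-likelihood fit):

```python
import numpy as np
from math import erf, sqrt

def cum_gauss(x, mu, sigma):
    """Cumulative Gaussian: predicted proportion of 'rightward' choices
    for heading x (deg), with bias mu and spread sigma."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

def fit_threshold(headings, p_rightward):
    """Crude least-squares grid search over (mu, sigma); sigma serves
    as the discrimination threshold."""
    best = (np.inf, None, None)
    for mu in np.linspace(-5, 5, 101):
        for sigma in np.linspace(0.2, 10, 99):
            pred = [cum_gauss(h, mu, sigma) for h in headings]
            err = sum((p - q) ** 2 for p, q in zip(p_rightward, pred))
            if err < best[0]:
                best = (err, mu, sigma)
    return best[1], best[2]

# Illustrative data generated from a known 2-degree threshold
headings = np.array([-8, -4, -2, -1, 0, 1, 2, 4, 8], dtype=float)
p_right = [cum_gauss(h, 0.0, 2.0) for h in headings]
mu_hat, sigma_hat = fit_threshold(headings, p_right)   # sigma_hat close to 2
```

The same machinery applies whether the "responses" are the animal's choices (psychometric threshold) or an ideal observer's decisions based on a neuron's firing rates (neurometric threshold), which is what makes the two directly comparable.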
To establish a functional link between the activity of MSTd neurons and heading perception, Gu et al. (131) calculated choice probability (CP) values of MSTd neurons in a heading discrimination task. They found them to be significantly larger than 0.5 in the vestibular-only condition (131) and the combined condition (134). In addition, behavioral thresholds for a heading discrimination task predicted from the activity of a population of MSTd neurons were in good agreement with actual psychophysical thresholds that were measured under similar conditions (133).
Importantly, however, CP values in the combined condition (134) depended on the congruency between the preferred visual and vestibular directions of the neurons. Whereas congruent neurons (same preferred direction for visual and vestibular stimulation) had CP values significantly above 0.5, incongruent cells (different preferred directions for visual and vestibular stimulation) had an average CP value slightly below 0.5. This suggests that the monkey relies more heavily on the congruent cells during its decision-making process and at the same time raises the question of what the purpose of the noncongruent cells is. It should be noted that the interpretation of CP values is complex, and a more recent study provided evidence that these values are modulated by both sensory and top-down choice-related signals (77).

Figure 11. In all conditions, the monkey was required to fixate a central target during the stimulus and then saccade to a rightward or leftward target at the end of each trial to indicate its perceived heading relative to straight forward (one-interval version) or relative to the first interval (two-interval version). The heading depicted in this schematic is straight forward (0°), and thus there would be no correct answer (the monkey was rewarded randomly).

Congruent and Opposite Cells
The difference between the preferred visual and the preferred vestibular direction in incongruent cells is not random. While Duffy (127) found no evident relation between the preferred visual and vestibular directions, later studies reported that they tended to be either aligned or pointed in opposite directions, thus classifying cells as "congruent," "opposite," and "unclassified" (106,136-138). Furthermore, congruent and opposite cells appear to be arranged in clusters within MSTd (132). What, then, could be the purpose of these "opposite cells," considering that their discrimination thresholds get worse and their relevance for behavior (as quantified by choice probability) decreases when visual and vestibular cues are combined, that is, when more information about the environment becomes available (134)?

Importance of Opposite Cells for Dissociating Self-Motion and Object Motion
It turns out that objects moving through the environment, and thereby disrupting the full-field optic flow, can bias heading perception, apparently because the visual system struggles to dissociate self-motion from object motion. Adding vestibular signals to the optic flow reduces this bias (139). Logan and Duffy (140) had already shown that an object that disturbs the optic flow pattern (thus suggesting that it is moving independently of the environment) alters responses in MSTd cells that are sensitive to both optic flow and object motion. In such situations, where visual input is altered and possibly in conflict with vestibular input, opposite cells could help to decompose the overall input into components due to self-motion and components due to object motion. To test this, Sasaki et al. (137) presented monkeys with the combined visual and vestibular stimulation depicted in Fig. 11. A cluster of nine spheres, defined by increased dot density (the "object") and moving in one of eight possible directions, was added to the optic flow pattern. They found that, indeed, adding a moving object to the optic flow pattern altered the joint tuning for self- and object motion of congruent and opposite cells. In congruent cells, heading tuning was more consistent across different directions of object motion in the bimodal than in the visual condition. In other words, if a cell prefers the same direction for visual and vestibular stimulation, then adding vestibular information can counterbalance the disruption in the visual input caused by the moving object. Tuning for the direction of the moving object, however, was more consistent in the visual than in the bimodal condition, meaning that the addition of vestibular information makes it more difficult for the cell to encode the object's direction.
For opposite cells, the pattern reversed: heading tuning was more consistent in the visual than in the bimodal condition, but tuning for the direction of the objects was more consistent in the bimodal than in the visual condition. Importantly, they found that a linear decoder provides good estimates of heading direction in the presence of object motion (and vice versa) through an approximation of a type of probabilistic inference. This worked, however, only with a population of mixed, that is, congruent and opposite, cells (141) and only in MSTd. In MSTl, on the other hand, there are generally fewer cells showing heading selectivity, fewer cells showing vestibular tuning, cells are less able to discriminate directions, and the effects of object motion on self-motion representation are weaker (54).

Detailed Mechanisms of Visual-Vestibular Integration
After establishing that crossmodal information is brought together in MSTd, a number of studies have investigated "how" its neurons integrate sensory signals from the two modalities. Several findings clearly show that visual and vestibular information arrive in MSTd via separate pathways: first, a majority of MSTd neurons are tuned for visual information in an "eye-centered" reference frame, but in a "head-centered" reference frame for vestibular information (136). Second, a bilateral labyrinthectomy eliminates vestibular, but not visual, tuning (131,138), which also provides strong evidence that the tuning in the absence of visual input is really due to vestibular input and not some unaccounted-for additional input (e.g., auditory noise from the motion platform). Third, MT, which provides the major visual input to MSTd, does not carry vestibular information (142). Instead, Chen et al. (143) propose a hierarchical processing pathway for vestibular information, based on the temporal dynamics of direction selectivity, which starts at the parietoinsular vestibular cortex (PIVC) and sends information through the ventral intraparietal area (VIP) to MSTd.
But how is the information from these two separate pathways integrated? Morgan et al. (144) presented monkeys with 64 combinations of visual (8 evenly spaced directions) and vestibular (8 evenly spaced directions) cues. They found that a simple linear combination rule, in which the response in the bimodal condition was the weighted sum of the corresponding visual and vestibular responses, could account very well for the data. Manipulating the reliability of the visual cue, by changing the coherence of the optic flow RDP, showed that the weights assigned to each modality varied with cue reliability: the visual weight increased and the vestibular weight decreased with increasing visual motion coherence. As mentioned before, monkeys' performance in a heading discrimination task improved substantially in the combined condition compared to visual or vestibular stimulation alone. Similarly, the neuronal threshold (as derived from the neurometric curve) for congruent cells also decreased in the combined condition, but it increased for opposite cells (134). Importantly, both the behavioral threshold and the neuronal threshold of congruent cells in the combined condition were very close to the statistically optimal value. An important prediction of optimal multisensory integration is that different modalities are weighted according to their relative reliabilities. Although Morgan et al. (144) had shown that the relative weights can vary with the coherence of the visual stimulus, this occurred outside of a behavioral task, so they could not directly test whether the reweighting is in accordance with optimal integration. Fetsch and colleagues (145) showed that monkeys performing a heading discrimination task using visual and vestibular cues are indeed able to weight cues according to reliability in a near-optimal manner.
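The optimal-integration prediction just described can be sketched in a few lines: under independent Gaussian noise, the maximum-likelihood estimate weights each cue by its inverse variance, and degrading one cue shifts weight toward the other (the numbers below are illustrative, not values from the cited studies).

```python
# Maximum-likelihood ("optimal") cue combination under independent Gaussian
# noise: each cue is weighted by its relative reliability (inverse variance).
def optimal_combination(mu_vis, var_vis, mu_vest, var_vest):
    w_vis = (1 / var_vis) / (1 / var_vis + 1 / var_vest)
    mu_comb = w_vis * mu_vis + (1 - w_vis) * mu_vest
    # the combined variance is always smaller than either single-cue variance
    var_comb = (var_vis * var_vest) / (var_vis + var_vest)
    return mu_comb, var_comb

# Reliable vision (e.g., high coherence): the estimate stays near the visual cue.
mu_hi, var_hi = optimal_combination(10.0, 1.0, 14.0, 4.0)
# Degraded vision (e.g., low coherence): weight shifts toward the vestibular cue.
mu_lo, var_lo = optimal_combination(10.0, 9.0, 14.0, 4.0)
```

The reduced combined variance is what corresponds to the lower behavioral and neuronal thresholds in the bimodal condition, and the reliability-dependent weights are what Fetsch and colleagues tested behaviorally.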
In a follow-up study (146), they tested whether a population of MSTd neurons can predict the observed behavioral reweighting and what kind of computations individual multisensory neurons need to perform to account for it. Using maximum likelihood decoding (e.g., 147), they converted the population response of all recorded neurons into perceptual choices for every trial and computed psychometric curves from these simulated choices. The weighting of visual and vestibular cues by this simulated observer matched the monkeys' behavior very well if only congruent cells were decoded, suggesting that this subpopulation of MSTd neurons is indeed where the integration and weight adjustment happen. As in the study by Morgan et al. (144), the weights of individual neurons varied with coherence, and the ratio of vestibular to visual weights was significantly correlated with the statistically optimal weight ratio. More importantly, the deviations from optimality at the neuronal level matched the slight deviations from optimal integration observed at the behavioral level, providing further evidence that MSTd is the neural substrate of this integration process. The fact that neurons appeared to be able to change their weights on a trial-by-trial basis indicates that this reweighting does not rely on changes in synaptic weights, as these take place on a slower timescale. A recent model (148) showed that divisive normalization (149) can account for several empirical findings of multisensory integration, for example, that multisensory enhancement is stronger for weak stimuli and decreases with increasing stimulus strength ("inverse effectiveness") and that multisensory enhancement works best if cues from different modalities are congruent in space and time. In the model, multisensory neurons integrate the responses of two primary neurons, which are sensitive to different modalities but have overlapping responses, as a weighted sum.
The activity of each multisensory neuron is then divided by the net activity of the pool of multisensory neurons. This normalization can account for the changes in weights assigned to visual and vestibular cues that were observed when the reliability of the visual cue was altered by changing the coherence of the dot patterns. The authors hypothesize that this is caused by changes in the activity of the normalization pool, which strongly depends on coherence [see Fetsch et al. (130) for a detailed review].
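A minimal sketch of this computation reproduces inverse effectiveness. For simplicity, the normalization pool is reduced here to the unit's own drive, and all parameters (exponent, semisaturation constant, unit weights) are illustrative rather than fitted to MSTd data.

```python
# Toy divisive-normalization unit: sum the unisensory drives, then divide
# by the normalized (semisaturated) pooled activity. Parameters are made up.
def response(vis, vest, sigma=1.0, n=2.0):
    drive = vis + vest                       # weighted sum (weights = 1 here)
    return drive**n / (sigma**n + drive**n)  # divisive normalization

def enhancement(d):
    """Bimodal response relative to the sum of the two unimodal responses."""
    return response(d, d) / (response(d, 0.0) + response(0.0, d))

# "Inverse effectiveness": the bimodal response is superadditive for weak
# stimuli (enhancement > 1) and subadditive for strong ones (< 1).
weak, strong = enhancement(0.2), enhancement(5.0)
```

The superadditivity for weak drives arises because the normalization term is dominated by the constant sigma at low input, whereas at high input the denominator grows with the drive itself and caps the bimodal response.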
The crossmodal interaction effects predicted by the model have recently been confirmed through recordings of MSTd neurons (150). Furthermore, the crossmodal suppression effects predicted by the model were similar in strength for both congruent and opposite cells, indicating that divisive normalization is a general principle of multisensory integration in MSTd and not directly related to a particular behavior, such as heading discrimination or the dissociation of self- and object motion. An even more convincing finding in favor of divisive normalization is that unisensory neurons that respond only to visual, but not to vestibular, cues show slightly suppressed responses to combined visual and vestibular stimulation (150). This can be explained by the model because the normalization pool contains many multisensory neurons and its overall activity is thus influenced by the vestibular stimulation.
This line of research has provided overwhelming evidence that MSTd, which has traditionally been considered a visual area, also processes vestibular information and integrates it with visual cues to represent the current direction of self-motion, or heading. A diverse population of cells with the same ("congruent") or different ("opposite") preferred directions for visual and vestibular cues makes it possible both to optimize heading discrimination and to tell self- from object motion. The weighting of visual and vestibular cues is flexibly adjusted depending on cue reliabilities, probably by means of divisive normalization. This work has established MSTd as an excellent model system for the study of multisensory integration. MSTd could thus, for example, be used to study the neural substrates of age-related changes in multisensory integration, as a recent psychophysical study in humans (151) provided evidence that the weighting of vestibular and visual cues changes with age to compensate for sensory deterioration of the vestibular system.
Note that additional brain areas, other than MSTd, also contribute to heading perception and the integration of visual and vestibular information, such as VIP (152) or the visual posterior Sylvian area (VPS; 153). We did not discuss these areas extensively, not because we consider them less important, but because our goal here is to review how MST contributes to different functions, such as heading perception, rather than how one specific function is implemented in the brain. The finding that microstimulation and reversible lesioning of MSTd affect behavior in a purely visual task much more than in a vestibular or a multimodal task (117) strongly suggests that other areas can compensate for deficiencies in MSTd. We refer the reader again to Britten (103) for a comprehensive review of self-motion perception and the different brain areas involved in it.

Role of MST in Path Integration and Spatial Cognition
Self-motion and the accompanying change in an organism's position are important aspects of spatial navigation. The neural basis of navigation has been well established in rodents; here, place cells and grid cells in the hippocampus and entorhinal cortex play an important role in the representation of space and one's own position (154). Recent studies suggest a similar representation of space and current position by the entorhinal cortex of macaques (155) and humans (156). One's current position is the result of previous self-motion. How does the information about self-motion that is encoded by MST neurons contribute to spatial cognition more generally? There is evidence that at least some MSTd neurons respond differently to the same heading direction depending on the path on which the monkey is moving: when monkeys are moved with a motorized sled on a circular path, so-called "path-selective" neurons respond more strongly to the same heading during one of the two movement directions (clockwise or counterclockwise) (Fig. 12; 157, 158). This is particularly interesting because a previous study found no evidence for temporal integration when measuring responses to a continuously changing optic flow field (159). Either the vestibular input from actual movement is necessary for path integration on the single-cell level, or path integration occurs only during heading sequences that represent a natural path, which might not have been the case for the artificial setting of a changing optic flow field (157). This suggests that MSTd could be part of a larger navigation and spatial cognition network. Direct connections from the superior temporal gyrus to the entorhinal cortex (160) and a functional MRI study in humans (161) provide additional evidence for such a network.

1) In addition to visual motion, a subset of MST neurons also responds selectively to vestibular input that provides information about self-motion.

2) Cells that respond to both visual and vestibular information can be "congruent," meaning that they prefer the same heading direction for visual and vestibular cues, or "opposite," meaning that the two preferred heading directions differ by roughly 180°.

3) A population of mixed, that is, congruent and opposite cells in MSTd appears to play an important role in dissociating self-motion from object motion.

MODULATION OF MST ACTIVITY BY EYE MOVEMENTS
In primates and other species with forward-facing eyes and foveal vision, motion perception and eye movements are tightly coupled: when an object that we are looking at moves, we have to move our eyes to keep foveating it. How is activity in MST related to these eye movements? Is it modulated by eye movements, and if so, how? What role does it play in the planning and execution of eye movements? We already noted that MST neurons representing heading direction can compensate for eye movements, which suggests that they have some information about eye position. The question is whether the contribution of MST to eye movements is mostly perceptual, delivering information about the motion of objects to other areas that compute the actual action plan, or whether MST itself is involved in computing that plan.
The first studies of how eye movements are related to the activity of MT and MST neurons occurred in the context of basic motion perception: Newsome et al. (26) wanted to test whether MT is necessary for the cortical analysis of visual motion. They did this by showing that small chemical lesions of MT affected a monkey's ability to follow a moving target ("smooth pursuit eye movement") as well as its ability to saccade toward a moving target, but not its ability to saccade to a stationary target. Similar lesions of MST likewise impair the estimation of a moving stimulus' speed, affecting targets in the visual hemifield contralateral to the lesioned hemisphere ("retinotopic deficit") or targets moving toward the visual field ipsilateral to the lesioned hemisphere ("directional deficit") (162).
These results suggest that the information about the stimulus' motion represented in MT and MST is necessary for the planning of eye movements toward moving targets, but not for computing action plans in general, as saccades to stationary targets were unimpaired. Komatsu and Wurtz (24) found that about a third of MT and MST neurons are active during pursuit of a small target in an otherwise dark room, so-called "pursuit cells." These pursuit cells also show direction-selective responses to visual motion stimuli during foveation, and their RFs typically include the fovea, which is to be expected, since pursuit eye movements normally start from a point that is currently being foveated. The preferred direction of a moving large-field RDP is opposite to the preferred pursuit direction in a majority of MST pursuit cells (25), which is not surprising, given that as one moves one's eyes in one direction (e.g., to the right), the retinal image moves in the opposite direction (in the example, to the left). To test whether this pursuit signal comes from visual or extraretinal (e.g., corollary discharge or proprioceptive) sources, Newsome et al. (26) turned off the pursuit target for a short time interval (the "blink") during the eye movement. Although most pursuit cells in MT and some in MSTl show a reduction in firing rate during the blink, suggesting that their pursuit response can be accounted for by visual stimulation from the pursuit target, pursuit cells in MSTd and the remaining cells in MSTl keep their firing rates at a high level during the blink, indicating another, extraretinal source of the signal. In both cases, however, the pursuit-related discharge typically started after pursuit onset, indicating that the activity is not causally involved in pursuit initiation, but rather in the maintenance of the eye movement.
Similar results were reported by Ilg and Thier (163), who found that MST, but not MT, pursuit cells responded equally well during pursuit of an "imaginary" target, defined by peripheral cues outside the RF, as during pursuit of a regular target. Microstimulation of MT and MSTl during pursuit movements increases eye velocity when a monkey moves its gaze toward the hemifield that is ipsilateral to the stimulated hemisphere (i.e., leftward pursuit during stimulation of the left hemisphere), but decreases velocity when the eye moves away from the stimulated side (i.e., rightward pursuit during left-hemisphere stimulation) (164). However, this effect could still be explained by changes in the monkey's visual perception and does not prove that MT and MST play a causal role in the planning or execution of the eye movement.
To explore the nature of the pursuit-related signal, Ono and Mustari (165) recorded MSTd responses during normal pursuit, during pursuit with target blinks [similar to Newsome et al. (26)], and during the vestibulo-ocular reflex (VOR) in complete darkness. As in previous studies, neurons continued their discharge during the blink, but they were not modulated during the VOR. The authors interpret this as evidence that the extraretinal signal reflects smooth pursuit or gaze commands, rather than proprioceptive feedback or efference copies, which should also accompany the VOR. However, an alternative explanation that does not assign a motor role to MST could be that efference copies from motor areas are sent to MST only during volitional eye movements, but not during more automatic ones, like the VOR.

Sensitivity to Eye Position
During both fixation and pursuit eye movements, the responses of a clear majority of MST single cells to identical stimuli vary with eye position (166). Interestingly, almost all eye-position-sensitive neurons had their maximum response at eccentric fixation locations. This could be part of a coordinate transformation in which information about the environment that has been represented in retinocentric coordinates by earlier visual areas is transformed into different frames of reference (e.g., head-, body-, or space-centered) (167). And indeed, Bremmer et al. (168) have shown that eye position information can be extracted with an optimal linear estimator from the activity of neural populations in different parietal areas, including LIP, 7A, and MST. In line with this, Lisberger (169) suggested that MST plays a similar role in sensorimotor transformation during pursuit eye movements as LIP does for saccades and the medial (MIP) and anterior (AIP) intraparietal cortices do for arm movements.
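The idea of reading out eye position with an optimal linear estimator can be sketched on simulated data (the population model below — linear "gain fields" with Gaussian noise — and all its parameters are assumptions for illustration, not the analysis of the cited study):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy population: each neuron's firing rate depends linearly on horizontal
# eye position ("eye-position gain field"), plus baseline and noise.
n_neurons, n_trials = 50, 400
eye_pos = rng.uniform(-20, 20, n_trials)             # degrees
gains = rng.normal(0, 1, n_neurons)                  # per-neuron position gain
base = rng.uniform(5, 20, n_neurons)                 # baseline rates
rates = base + np.outer(eye_pos, gains) \
        + rng.normal(0, 1, (n_trials, n_neurons))    # noisy population response

# Optimal linear estimator: least-squares readout of eye position from rates.
X = np.column_stack([rates, np.ones(n_trials)])
w, *_ = np.linalg.lstsq(X, eye_pos, rcond=None)
pred = X @ w
r2 = 1 - np.sum((eye_pos - pred)**2) / np.sum((eye_pos - eye_pos.mean())**2)
```

With noise that is small relative to the position-dependent modulation, the readout recovers eye position almost perfectly; the empirical point of Bremmer et al. is that real parietal populations carry enough such signal for this readout to work.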
Lisberger's (169) assessment that the exact role of MST in pursuit remains unknown still holds true. It is nevertheless clear that MST is the first area along the visual processing pathway that plays a central role in the integration of motion perception and eye movement planning.

MODULATION OF MST ACTIVITY BY COGNITIVE PROCESSES
The previous section showed that the activity of MST neurons does not solely reflect processing of sensory (visual and vestibular) input in a bottom-up manner, but is also influenced by internal, extraretinal signals, such as those related to eye movements. It is well established that neuronal activity throughout macaque visual cortex is modulated by cognitive factors, such as attention [see Treue (170) for a review] and working memory [see Pasternak and Greenlee (171) for a review]. In this section, we review how activity in MST is modulated by attention and (working) memory and how this modulation compares to that in other visual areas.

Attention in Visual Cortex
Attention is a mechanism that allows the prioritized processing of some sensory stimuli at the cost of impaired processing of other, nonattended stimuli. Visual attention can be directed either toward a certain location ("spatial attention") or toward a specific feature of a stimulus, such as its shape, color, or direction of motion ("feature-based attention"). The neural signatures of visual attention include changes in firing rates (172,173) and modulated trial-to-trial correlations between neurons (174,175). Another type of brain activity that has been investigated in the context of attention is the local field potential (LFP), which can represent the synchronization of neural populations across (low LFP frequencies) or within (high LFP frequencies) brain areas (176). Attention increases the synchronization of spikes and LFPs in the gamma frequency band (177), changes the phase-amplitude coupling of different LFP frequency bands (178), and decouples spike times from the phase of specific LFP frequency bands (179).
There is strong evidence from studies in the ventral pathway that most of these top-down modulatory effects depend on feedback connections [see Squire et al. (180) for a review], and preliminary evidence suggests that this is also true for area MT in the dorsal pathway (181,182). MST receives such top-down input from a number of areas in the temporal, parietal, and frontal lobes, including the frontal eye field (FEF) (10). FEF in particular is considered to be a major source of the modulatory activity that is associated with attention (183)(184)(185)(186), and its reciprocal connection with MST makes the latter an excellent model for studying different effects of attention.
Given the well-understood sensory characteristics of neurons along the dorsal visual pathway and their midlevel position between early visual cortex and higher association areas, MT and MST in rhesus monkeys have been prime targets for studies assessing the attentional modulation of neuronal responses. In a typical spatial attention paradigm, monkeys are presented with two stimuli, one of which is placed inside the RF of the recorded neuron and the other at an equal eccentricity, but outside the RF. In a given trial, the animals are cued to attend to one of the two stimuli (the "target," Fig. 13A). Importantly, the physical stimulus configuration is identical on trials where the stimulus inside the RF is the target ("attend-in") and those on which the target is the stimulus outside the RF ("attend-out"). Thus, any differences in neural activity represent an internal, attentional signal. Feature-based attention, on the other hand, is typically investigated by presenting a neuron's preferred feature (e.g., its preferred direction) inside the RF and cueing the monkey to attend to another stimulus outside the RF that has either the same feature ("attend pref") or the antipreferred feature ("attend anti-pref") (see left panels of Fig. 13, B and C, for examples of feature-based attention paradigms with spiral and linear motion). To ensure robust sensory responses, attentional studies in MT and MST employ moving stimuli. Across many such studies, a multitude of attentional effects on neuronal responses has been identified and quantified, most notably a carefully orchestrated combination of gain increases for neurons encoding the target stimulus and its features and gain decreases for neurons encoding all other stimuli and features (200)(201)(202). Using such an approach, a study (203) compared the effects of spatial and feature-based attention on linear and spiral motion stimuli for MST neurons tuned to both of these motion patterns (Fig. 13).
Although spatial attention enhanced the responses to these two stimuli in the same way (data not shown), these preliminary results showed that feature-based attention boosted responses to spiral motion stimuli (Fig. 13B) but not to linear motion stimuli (Fig. 13C). The authors suggest that their findings provide evidence that the linear motion selectivity observed in MST is not "inherited" from MT (whose responses are affected by feature-based attention to linear motion; 173) but is generated de novo in MST. The results suggest that spatial and feature-based attention take on complementary roles in MST, combining unimpeded, high-gain pass-through processing of sensory information from attended locations in the visual field with an additional feature-based modulation of neuronal responses. This ensures that only those responses to attended features that contribute to perception are boosted by the allocation of attention. This study is the first demonstration of a "loss" of attentional modulation along the cortical hierarchy and of a "selective lack of attentional modulation" for just one of the features a given neuron is tuned for.
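The combination of gain increases and decreases described above is often summarized as a multiplicative "feature-similarity gain" account: the response gain grows with the similarity between a neuron's preference and the attended feature. A minimal sketch (tuning curve shape, gain strength, and all parameters are hypothetical; this is the general model, not the analysis of the cited study):

```python
import numpy as np

def tuning(direction, pref, baseline=5.0, amp=30.0, kappa=2.0):
    """Von Mises direction tuning curve (parameters illustrative)."""
    return baseline + amp * np.exp(kappa * (np.cos(direction - pref) - 1.0))

def attended_response(direction, pref, attended_feature, beta=0.3):
    """Multiplicative gain that grows with the similarity between the
    neuron's preferred direction and the attended direction."""
    gain = 1.0 + beta * np.cos(pref - attended_feature)
    return gain * tuning(direction, pref)

# Neuron preferring 0 rad, stimulus at the preferred direction:
r_pref_attended = attended_response(0.0, 0.0, attended_feature=0.0)    # gain 1.3
r_anti_attended = attended_response(0.0, 0.0, attended_feature=np.pi)  # gain 0.7
```

Attending to the neuron's preferred direction scales the response up, attending to the antipreferred direction scales it down; the MST finding above amounts to this feature-based gain operating for spiral but, surprisingly, not for linear motion.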
All the studies on attention reviewed so far used highly controlled settings to isolate the effects of attention on elementary sensory processing capabilities of the brain. A different approach was taken by Page and Duffy (204), who investigated how the neuronal responses to optic flow and real motion are affected when monkeys are engaged in different tasks that require them to focus on the visual input, on the vestibular input, or on a task that is unrelated to the motion stimuli. Monkeys were presented with circular motion by means of optic flow, movement of a sled, or both, and had to report a perturbation in the optic flow, a perturbation in the sled movement, or an unrelated auditory tone via button press. On the single-cell level, MSTd neurons showed a variety of response patterns that depended on stimulus condition (optic flow alone, sled movement alone, or both) and task (attend to optic flow, attend to sled movement, or attend to auditory tone). On the population level, responses were diminished when monkeys attended to the optic flow or the sled movement as compared with when they attended to the tone detection task. Although this finding could potentially be explained by the tone detection task being easier, as indicated by faster reaction times, it does suggest that MST responses to heading stimuli are modulated by task demands. It remains an open question whether such a reduction in modulation can induce systematic biases or misjudgments of self-motion, which would be highly relevant, for example, in the context of driving a car.
For a proper understanding of MST, it is crucial to determine what part of the attentional modulation of activity in MST is inherited from MT and what part is caused by direct projections from higher areas. Given that there are direct connections between frontal regions, such as FEF, and MST, it is plausible to assume that at least part of the enhancement comes directly from a top-down signal to MST itself. To resolve this question, one would have to measure MST responses during an attention task while shutting off the modulation of MT neurons by higher areas. This is difficult to achieve, as MT and MST lie next to each other in the cortex, and most methods that affect MT are likely to affect MST as well. However, optogenetics can be used to selectively affect individual cells, and some preliminary results suggest that it can be used to selectively shut down the influence of FEF on MT (182).

Attention: Reduced Burstiness
The attentional increases in firing rates observed in MST are stronger than, but qualitatively similar to, the effects found in other visual areas.

Figure 13. Effect of feature-based attention on a medial superior temporal area (MST) neuron. A: behavioral paradigm: the monkeys initiated each trial by directing and maintaining their gaze on a centrally presented fixation point and holding a touch-sensitive lever. After trial initiation, a static cue appeared for 67 ms at an eccentric location, cueing the animal to covertly shift attention toward this target location, either within the receptive field (yellow dotted circle added to figure for illustrative purposes) or in the opposite hemifield. The cue presentation was followed by a blank period of 400 ms to measure baseline activity. To reduce transient motion onset responses, random motion, both inside and outside the receptive field, was presented briefly (375 ms) before the onset of coherent motion stimuli (either spiral or linear motion). To obtain a liquid reward, the monkeys had to respond to a transient speed increment (250-2,500 ms after onset) of the target stimulus by releasing the lever, ignoring any speed changes in the distractor. B: stimulus configuration for the feature-based attention condition with spiral motion. Attention was always directed to the stimulus outside the receptive field (opposite hemifield of the yellow dotted circle), which moved either in the preferred direction (Pref; red) or the antipreferred direction (Anti-Pref; blue dotted). Inside the receptive field (yellow dotted circle), the stimulus always moved in the preferred direction to ensure a strong sensory response. The right panel shows an example neuron's spike density and raster plots for responses while the target stimulus was moving either in the preferred (red) or antipreferred direction (blue dotted). C: feature-based attention example for the linear motion configuration. This panel is identical to B, except that linear motion stimuli were presented. The right panel shows the neuronal response of the same neuron as in B, but for linear motion stimuli. Adapted from Baloni (203).
A recent study found an additional neural signature of attention in MST: spatial, but not feature-based, attention reduces the occurrence of multiple consecutive action potentials with very short interspike intervals, so-called "bursts," in response to spiral stimuli (205). This was particularly surprising as both spatial and feature-based attention increased firing rates, in agreement with previous studies [e.g., Treue and Martinez-Trujillo (173)]. Even though the firing rate enhancement was weaker for feature-based than for spatial attention, this difference could not account for the absence of an effect of feature-based attention on the burstiness of MST neurons. Furthermore, the reduction in burstiness could be dissociated from the increase in firing rates, suggesting that the two are caused by different mechanisms. Additional research is necessary to determine to what extent these results are specific to MST or can be generalized to other visual areas. But they strongly suggest that the detailed mechanisms by which attention modulates activity in sensory areas are complex and possibly rely on more than just unidirectional connections from FEF to the respective area.
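Bursts of the kind described here are commonly operationalized as runs of spikes whose interspike intervals fall below a threshold. A minimal counting routine (the 4-ms criterion and minimum run length are assumptions for illustration, not the exact definition used in the cited study):

```python
# Count bursts in a sorted spike train: a burst is a run of >= min_spikes
# spikes whose successive interspike intervals are all <= max_isi.
def count_bursts(spike_times_ms, max_isi=4.0, min_spikes=2):
    bursts, run = 0, 1
    for i in range(1, len(spike_times_ms)):
        if spike_times_ms[i] - spike_times_ms[i - 1] <= max_isi:
            run += 1                      # spike continues the current run
        else:
            if run >= min_spikes:         # run ended; count it if long enough
                bursts += 1
            run = 1                       # start a new run
    if run >= min_spikes:                 # close out the final run
        bursts += 1
    return bursts

spikes = [10.0, 12.5, 14.0, 80.0, 200.0, 202.0, 300.0]
n = count_bursts(spikes)  # two bursts: (10, 12.5, 14) and (200, 202)
```

A per-condition burst rate computed this way (bursts per spike or per second) is the kind of quantity that would be compared between attend-in and attend-out trials to assess attentional effects on burstiness.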
An open question that, to our knowledge, has not been addressed at all on the neurophysiological level so far is how attention to visual or vestibular cues can modify multisensory integration and heading computation in MST. A psychophysical study (206) showed that allocating attention across different aspects of the visual input affects object motion perception more than self-motion perception, suggesting that this is a question well worth investigating in more detail.

Working Memory: Influence of Memory Content on Sensory Responses
The ability to keep a limited amount of information available for a few seconds is essential for any type of goal-directed behavior. It is well established that the prefrontal cortex is an important neural substrate for orchestrating task-relevant information and holding stimuli in memory [207, 208; see Constantinidis et al. (209) and Lundqvist et al. (210) for a recent debate on the exact mechanisms]. A less settled question, however, is to what extent sensory cortical areas are involved in the short-term storage of information [see Pasternak and Greenlee (171) for a review].
An early study recorded the activity of different mid- and high-level visual areas in the ventral and dorsal pathway during a delayed match-to-sample task (211). Monkeys were presented with a sample RDP that either moved in one of four cardinal directions or was stationary and had one of four colors. After a short delay period, a sequence of up to four test RDPs, either moving in different directions or having different colors, was shown, and the monkey had to respond to the one that matched the sample in either direction or color. In the condition where direction of motion had to be matched, the responses of MT neurons to the four test stimuli were largely independent of the sample stimulus that was kept in memory. In MST and area 7a, in contrast, directional selectivity for the test stimuli became weaker, but activity was more influenced by the sample stimulus kept in memory, suggesting that these areas contribute to the maintenance of direction information.
This study clearly showed that the memory representation of motion information is by no means trivial: areas outside the PFC appear to be involved, but not necessarily in the same way as they are involved in sensory processing.
Working Memory: Necessity of MT/MST Contributions

Bisley and Pasternak (212) tried to determine the contributions of MT and MST to working memory by investigating the effects of unilateral lesions on encoding, retention, and retrieval of motion information in a delayed match-to-sample task. They found that the monkeys' performance was impaired, but the exact nature of this impairment depended on the properties of the stimulus and on the task. When presented with a noisy stimulus (a low-coherence RDP) that had to be compared to a coherent stimulus moving in the same or the opposite direction, encoding and retaining were generally impaired by the lesion. This was not the case, however, if the to-be-remembered stimulus contained a strong signal (i.e., a coherent RDP). Comparison of the memorized stimulus with a test stimulus was impaired only when the task required the discrimination of similar directions of two coherent RDPs. This suggests that MT/MST is necessary for two functions: integrating local motion signals across a noisy stimulus for encoding (and possibly retaining) and accurately discriminating directions. Encoding and retaining coherent motion for a categorical (left vs. right) task, on the other hand, does not seem to require MT/MST. It is possible that the information can be encoded by direction-selective neurons in earlier visual areas (e.g., V1 and V3) and then be retained as a categorical decision in frontal areas without the involvement of MT/MST. Whether this really is the case could be tested by instructing subjects to maintain either a mental image of the remembered sample or to save the information in an abstract manner (e.g., by verbalizing "up and left"). However, this would be impossible in nonhuman primates.

Working Memory: Activity during the Delay Period
The study by Bisley and Pasternak (212) was limited in its ability to dissociate the roles of MT and MST in a working memory task, as it is very difficult to confine the effects of artificial lesions to a single brain area without affecting surrounding areas. Mendoza-Halliday and colleagues (213) attempted to find the exact point along the motion processing stream where direction-selective memory activity emerges by recording simultaneously from area MT or MST and the lateral prefrontal cortex (lPFC). Monkeys performed a delayed match-to-sample task similar to the one used by Ferrera et al. (211), in which they had to memorize the direction of a sample RDP and, after a short delay, report which one of two sequentially presented test RDPs was moving in the same direction (Fig. 14A). Neurons were classified as "sensory selective" or as "delay selective" when their firing rates varied as a function of direction during the sample stimulus presentation or the delay period, respectively. As expected, nearly all MT and MST neurons as well as 70% of lPFC neurons were sensory selective (i.e., their firing rate varied with direction while the sample RDP was shown on the screen; see Fig. 14, B, D, and F, for example neurons). However, only a third of MST and half of lPFC neurons, but hardly any MT neurons, were delay selective (i.e., their firing rate varied with the memorized direction in the absence of a visible stimulus; see Fig. 14, C and E, for example neurons). Furthermore, some MST and lPFC neurons showed strong delay selectivity but weak or no sensory selectivity for direction, suggesting that these two areas contain a subpopulation of neurons that is primarily concerned with representing a memorized direction rather than the direction of a present stimulus. The delay selectivity of MST and lPFC neurons was also linked to task performance. This result suggests that sustained activity during a short delay period arises quite suddenly between two brain areas that are very strongly connected.
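The classification logic can be sketched with simulated firing rates. The snippet below is a simplified illustration, not the authors' actual analysis (which quantified selectivity with ROC-based measures, among others): here a neuron counts as direction selective in an epoch if its firing rate varies significantly across directions, tested with a one-way ANOVA. All numbers are made up for the example.

```python
import numpy as np
from scipy.stats import f_oneway

def is_direction_selective(rates_by_direction, alpha=0.01):
    """One-way ANOVA: does the mean firing rate depend on direction?"""
    _, p = f_oneway(*rates_by_direction)
    return bool(p < alpha)

rng = np.random.default_rng(0)
# Simulated spike rates (Hz), 20 trials per direction, for 4 sample directions.
# Sensory epoch: tuned responses while the sample RDP is on the screen.
sample_rates = [rng.normal(loc=m, scale=2.0, size=20) for m in (10, 25, 40, 25)]
# Delay epoch of an MST-like neuron: still tuned to the memorized direction.
delay_rates = [rng.normal(loc=m, scale=2.0, size=20) for m in (12, 18, 30, 18)]
# Delay epoch of an MT-like neuron: idealized as identical across directions.
flat = rng.normal(loc=15, scale=2.0, size=20)
flat_delay = [flat, flat.copy(), flat.copy(), flat.copy()]

print(is_direction_selective(sample_rates))  # True: sensory selective
print(is_direction_selective(delay_rates))   # True: delay selective (MST/lPFC-like)
print(is_direction_selective(flat_delay))    # False: no delay selectivity (MT-like)
```

The same criterion applied to the two epochs separates the three classes of neurons described above: sensory selective only (MT-like), sensory and delay selective, and delay selective with weak sensory tuning.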
Further research will be necessary to investigate how such a pattern can emerge in a presumably small and highly interconnected network. An open question is whether aspects of motion other than direction, such as speed, are also maintained in MST, but not in MT.
As was the case for attention, working memory for vestibular information has been investigated much less than that for visual information, and there are, to our knowledge, no studies on the involvement of MST in vestibular memory. Takeaway 1) Like in most other areas of the primate visual cortex, firing rates of MST neurons are increased when monkeys direct spatial attention to stimuli inside the neurons' receptive fields. 2) In contrast to the typical effects of spatial attention, feature-based attention was found to only affect neuronal responses to spiral, but not linear motion, suggesting that these two motion preferences play different functional roles. 3) Spatial, but not feature-based attention was found to reduce the occurrence of "bursts" of multiple action potentials with very short interspike intervals. 4) Stimulus-selective activity during the delay period of a memory task appears to be absent in MT, but to emerge in MST, suggesting that this is an important change in the representation of motion from MT to MST. 5) The complex integration of visual information with vestibular signals and their modulation by memory and attentional influences performed in MST underscores this area's potential as an ideal model system and future research focus for the transition from sensation to cognition.

HUMAN HOMOLOGS OF MST
A main reason why the macaque monkey is such a suitable model organism for cortical information processing is that its brain structure is very similar to that of humans (7). Compared with rodent species, nonhuman primates are more similar to humans in terms of behavior (e.g., the coherence thresholds for motion perception are 2-3 times worse in rats and mice than in primates; 214), anatomy (e.g., forward-facing eyes that allow for binocular processing, or the multilevel V1-MT-MST pathway for motion processing that has no equivalent in the rodent brain), and physiology (e.g., a larger part of motion processing occurring in the retina and V1 in rodents, as compared with primates; 215). Findings about the nonhuman primate brain are therefore thought to be more directly relevant for an understanding of the human brain (216)(217)(218). In this section, we review the evidence that the human cortex contains a homolog of macaque cortical MST. The interspecies similarities that we describe below underscore the relevance of understanding macaque MST for our understanding of human vision and cognition.
[Fragment of the Fig. 14 legend: each neuron's preferred direction is shown in red; the gray area shows the corresponding area under the ROC curve (auROC) over time (right axis label); in C, the test stimuli, but not the sample, were placed inside the neuron's receptive field, and colors during the test period represent test directions. MST, medial superior temporal area; MT, middle temporal area.]

Psychophysics
Visual psychophysics in healthy human subjects can be used to determine processing "channels" (219,220) or "detectors" (221) for specific visual features, such as orientation, luminance, or motion direction. Although it is not always straightforward to map such channels exactly onto neural structures, they clearly suggest specialized modules in the human brain that underlie the processing of these features. The existence of such channels can be demonstrated by showing that performance in visual detection or discrimination tasks depends critically on individual features of a stimulus. For example, a classic study showing that adaptation depends on the spatial frequency of a stimulus suggests that there are processing channels in the human visual system that are selective for spatial frequency (222). In a similar way, Regan and Beverley (223) provided evidence for "looming detectors," that is, processing channels in the visual system that selectively respond to changes in size (which can be described as expansion and contraction), separate from motion information. It was not clear, however, that their adaptation paradigm probed a putative human homolog of MST, rather than earlier areas, such as V1 or MT (224; see Footnote 2). Therefore, Morrone and colleagues (225) tested the integration across local motion signals by measuring coherence thresholds in a direction discrimination task. They showed that motion sensitivity increases with the area of a random dot pattern, suggesting a processing channel that integrates signals across the visual field. Importantly, for circular or radial motion, such spatial summation cannot be explained by channels that are tuned to local linear motion, as different subsections of the stimulus contain motion in orthogonal or even opposing directions. Instead, there must be a neural mechanism that integrates different motion signals across sectors (a putative MST-like channel).
Furthermore, sensitivity is lower when sectors of the RDP that contain no signal dots are filled with noise dots than when they are left empty, providing additional evidence that this integrating mechanism sums signals across the entire stimulus. In a second experiment, contrast was varied for the entire stimulus while coherence was held constant. Only a small effect of stimulus area on contrast sensitivity was observed, suggesting that no summation takes place and that sensitivity is limited by an earlier stage with a contrast threshold (e.g., V1). A follow-up study (226) using annuli confirmed the summation across large regions of space (as far as 72°) and provided evidence that the integrator mechanism relies on neurons with large receptive fields.
Based on these results, Morrone, Burr, and colleagues suggest a two-stage process of complex motion processing in the human brain: The first stage consists of a number of independent local motion detectors (e.g., V1 or MT cells) that respond to motion in a small part of the visual field and that limit overall stimulus visibility through a contrast threshold. The second stage is an integrator mechanism over the local motion detectors (presumably a human homolog of MST) that is able to analyze more complex motion patterns across an extended region [see also Vaina (227) for a review of the physiology and psychophysics of complex motion perception].
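The two-stage scheme can be made concrete with a toy model. The sketch below is our illustration, not the model of Morrone and colleagues: stage 1 gates the stimulus with a contrast threshold, and stage 2 sums the match between each local motion vector and a radial (expansion) template. Opposing local directions in different sectors then add up for radial motion but cancel for translation, which is exactly why a purely local linear-motion channel cannot explain the summation.

```python
import numpy as np

def expansion_template(positions):
    """Unit vectors pointing away from the center of gaze: an expansion template."""
    return positions / np.linalg.norm(positions, axis=-1, keepdims=True)

def integrator_response(positions, motions, contrast, threshold=0.05):
    """Stage 1: local detectors transmit motion only above a contrast threshold.
    Stage 2: sum each local motion's projection onto the radial template."""
    if contrast < threshold:  # stage-1 cutoff (a V1-like contrast limit)
        return 0.0
    template = expansion_template(positions)
    return float(np.sum(np.einsum('ij,ij->i', motions, template)))

# 16 dots on a ring around fixation.
angles = np.linspace(0, 2 * np.pi, 16, endpoint=False)
pos = np.stack([np.cos(angles), np.sin(angles)], axis=-1)
expansion = pos.copy()                      # each dot moves away from the center
translation = np.tile([1.0, 0.0], (16, 1))  # all dots move rightward

print(integrator_response(pos, expansion, contrast=0.5))    # ~16: all sectors add up
print(integrator_response(pos, translation, contrast=0.5))  # ~0: opposing sectors cancel
print(integrator_response(pos, expansion, contrast=0.01))   # 0.0: below contrast threshold
```

Note that the translation stimulus drives every local detector just as strongly as the expansion stimulus does; it is only the second, integrating stage that distinguishes the two patterns.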
Further similarities between the response properties of MST cells in the macaque cortex and an optic flow channel in the human visual system were demonstrated by Snowden and Milne (228). They showed that adapting to a spiral motion RDP (as described in ANATOMY OF THE MEDIAL SUPERIOR TEMPORAL AREA) elicited aftereffects that were selective for the adapting direction and position invariant, which agrees well with the properties of MST neurons described by Graziano and colleagues (39).
We have highlighted a few selected psychophysics experiments that specifically aimed at providing evidence for an MST-like processing channel in the human brain. To review the entire field of motion psychophysics would clearly exceed the scope of this review and we refer the reader to existing reviews covering this field in more detail (224,229,230).

Functional Imaging
Positron emission tomography (PET) and functional magnetic resonance imaging (fMRI) provide a more direct measure of motion processing in the human cortex [always keeping in mind that although BOLD activity is highly correlated with neural activity (231), the two are not identical]. A motion-sensitive area in the inferior temporal sulcus (ITS) of the human cortex, considered a homolog of macaque cortical areas specialized for motion processing, was established quite early (232). This complex is often referred to as hMT/V5 ["human MT"; Orban et al. (233)], MT+ or hMT+ ["human MT" with the "+" suggesting additional areas being included; e.g., Beauchamp et al. (234)], or simply "MT-MST" [e.g., O'Craven et al. (235)]. All of these names acknowledge that this complex probably contains multiple areas, and a number of fMRI studies strive to differentiate these areas, each leveraging one specific difference between MT and MST.
Morrone et al. (236) made use of the fact that MST neurons, but not MT neurons, respond selectively to circular and radial motion trajectories. They found a part of the V5/MT complex, along the sulcus that separates Brodmann's areas 19 (V3, V4, V5) and 37 (fusiform gyrus), that showed activation when contrasting responses to circular, radial, and spiral RDPs with responses to randomly moving noise RDPs. This area was distinct from, and on average more than 1 cm removed from, another area in the V5/MT complex that was selectively activated by contrasting translational motion with noise RDPs.
Dukelow et al. (237) made use of the fact that MST, but not MT, receptive fields extend into the ipsilateral visual field and that MST, but not MT, neurons receive extraretinal smooth pursuit eye movement signals (as reviewed in VESTIBULAR TUNING AND MULTISENSORY INTEGRATION). They found that ipsilateral optic flow stimuli produced activation at the anterior end of the MT+ complex (putative MST), whereas the posterior part of the MT+ complex (putative MT) was only activated by contralaterally presented stimuli. Nonvisual smooth pursuit eye movements in darkness activated a small volume in the anterolateral section of the MT+ complex that responded only weakly to contralateral or ipsilateral motion. The authors thus conclude that the human MT+ complex can be divided into three areas: 1) an anterior part that responds to contra- and ipsilateral optic flow and can be considered a homolog of macaque MSTd, 2) an anterolateral part, slightly inferior to the first, that is selectively activated during nonvisual pursuit and shares similarities with macaque MSTl, and 3) a posterior part that only responds to contralateral motion stimuli and can be considered a homolog of macaque MT.
Peuskens et al. (238) investigated brain regions involved in heading estimation by presenting observers with ground plane optic flow patterns. Contrasting activity in a heading task with a control condition, they found that the heading task activated hMT/V5 as well as a more ventrally located region, which they called the "inferior satellite of hMT/V5." They suggest this region to be a likely candidate for the human homolog of macaque MSTd.
Footnote 2: It should be noted, in all fairness, that when the study by Regan and Beverley (223) that Burr and Thompson (224) refer to was published, MST had not been described yet, MT had not been explored in great detail yet, and even the concept of a dorsal pathway had not yet been proposed.
Huk et al. (239) made use of the fact that MT has a much more fine-grained retinotopic map than MST and that MST receptive fields are much larger than MT receptive fields and often extend into the ipsilateral visual field. They assessed retinotopy with a motion-defined wedge that rotated through the visual field and compared ipsi- and contralateral responses to stimuli restricted to one hemifield. A subregion of the hMT+ complex on the posterior (or ventral) bank of the ITS showed response modulations to the rotating wedge consistent with a retinotopic organization (presumably MT), and a separate subregion on the anterior (or dorsal) bank of the ITS showed strong activation in response to ipsilateral, peripheral stimuli (presumably MST) (Fig. 15). This division of the hMT+ complex into two areas based on retinotopic organization, responses to ipsilateral stimuli, and responses to optic flow stimuli was also confirmed by Smith et al. (240).
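One common way to quantify the ipsilateral responsiveness that these studies exploit is a laterality index over the two hemifield responses. The sketch below uses hypothetical BOLD amplitudes purely for illustration; the specific numbers are not from any of the cited studies.

```python
def laterality_index(contra, ipsi):
    """(contra - ipsi) / (contra + ipsi): 1 = purely contralateral, 0 = fully bilateral."""
    return (contra - ipsi) / (contra + ipsi)

# Hypothetical mean BOLD amplitudes (% signal change) for hemifield stimuli.
posterior_li = laterality_index(contra=1.2, ipsi=0.1)  # MT-like subregion
anterior_li = laterality_index(contra=1.0, ipsi=0.7)   # MST-like subregion

print(round(posterior_li, 2))  # 0.85: strongly contralateral, consistent with MT
print(round(anterior_li, 2))   # 0.18: near-bilateral, consistent with MST
```

A high index marks the retinotopically organized, contralateral-only posterior subregion, whereas an index near zero marks the anterior subregion with large, bilateral receptive fields.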
To differentiate brain areas that are selective for optic flow from those that respond to complex motion more generally, Wall and Smith (241) compared responses to a single, large patch of optic flow with responses to an array of nine similar patches that did not indicate egomotion. They found that MST responded well to both types of stimuli, but significantly better to the egomotion-consistent, single optic flow stimulus. Similar findings in the macaque brain, using the same stimuli (242), provide further evidence for the similarities between human and macaque MST.
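The logic of this contrast can be illustrated with a toy flow-field computation (our sketch, not the authors' stimuli or analysis): a detector matched to a single full-field expansion centered on fixation responds more to one large, egomotion-consistent flow patch than to a 3x3 array of patches that each expand around their own local centers.

```python
import numpy as np

def expansion_flow(points, centers):
    """Unit flow vectors: each point moves away from its assigned center."""
    v = points - centers
    n = np.linalg.norm(v, axis=-1, keepdims=True)
    n[n == 0] = 1.0  # points sitting exactly on a center do not move
    return v / n

def egomotion_detector(points, flow):
    """Template match to one global expansion centered on fixation (0, 0)."""
    template = expansion_flow(points, np.zeros(2))
    return float(np.mean(np.einsum('ij,ij->i', flow, template)))

# Coarse grid of sample points covering the display.
xs, ys = np.meshgrid(np.linspace(-4, 4, 9), np.linspace(-4, 4, 9))
points = np.stack([xs.ravel(), ys.ravel()], axis=-1)

# Single patch: one expansion centered on fixation (consistent with self-motion).
single = expansion_flow(points, np.zeros(2))

# 3x3 array: each point expands around the nearest of nine local patch centers.
centers = np.stack(np.meshgrid([-3.0, 0.0, 3.0], [-3.0, 0.0, 3.0]), axis=-1).reshape(-1, 2)
nearest = centers[np.argmin(np.linalg.norm(points[:, None] - centers, axis=-1), axis=1)]
array9 = expansion_flow(points, nearest)

print(egomotion_detector(points, single) > egomotion_detector(points, array9))  # True
```

Both stimuli contain identical local expansion energy, so any response difference must come from a mechanism sensitive to the global, egomotion-consistent layout of the flow.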
More recent approaches in functional imaging have moved beyond trying to isolate individual areas and instead focus on networks of areas. A retinotopic mapping study in combination with stimuli designed to test motion and shape sensitivity identified 18 retinotopic occipital regions, including 4 regions that constitute the human MT/V5 complex (243). Comparison with similar fMRI studies in macaques (27) suggests that one of these four regions is a putative homolog of MSTv (but not MSTd), as they share a similar topographic organization and topological neighborhood; in accordance with other studies, this putative MSTv is located anterior to MT/V5.
In conclusion, although the overall location of the hMT+ complex differs slightly from the location of the MT/MST complex in the macaque cortex, fMRI studies provide overwhelming evidence for a posterior and an anterior area within the hMT+ complex that show strong anatomical and functional similarities with macaque MT and MST, respectively.

Transcranial Magnetic Stimulation
Strong et al. (244) selectively disrupted neural activity in human MT or MST using transcranial magnetic stimulation (TMS) to test how these two areas differ in their contribution to the perception of different types of motion. Participants had to discriminate the translational (up vs. down), radial (expansion vs. contraction), or rotational (clockwise vs. counterclockwise) direction of a low-coherence RDP presented in the periphery while MT or MST in the hemisphere contralateral to the stimulus was stimulated with five TMS pulses during stimulus presentation. Stimulation of either area impaired performance in the translational direction discrimination, but only stimulation of MST impaired performance in the radial and rotational direction discrimination tasks. The authors argue that this indicates a serial processing stream in which information is passed from MT on to MST, which then integrates this information to represent more complex forms of motion. This reflects a broad consensus on how the larger receptive fields in MST are built from a mosaic of receptive fields representing input from individual MT neurons. On the other hand, studies like the one cited in the section Attention: Modulating Firing Rates (203) suggest that some aspects of neuronal tuning in MST could also be generated from V1, V2, and V3 inputs, bypassing MT. This is supported by the finding that perceptual performance for radial and rotational motion remained intact when MT was disrupted by TMS.

Functional Similarities between Human fMRI and Monkey Physiology in Studies of Higher Cognition
Additional studies suggest that some of the MST properties that have been described in the previous sections for the macaque apply to the human as well: Using fMRI, O'Craven et al. (235) showed increased activity in MT/MST when subjects attended to moving dots compared with when they attended to simultaneously presented stationary dots. This suggests that attention modulates MT/MST activity in humans in a similar manner as it does in macaques (200).
Thus, a human homolog of macaque cortical MST that is distinct from a homolog of MT has been established by making use of a number of ways in which the two areas differ, such as receptive field size and responses to linear versus spiral motion. A third example of a categorical difference between MT and MST is the activity during the memory period of a delayed match-to-sample task that was described in the previous section (213). It is an interesting open question whether this finding can be replicated in humans as well. Answering this question requires a high spatial resolution to distinguish between MT and MST, but does not rely on millisecond temporal resolution, as the delay period can last for several seconds. Therefore, fMRI would be a suitable method to investigate it.
In conclusion, there is overwhelming evidence that macaque cortical MST has a homolog in the human cortex. Thus, MST is not only suitable as a model area for studying a variety of general sensory and cognitive mechanisms, but also allows us to draw strong conclusions about neural mechanisms underlying human vision. Takeaway 1) Psychophysical experiments suggest that the human visual system contains specialized processing "channels" or "detectors" for rotational and radial motion patterns that share similarities with the processing in macaque MST neurons. 2) Functional MRI studies have identified areas within the human MT+ complex that share similarities with macaque MST, such as selective responses to optic flow and large receptive fields, and therefore constitute a likely homolog. 3) Transcranial magnetic stimulation (TMS) applied to the putative human homolog of MST, but not to putative human MT, selectively impaired performance in a discrimination task with radial and rotational motion patterns, providing further evidence for a division of the human MT+ complex that is similar to macaque MT/MST.

CONCLUSIONS
All of the information about the visual world that is available to an organism must be encoded in the neural responses that leave the two retinae. As David Marr (245) pointed out, different representations of information make different aspects of reality explicit. In that sense, one can think of the areas of visual cortex and their activity patterns as partial representations of the visual world, each emphasizing a different facet. Information about edges, texture, color, and motion is all contained within the activity that reaches V1, but only after being processed by specialized brain areas with very specific connectivity patterns will this information be made explicit within the firing rates of individual neurons or small groups thereof. Correspondingly, much of modern cortical electrophysiology has focused on identifying the distinguishing characteristics of the plethora of areas and their respective partial representations of the sensory environment.
As we have reviewed here, these approaches have identified MST as a key area for integrating multimodal information: information about motion, both linear and more complex, about heading, eye movements, or memorized motion. All are explicitly encoded in its population activity. We argue that MST is more than just another processing step along the visual hierarchy that represents information about the visual world in yet another way. Once information reaches MST, it has undergone enough changes to be "ready for use." Especially with regard to self-motion perception, the evidence suggests that MSTd (likely in cooperation with surrounding areas, such as VIP) represents the information in a way that can be used by decision and motor areas to react. In other words, there is likely no need for further reshaping of its representations of visual features by downstream areas.
One aspect that needs to be kept in mind is that MST is composed of subregions, which we have alluded to multiple times throughout this review. Even within a subregion, individual neurons often show a wide array of behaviors: they respond better to the motion of individual objects or to the wide-field motion of the background; they do or do not respond to vestibular cues; they have congruent or incongruent preferred directions for visual and vestibular cues; they encode an optic flow field's focus of expansion in retinal or in real-world coordinates. As Bradley et al. (119) pointed out, this could mean that only some MSTd neurons are actively involved in heading perception, whereas other MSTd neurons are contributing to other perceptual tasks. Alternatively, these differences in response patterns could be a sign that a lot of computation is happening within the area and that these different cell types represent different stages of these computations.
It is probably because these complex response patterns in MST defy an easy description of its function that this area has received less prominence as a model system for sensory, cognitive, and motor planning processes than MT (see Footnote 3). Comparatively simple approaches for characterizing a visual cortical area, such as tuning curves, work well in early and midlevel areas, such as MT or V4. Higher-level areas, such as IT and MST, require more complex methods, such as adaptive sampling of complex stimuli (246) or hierarchical convolutional neural networks (247). But now, the time might be ripe to embrace the challenge and appreciate the role of MST at the intersection of sensation and cognition.
The overarching goal should be to get a better understanding of MST's role in everyday behavior. This requires the combination of different methodologies, such as psychophysics, functional imaging, electrophysiology, and disruptive methods like TMS and optogenetics. They should be applied across different experimental paradigms that aim at simulating more natural behavior, such as free viewing (125) and combined visual and vestibular stimulation in the face of self- and object motion [e.g., Sasaki et al. (137)]. The combination of functional imaging, electrophysiological recordings, microstimulation, and artificial neural networks has emerged as a fruitful approach for discovering new principles that govern the organization of IT cortex (55) and could be useful in advancing our understanding of MST. Recent advances in wireless recording techniques in freely moving animals [e.g., Berger et al. (248)] in combination with improved behavioral tracking methods (249) might open many new opportunities in the near future. It is precisely because of its unique position as a gateway between perception and cognition that we believe MST to be an ideal model system of a core feature of higher nervous systems: the transformation of sensation into multidimensional internal representations of a dynamic environment, enabling cognition and sophisticated action planning.
Footnote 3: A Google Scholar search for "middle temporal" lists 160,000 hits, compared to slightly more than 6,400 for "medial superior temporal" (as of Jan. 19, 2021).