Object recognition - Convergence of vision, audition, and touch.

Tanja Kassuba

Abstract

Recognizing objects is one of the most fundamental capabilities we use in everyday interaction with our environment. As object information can be conveyed by different senses, real-life object perception is usually a multisensory experience. We do not perceive an accumulation of features provided by different senses but rather have a unitary experience of the object we interact with. Thus, at higher processing stages, object information is generalized across the senses, that is, bound into unified object representations (Griffiths & Warren, 2004; Meyer & Damasio, 2009). Whereas several neuroimaging studies have identified brain regions such as the posterior superior temporal sulcus (pSTS) and the lateral occipital cortex (LO) as being implicated in audiovisual and visuo-haptic integration of object information, respectively (Amedi et al., 2005), little is known about the binding of object information across audition and touch or across all three senses. Further, even though object recognition in the different senses is to some degree redundant, the senses differ in their intrinsic efficiency at extracting particular types of information (Lederman & Klatzky, 2009). It is known that in multisensory binding, the modality providing the most reliable input dominates the resulting percept (Ernst & Banks, 2002). However, little attention has been directed towards putative intrinsic asymmetries between the senses in their contribution to multisensory binding (i.e., asymmetries that persist despite highly reliable input in each modality).
The general aim of this thesis is to contribute to our understanding of how the human brain unifies information from different senses into a coherent object representation. To this end, three studies have been conducted in healthy humans, implementing functional magnetic resonance imaging (fMRI), diffusion tensor imaging (DTI), and repetitive transcranial magnetic stimulation (rTMS).
The following research questions were addressed:
1. Where in the human brain does object recognition converge across vision, audition, and touch?
2. How is audio-haptic object recognition implemented in the human brain?
3. Are there intrinsic asymmetries in the contributions of vision and touch to visuo-haptic object recognition?
In all three studies, the same set of familiar, manipulable objects was used as stimuli, as such objects are commonly perceived by all three senses. Study 1 comprised two complementary fMRI experiments designed to identify brain regions where uni- and multisensory object-specific processing converges across all three senses or across bisensory pairings. The first experiment contrasted processing of unimodal visual, auditory, and haptic objects with non-object control stimuli. The second experiment investigated regional brain activations when participants matched object information presented simultaneously in two or all three modalities. Only a well-defined region in the left fusiform gyrus (FG) showed object-specific activation during uni- and multisensory processing in all three senses. Other putative multisensory regions such as the left pSTS and LO were consistently activated by auditory/haptic and visual/haptic object processing, respectively. Together, the results suggest that the left FG binds information from all three senses, whereas the left pSTS and left LO are convergence zones for audio-haptic and visuo-haptic information, respectively.
Study 2 used fMRI and DTI to further characterize how these regions contribute to the convergence of higher-order auditory and haptic object processing. During fMRI, a delayed-match-to-sample task was applied in which participants had to match a target object with a previously presented sample object within and across audition and touch in both directions (auditory-to-haptic and haptic-to-auditory). As coherence in content is an important binding cue (Laurienti et al., 2004), semantic congruency between sample and target objects was manipulated. In line with the findings of Study 1, the left FG and pSTS displayed response patterns indicating audio-haptic binding mechanisms. While the left FG showed this pattern independently of the direction of crossmodal matching, the pSTS was selectively engaged in crossmodal matching of auditory targets. Together, the results suggest that the left pSTS might host auditory-dominated object representations that are crossmodally modulated, whereas the left FG might represent supramodal conceptual object information. Probabilistic tractography was then used to describe probable anatomical connections between the FG, pSTS, LO, and modality-specific association cortices. The DTI data corroborated the fMRI findings by showing that the auditory and haptic object pathways connecting these regions of interest likely converge in the left FG.
Last, Study 3 implemented fMRI and rTMS to investigate modality-specific effects on visuo-haptic convergence of object processing. To this end, a delayed-match-to-sample task analogous to that of Study 2 was applied during fMRI, albeit with visual (instead of auditory) and haptic object stimuli. To further probe putative intrinsic asymmetries in the contribution of vision and touch to binding, the visuo-haptic convergence zone in the left LO (Study 1) was stimulated with 1 Hz rTMS before fMRI. Within participants, task-related brain activity was compared after effective versus noneffective (sham) rTMS. In contrast to auditory object recognition, which depends on learned associations between an object and its sound, vision and touch share the extraction of an object's shape. However, the visual system processes the features constituting an object's shape at a glance, whereas touch extracts them sequentially. Given this difference in efficiency between the two senses, haptic object recognition might gain more from interactions with vision than vice versa. Indeed, regions previously implicated in visuo-haptic object recognition such as the bilateral LO, FG, and intraparietal sulcus (Lacey & Sathian, 2011) displayed multisensory interaction effects mainly when haptic targets were processed. However, crossmodal matching was not affected by rTMS of the left LO. Yet, rTMS induced a wide range of short-term plasticity effects in the bilateral LO, FG, and pSTS that differed for visual and haptic processing. The rTMS results thus underscore a functional asymmetry between vision and touch and confirm that object processing in the bilateral LO, FG, and pSTS is functionally related.
In summary, the three studies consistently point out that the left FG, pSTS, and LO are highly relevant for multisensory object recognition, not only for audio-visual and visuo-haptic integration but also for the binding of object information across audition and touch and across all three senses. This thesis concludes by integrating the results into a model of distributed semantic processing (Martin & Chao, 2001; Thompson-Schill, 2003). Whereas the left pSTS and LO might extract semantic attributes shared by different sensory pairings (e.g., action-related temporal dynamics and shape, respectively), the left FG might host supramodal association nodes coding the linkage of these attributes (cf. Mesulam, 1998; Meyer & Damasio, 2009). Using fMRI, DTI, and rTMS, this thesis provides valuable new insights into the neuronal implementation of multisensory object recognition, particularly the binding of meaningful information across audition and touch and across all three senses, both vastly understudied aspects of object recognition.
Original language: English
Publisher: German University Libraries
Number of pages: 140
Status: Published - 2012