P22
Session 2 (Friday 13 January 2023, 09:00-11:00)
The interaction between energetic masking and attentional control during divided-attention listening
Understanding speech perception in noise (SpiN) requires modelling the interaction between bottom-up (acoustic) factors and top-down (cognitive) processes. Much SpiN research focuses on situations in which listeners must track only one target talker presented against a background of to-be-ignored speech (i.e., selective attention). Less is known about the acoustic/cognitive interaction in situations in which listeners must track two target talkers simultaneously (i.e., divided attention). Here, the relevant acoustic factor is primarily energetic masking (EM), that is, spectrotemporal interference at the auditory periphery, while the relevant cognitive processes involve the control of auditory attention. In particular, listeners may need to direct auditory attention to different spatial locations during divided-attention listening. How spatial separation between the target talkers shapes the acoustic/cognitive interaction in this situation is unclear: separation is likely both to create acoustic benefits (release from EM) and to increase cognitive costs (by raising demands on spatial-attention control).
To explore this question, we ran two online studies (total N = 320) using a “split listening” paradigm in which participants tracked the speech of two simultaneous target talkers. The relative intensity of each talker at the two ears was manipulated so that the talkers were perceived in one of four spatial configurations: (1) collocated (i.e., diotic presentation); (2) spatially near, ±30° azimuth; (3) spatially far, ±60° azimuth; (4) spatially opposite, ±90° azimuth (i.e., dichotic presentation). As a result, EM was maximal in the collocated condition and effectively nil in the spatially opposite condition, whereas spatial-attentional demands were maximal in the spatially opposite condition and minimal in the collocated condition. We also manipulated the level of EM between the target talkers: in Experiment 1, stimuli were constructed from natural (unmanipulated) speech, producing high levels of EM; in Experiment 2, the two talkers’ speech was filtered into non-overlapping frequency bands, producing very low levels of EM.
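The abstract does not state how relative intensity was mapped onto perceived azimuth. Purely as an illustration of how a single level manipulation can span the range from diotic to dichotic presentation, the sketch below assumes a constant-power (sine/cosine) pan law with azimuth mapped linearly onto the pan angle, and places the two talkers symmetrically at each configuration's azimuth. The names `pan_gains`, `talker_gains` and `CONFIGURATIONS`, and the linear azimuth-to-angle mapping, are hypothetical; an actual study would calibrate interaural level differences against perceived location.

```python
import math

# Hypothetical mapping of the four configurations to symmetric azimuths (degrees).
CONFIGURATIONS = {"collocated": 0.0, "near": 30.0, "far": 60.0, "opposite": 90.0}


def pan_gains(azimuth_deg: float) -> tuple[float, float]:
    """Return (left_gain, right_gain) for a source at azimuth_deg.

    Assumption: constant-power sine/cosine pan law, with -90 = fully left,
    0 = centre (equal gains, i.e. diotic) and +90 = fully right.
    """
    if not -90.0 <= azimuth_deg <= 90.0:
        raise ValueError("azimuth must lie in [-90, 90] degrees")
    # Linearly map azimuth in [-90, 90] onto a pan angle theta in [0, pi/2].
    theta = (azimuth_deg + 90.0) / 180.0 * (math.pi / 2)
    # cos^2 + sin^2 = 1, so total power is the same at every azimuth.
    return math.cos(theta), math.sin(theta)


def talker_gains(config: str):
    """Gains for talker A at -azimuth and talker B at +azimuth."""
    az = CONFIGURATIONS[config]
    return pan_gains(-az), pan_gains(+az)


if __name__ == "__main__":
    for name in CONFIGURATIONS:
        (al, ar), (bl, br) = talker_gains(name)
        print(f"{name:>10}: A=(L {al:.3f}, R {ar:.3f})  B=(L {bl:.3f}, R {br:.3f})")
```

Under these assumptions, the collocated configuration gives both talkers equal gains (about 0.707) at each ear, and the opposite configuration sends each talker to one ear only (up to floating-point rounding), matching the diotic and dichotic endpoints described above.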
When EM was high (Experiment 1), transcription performance improved monotonically from the collocated to the opposite configuration, indicating a gradual reduction in EM with increasing spatial separation. When EM was minimal (Experiment 2), the benefit of spatial separation disappeared, and transcription performance actually worsened in the opposite configuration. Additionally, across both experiments, individual differences in working memory were the best predictor of transcription performance in conditions where EM was low. These results suggest that acoustic processes dominate during divided-attention listening, but that the challenges of cognitive control, and the contribution of working memory, become observable when EM is reduced.