This study investigated the relationship between clearly produced and plain citation form speech styles and motion of visible articulators. Using state-of-the-art computer-vision and image processing techniques, we examined both front and side view videos of speakers’ faces while they recited six English words (keyed, kid, cod, cud, cooed, could) containing various vowels differing in visible articulatory features (e.g., lip spreading, lip rounding, jaw displacement), and extracted measurements corresponding to the lip and jaw movements. We compared these measurements in clear and plain speech produced by 18 native English speakers. Based on statistical analyses, we found significant effects of speech style as well as speaker gender and saliency of visual speech cues. Compared to plain speech, we found in clear speech longer duration, greater vertical lip stretch and jaw displacement across vowels, greater horizontal lip stretch for front unrounded vowels, and greater degree of lip rounding and protrusion for rounded vowels. Additionally, greater plain-to-clear speech modifications were found for male speakers than female speakers. These articulatory movement data demonstrate that speakers modify their speech productions in response to communicative needs in different speech contexts. These results also establish the feasibility of utilizing novel computerized facial detection techniques to measure articulatory movements.
Acoustic cues are short-lived and highly variable, which makes speech perception a difficult problem. However, most listeners solve this problem effortlessly. In the present experiment, we demonstrated that part of the solution lies in predicting upcoming speech sounds and that predictions are modulated by high-level expectations about the current sound. Participants heard isolated fricatives (e.g., “s,” “sh”) and predicted the upcoming vowel. Accuracy was above chance, which suggests that fine-grained detail in the signal can be used for prediction. A second group performed the same task but also saw a still face and a letter corresponding to the fricative. This group performed markedly better, which suggests that high-level knowledge modulates prediction by helping listeners form expectations about what the fricative should have sounded like. This suggests a form of data explanation operating in speech perception: Listeners account for variance due to their knowledge of the talker and current phoneme, and they use what is left over to make more accurate predictions about the next sound.
Training has been shown to improve American English speakers’ perception and production of the Spanish /ɾ, r, d/ contrast; however, it is unclear whether successfully trained contrasts are encoded in the lexicon. This study investigates whether learners of Spanish process the /ɾ, r, d/ contrast differently than native speakers and whether training affects processing. Using a cross-modal priming design, thirty-three Spanish learners were compared to ten native Spanish speakers. For native speakers, auditory primes with intervocalic taps (like [koɾo]) resulted in faster reaction times in response to matching visual targets (like coro) than to orthographically and phonemically similar targets (like corro and codo). American English speakers’ reaction times were not affected by the relationship between primes and targets before training. After training, trainees responded more quickly to matching targets than to mismatching /ɾ/-/r/ prime-targets (e.g., [koɾo] followed by corro) while controls’ reaction time patterns did not change. This indicates that native Spanish speakers and Spanish learners process words containing the /ɾ, r, d/ contrast differently and that improvements from training can be encoded in the lexicon.
The current study investigates preschool-age children’s comprehension of scrambled sentences in Japanese. While scrambling has been known to be challenging for children, biasing them to exhibit non-adult-like interpretations (e.g., Hayashibe in Descr Appl Linguist 8:1–18, 1975; Sano in Descr Appl Linguist 10:213–233, 1977; Suzuki in Jpn J Educ Psychol 25(3):56–61, 1977), children are able to interpret scrambled sentences in an adult-like way when the pragmatics is enriched in the experiments (Otsu in Acquisition studies in generative grammar, John Benjamins, Amsterdam, pp 253–264, 1994). These findings suggest that children’s difficulty in comprehending scrambling may be due to processing difficulties (Suzuki in J Psycholinguist Res 42(2), 119–137, 2013), such as the Lexical-ordering Strategy bias (Bever in Cognition and language development, Wiley, New York, pp 279–352, 1970), rather than their lack of the linguistic knowledge of scrambling. The current study revealed that children are indeed able to utilize prosodic information to interpret scrambled sentences in an adult-like way. Our findings provide converging evidence in favor of the proposal that children’s grammatical knowledge of scrambling is intact, although they are more vulnerable than adults to processing difficulties that hinder their ability to successfully interpret scrambled sentences.
- Journal of Phonetics 50, 15-33.
This paper examined the acoustic properties of the pitch accent of South Kyungsang Korean, focusing on generational differences. Kyungsang Korean has lexical pitch accents, whereas standard Seoul Korean does not. However, whether the pitch accents are maintained by younger Kyungsang speakers is questionable given the influence of Seoul Korean. Through comparisons between older and younger speakers and between Seoul and South Kyungsang speakers, this study tested if and how sound change occurs in the pitch accent system of the regional dialect, and if the prosody of Kyungsang Korean shifts towards that of non-tonal Seoul Korean. We examined F0 scaling and alignment of pitch accents for the data collected from 40 female Korean speakers (10 younger and 10 older speakers each for Seoul and South Kyungsang dialects). Clear acoustic differences between generations provided evidence for diachronic sound change in the lexical pitch accent of South Kyungsang Korean. First, the differences in F0 scaling and alignment across accent contrasts are less distinct for younger Kyungsang speakers than for older speakers. Second, the F0 peak occurs later for younger Kyungsang speakers across all accent classes, resulting in a final rising accent pattern in disyllables similar to Seoul Korean. Third, despite the similarity with Seoul Korean, results from longer words revealed that Kyungsang Korean is still distinct from Seoul in terms of its maintenance of the lexical pitch accent. Based on these findings, we conclude that the sound change in lexical pitch accent is in progress by satisfying the prosodic properties of both Seoul and South Kyungsang Korean.
- Applied Psycholinguistics.
The present study examines the relative impact of segments and intonation on accentedness, comprehensibility, and intelligibility, specifically investigating the separate contribution of segmental and intonational information to perceived foreign accent in Korean-accented English. Two English speakers and two Korean speakers recorded 40 English sentences. The sentences were manipulated by combining segments from one speaker with intonation (fundamental frequency contour and duration) from another speaker. Four versions of each sentence were created: one English control (English segments and English intonation), one Korean control (Korean segments and Korean intonation), and two Korean–English combinations (one with English segments and Korean intonation; the other with Korean segments and English intonation). Forty native English speakers transcribed the sentences for intelligibility and rated their comprehensibility and accentedness. The data show that segments had a significant effect on accentedness, comprehensibility, and intelligibility, but intonation only had an effect on intelligibility. Contrary to previous studies, the present study, separating segments from intonation, suggests that segmental information contributes substantially more to the perception of foreign accentedness than intonation. Native speakers seem to rely mainly on segments when determining foreign accentedness.
Whether morpheme-based processing extends to relatively unproductive derived words remains a matter of debate. Although whole-word storage and access has been proposed for some derived words, such as Japanese de-adjectival nominals with the unproductive (-mi) suffix (e.g., Hagiwara et al. in Language 75:739–763, 1999), Clahsen and Ikemoto (Ment Lex 7:147–182, 2012) found masked priming from de-adjectival nominals with productive (-sa) and unproductive (-mi) suffixes to their adjectivally-inflected base morpheme. Using masked and unmasked priming, we examine whether adjectivally-inflected base morpheme primes facilitate the processing of Japanese de-adjectival nominal targets with a productive or unproductive affix, including an orthographic-overlap condition and semantic relatedness measure that Clahsen and Ikemoto (2012) did not include. Our results replicate and extend Clahsen and Ikemoto (2012), revealing significant, statistically-equivalent morphological priming effects for -sa and -mi affixed targets, independent of orthographic and semantic relatedness, suggesting that the processing of derived words with the unproductive -mi affix makes recourse to morpheme-level representations.