3D/4D Ultrasound Research: New Methods, Challenges, and Insights
Kelly H. Berkson & Steven M. Lulich
In a typical introduction to articulatory phonetics, a basic two-dimensional representational scheme that relies heavily on the mid-sagittal diagram is used to describe the vocal tract shapes involved in speech sound production. These images deeply inform the way many of us think about articulation, and although they are useful, they suffer from some obvious problems: in particular, the vocal tract is not 2D. Recent advances in technology have made volumetric imaging possible, however: first with MRI (Story et al 1998, Dang & Honda 2002), which has also been used for imaging of multiple slices in real time (Masaki et al 1999, Narayanan et al 2004, Kim et al 2012), and now with ultrasound. This is exciting: ultrasound is less expensive than MRI, more appropriate for use with children and other populations for whom MRI is either impractical or unsafe, and can be used to record large amounts of data from many speakers in reasonably short periods of time.
The latest 3D/4D ultrasound technology allows volumetric images of the tongue to be recorded in real time (Bressmann 2010, Rhodes et al 2015, Foley et al 2016, Lulich et al 2017, Berkson et al 2017) at frame rates between 10 fps (for large volumes) and nearly 100 fps (for small volumes), and with sub-millimeter resolution in all three spatial dimensions. There are some challenges: ultrasound remains limited by the fact that only structures proximal to the tongue surface can reliably be imaged (Stone 2005), and by the fact that a substantial commitment of time and resources is required to analyze the large amounts of data it generates. Tongue surface contours must typically be segmented and traced, often by hand, before being subjected to analysis, and standards for the analysis and quantification of three-dimensional tongue surfaces segmented from 3D/4D ultrasound do not yet exist.
This talk presents an overview of work currently underway in the Speech Production Laboratory at Indiana University, where we are using recent technological advancements to record 3D/4D ultrasound images of tongue morphology during speech sound production. These data are combined with other information (e.g. audio recordings, webcam video, and scanned 3D palate impressions), allowing us to image speech sound articulations as the three-dimensional phenomena they are. Over the last three years we have collected data on a variety of phenomena (laterals and rhotics, labial velars, palatals, tongue root advancement in vowels) in a variety of languages (Brazilian Portuguese, English, Igbo, Marathi, San Juan Quiahije Chatino, Wolof). Using data drawn from these studies, we describe the data acquisition and analysis methodologies under development in our lab and address some of the benefits and challenges of ultrasound research.