Speech Movement and Acoustic Analysis Tracking (SMAAT)

Speech Sound Disorders (SSDs) encompass a range of difficulties in speech planning and production that affect as many as 20% of preschool children. An SSD not only exposes a child to serious and immediate educational disadvantage, activity limitations, frustration, and isolation but also has a major impact on the child's quality of life. Timely and accurate diagnosis is therefore vital to receiving the right treatment and reducing the impact of the disorder. However, several significant barriers make it difficult to determine the presence or absence of an SSD and to differentially diagnose it. These barriers are insufficient time and clinical resources, limited clinical expertise in remote areas, and inefficient and variable assessment procedures with compromised accuracy.

We have a SMAAT solution!

The Speech Movement and Acoustic Analysis Tracking (SMAAT) team is working to develop software for automatic speech movement and acoustic analysis. The software will use artificial intelligence and machine learning to support clinical diagnosis of an SSD.

Our SMAAT software innovation will be a portable, non-invasive web or stand-alone application that will revolutionise health services by improving the accuracy and time efficiency of differential diagnosis of SSDs in children. The availability of SMAAT could allow allied health professionals to determine the urgency of care (triage) for patients and could help alleviate the strain on the healthcare system. This would reduce waiting times for access to a speech-language pathologist, introduce a telehealth option and reduce the need for face-to-face contact, which is particularly beneficial for rural and remote populations. The platform will increase the accuracy of diagnosis, reduce administration time, lower financial cost, and provide timely access to personalised intervention.

A multi-disciplinary and highly skilled project team comprising national and international experts in speech science, computer vision, artificial intelligence and clinical speech-language pathology is collaborating to create the SMAAT innovation.

Project Objective and Aims

The overall objective of this project is to use machine learning to create and evaluate an easy-to-use and relatively low-cost software platform for the differential diagnosis of SSDs. The platform will utilise a customised word list to derive scores from acoustic, articulatory movement (jaw, lips, tongue) and clinical data inputs.

Our research aims are as follows:

  1. Development phase: Identify and derive distinct and repeatable articulatory, acoustic and phonetic features of clinical interest using audio-visual data,
  2. Evaluation phase: Use computer vision and statistical learning approaches to create automatic scoring of features for clinical output, and
  3. External validation phase: Validate and trial a prototype for clinical implementation.

Speech-language pathologists (S-LPs) play a primary role in the assessment, diagnosis and treatment of children with speech sound disorders. Currently, the assessment process for diagnosis and differential diagnosis of speech sound disorder subtypes requires time-consuming administration of several tests (i.e., a test battery approach). This typically includes a case history, oral examination, speech sound assessment (accuracy of the child's production of speech sounds) and speech motor testing (precision of speech movements of the jaw, lips, tongue, etc.). These assessments are scored subjectively by listening to the child's speech and/or by looking at the child's speech movements, using a pen-and-paper approach.

Increasingly, speech science researchers are encouraging S-LPs to integrate objective measures into the assessment process. However, S-LPs face many barriers to achieving this in the clinical setting that include: cost, time, specialist expertise and suitability of the equipment for use with young children. Additionally, expertise to extract and interpret the data is required. These all present as large barriers to S-LPs obtaining objective measures and therefore restrict the capacity to provide an accurate and timely differential diagnosis.

Project 1: A preliminary evaluation of the performance of an automated video based facial tracking system with application in speech-language pathology
TEAM: Amar El-Sallam, Roslyn Ward, Paul Davey, Aravind Namasivayam, Geoff Strauss
The purpose of this study is to compare the accuracy of the automated video-based tracking system developed by our team with that of an optical motion capture system (Vicon Nexus). The Vicon Nexus motion capture system is a three-dimensional tracking system that is considered the gold standard. It requires the placement of retro-reflective markers on the face.
In this study, the jaw and lip movements of a typically developing 14-year-old child were captured simultaneously using 3 standard off-the-shelf video cameras and eight 4 MP (Vicon) cameras. The child was required to produce 40 words (10 stimulus items per level of the Motor Speech Hierarchy (MSH)) and 4 phrases, in response to the instruction "This is a X, say X". The speech movement features have been selected to represent the different speech subsystems and levels of control represented in the MSH (Hayden & Square, 1994).
The data were collected in the Motion Analysis Laboratory at Curtin University by a certified practicing S-LP and a biomechanist.
STATUS: Data analysis in progress.
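As an illustration of how agreement between the video-based tracker and the Vicon reference in Project 1 might be quantified, the sketch below computes the root-mean-square error and the Pearson correlation between two trajectories. It assumes the two series are already time-synchronised, resampled to the same rate and expressed in the same units; the jaw-opening values and function names are hypothetical and are not taken from the study.

```python
import math


def rmse(reference, estimate):
    """Root-mean-square error between two equally long, time-aligned series."""
    if len(reference) != len(estimate) or not reference:
        raise ValueError("series must be non-empty and of equal length")
    squared = [(r - e) ** 2 for r, e in zip(reference, estimate)]
    return math.sqrt(sum(squared) / len(squared))


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical jaw-opening distances in millimetres for one word (not study data).
vicon_mm = [0.0, 2.0, 6.0, 9.0, 6.0, 2.0, 0.0]   # marker-based reference
video_mm = [0.5, 2.5, 5.5, 9.5, 6.5, 1.5, 0.5]   # markerless video estimate

print(f"RMSE: {rmse(vicon_mm, video_mm):.2f} mm")
print(f"Pearson r: {pearson_r(vicon_mm, video_mm):.3f}")
```

RMSE reports the typical size of the tracking error in the original units, while the correlation indicates whether the two systems capture the same movement pattern; the measures actually used in the study may differ.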
Project 2: The development and validation of a probe word list to assess speech motor skills in children
TEAM: Aravind Namasivayam, Anna Huynh, Jennifer Hard, Rohan Bali, Vina Law, Francesca Granata, Darshani Rampersaud, Roslyn Ward, Rena Helms-Park, Pascal van Lieshout, Deborah Hayden
The Probe Word Scoring System (PWSS) is designed to measure change in motor speech skills in children either over time or following treatment. The PWSS was originally conceptualized and developed by Ms. Deborah Hayden from the PROMPT Institute in the early 2000s when she noticed the challenges of measuring changes in therapy. Further refinements were carried out in collaboration with the University of Toronto between 2012 and 2020, under the leadership of Dr. Aravind Namasivayam and Dr. Pascal van Lieshout.
They first fine-tuned the word list to address the following factors: (a) Speech Movement Complexity: How difficult are the speech movements used to produce certain words and how many different types of speech movements are needed to produce a word? (b) Language Complexity: How often do the words and sound combinations occur in the child's native language? (c) Word Familiarity: Are the words frequently used in the child's specific environment?
The PWSS was field-tested on 48 preschool and school-aged children with severe speech disorders at clinics across Ontario, Canada and refined once again. This rigorous process has led to the development of the current standardized, reliable, and valid PWSS. The PWSS supports speech therapists in identifying the area of speech motor breakdown in preschool and school-aged children, in setting appropriate goals for therapy, and allows them to measure changes in these therapy goals over time.
STATUS: Project completed; manuscript under review (May 2020)
Project 3: Development of the psychometric properties of the probe word list to assess speech motor skills in children
TEAM: Linda Orton (PhD Candidate), Neville Hennessey, Roslyn Ward, Aravind Namasivayam
The primary aim of this research project is to collect a performance sample of children aged 2 to 4 years to generate normative data and establish the psychometric properties for the Probe Word List and Scoring System (PWSS). As part of her doctoral project, Ms Orton will seek to undertake 3 studies:
Study 1 will involve the collection and analysis of perceptual, acoustic and kinematic measures on the PWSS in order to evaluate the construct validity of the PWSS, and to produce normative data.
Study 2 will further analyse the psychometric properties of the PWSS in accordance with the COSMIN checklist (COnsensus-based Standards for the selection of health status Measurement INstruments).
Study 3 will use a small pilot sample of children with known Speech Sound Disorder to determine the capacity of the PWSS to diagnose motor speech limitations in these children.
This research will provide valuable information about typical speech motor development and will progress the use of facial movements in the assessment and management of motor speech impairment in children with a Speech Sound Disorder.
STATUS: Ms. Orton achieved PhD candidacy (December 2019) and will begin data collection in September 2020
Project 4: Perceptual and acoustic features of typically developing 2-year-olds during production of the speech word probes: A pilot study
TEAM: Roslyn Ward, Amar El-Sallam, Sonny Pham, Geoff Strauss, Linda Orton, Aravind Namasivayam, Katie Hustad, Neville Hennessey
The purpose of this study is twofold: (1) to explore the acoustic (i.e., speech sound) characteristics of 10 typically developing 2-year-old children during production of the words in the Probe Word and Scoring System (PWSS); and (2) to compare the acoustic features of interest extracted manually with the proposed method for automated extraction. The acoustic features have been selected to represent the different speech subsystems and levels of control represented in the Motor Speech Hierarchy (Hayden & Square, 1994).
The data were collected in the Motion Analysis Laboratory at Curtin University by a certified practicing S-LP. Children produced 40 words (10 stimulus items per level of the MSH) in response to a picture stimulus presented on a large screen, followed by the instruction "This is a X, say X".
The speech acoustic data have been imported into an acoustic analysis package (PRAAT) and the word boundaries manually marked using a combination of the spectrographic display and listening to the child's speech. Words that are inaudible or cannot be coded due to background noise will be discarded. The acoustic features of interest include: duration; formant frequencies, ratios and slopes; frication rise time; voice onset time; spectral slope and moments; fricatives and stops.
The results of the acoustic analysis in PRAAT will be compared with the output of the automated speech recognition system developed by our team.
STATUS: Data analysis in progress
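To make one of the acoustic features in Project 4 concrete, the sketch below computes power-weighted spectral moments (centre of gravity, standard deviation, skewness and excess kurtosis) for a short signal. It is a minimal pure-Python illustration and not the PRAAT implementation or the team's automated method; the naive DFT, the power weighting, the function names and the synthetic two-tone test signal are all assumptions made for the example.

```python
import cmath
import math


def power_spectrum(samples, sample_rate):
    """Naive DFT power spectrum for bins 0..N/2; returns (frequencies_hz, power)."""
    n = len(samples)
    freqs, power = [], []
    for k in range(n // 2 + 1):
        acc = sum(s * cmath.exp(-2j * math.pi * k * t / n)
                  for t, s in enumerate(samples))
        freqs.append(k * sample_rate / n)
        power.append(abs(acc) ** 2)
    return freqs, power


def spectral_moments(freqs, power):
    """Return (centroid_hz, sd_hz, skewness, excess_kurtosis), weighted by power."""
    total = sum(power)
    if total == 0:
        raise ValueError("spectrum has no energy")
    centroid = sum(f * p for f, p in zip(freqs, power)) / total
    variance = sum((f - centroid) ** 2 * p for f, p in zip(freqs, power)) / total
    sd = math.sqrt(variance)
    if sd == 0:
        return centroid, 0.0, 0.0, 0.0
    skew = sum(((f - centroid) / sd) ** 3 * p for f, p in zip(freqs, power)) / total
    kurt = sum(((f - centroid) / sd) ** 4 * p for f, p in zip(freqs, power)) / total - 3
    return centroid, sd, skew, kurt


# Synthetic test signal (hypothetical): equal-amplitude tones at 1000 Hz and 3000 Hz.
SAMPLE_RATE = 8000
N_SAMPLES = 64
signal = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE)
          + math.sin(2 * math.pi * 3000 * t / SAMPLE_RATE)
          for t in range(N_SAMPLES)]

freqs, power = power_spectrum(signal, SAMPLE_RATE)
centroid, sd, skew, kurt = spectral_moments(freqs, power)
print(f"centre of gravity: {centroid:.1f} Hz, standard deviation: {sd:.1f} Hz")
```

With two equal tones the centre of gravity falls midway between them, at about 2000 Hz, with a standard deviation of about 1000 Hz. For real recordings an FFT library and a windowed segment between the marked word boundaries would normally replace the naive DFT.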
Project 5: 3D facial landmarks tracking
TEAM: Nhan Dao, Kit Yan Chan, Richard Palmer, Roslyn Ward, Sonny Pham
This project aims to develop an algorithm to detect and track facial features from a sequence of 3D images. It represents a novel multi-disciplinary collaboration focused on delivering software that can assist speech-language pathologists in identifying motor speech impairment in young children with speech sound disorders. The project investigates a novel idea: updating the positions of an individualised 3D mesh of landmarks using only information from multiple 2D video cameras. This will enable accurate tracking of mouth and jaw movements in an inexpensive and non-invasive manner that does not require physical markers to be attached to the subject's face.
The project involves several steps. Firstly, a 3D facial image of the subject is taken using specialised imaging hardware, and the positions of known anatomical landmarks around the mouth and jawline are defined using the Cliniface 3D visualisation and analysis platform. Secondly, 2D video of the subject saying key phrases is recorded from multiple cameras. Integrating these data involves key-point detection on the subject's face in both the initial 3D facial image and the sequence of 2D video frames. The key-points from both sources are brought into correspondence over the sequence of frames using K-Nearest Neighbour and Random Sample Consensus, with detection errors ameliorated by particle filtering. The 3D landmark positions, which are defined relative to the key-points, are iteratively updated by inferring the 3D repositioning that best explains the observed repositioning of the 2D key-points tracked in the sequence of video frames.
STATUS: Mr Dao is currently enrolled in the Bachelor of Engineering with Honours.
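The correspondence step described in Project 5 can be illustrated with a much-simplified sketch: key-points are paired by nearest-neighbour descriptor matching (k = 1), and Random Sample Consensus then estimates a motion model while rejecting bad pairs. The example assumes a pure 2D translation between two frames, which is far simpler than the 3D repositioning the project estimates, and it omits particle filtering. All coordinates, descriptors, function names and the tolerance value are hypothetical.

```python
import math
import random


def nearest_neighbour_matches(desc_a, desc_b):
    """Pair each descriptor in desc_a with the index of its closest one in desc_b."""
    matches = []
    for i, da in enumerate(desc_a):
        j = min(range(len(desc_b)), key=lambda idx: math.dist(da, desc_b[idx]))
        matches.append((i, j))
    return matches


def ransac_translation(points_a, points_b, matches, tolerance=2.0,
                       iterations=100, seed=0):
    """Estimate the 2D shift from points_a to points_b, ignoring outlier matches."""
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(iterations):
        # A translation is fully determined by a single sampled match.
        i, j = rng.choice(matches)
        dx = points_b[j][0] - points_a[i][0]
        dy = points_b[j][1] - points_a[i][1]
        inliers = [(a, b) for a, b in matches
                   if math.dist((points_a[a][0] + dx, points_a[a][1] + dy),
                                points_b[b]) <= tolerance]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    # Refit the shift as the mean displacement over the consensus set.
    n = len(best_inliers)
    shift = (sum(points_b[b][0] - points_a[a][0] for a, b in best_inliers) / n,
             sum(points_b[b][1] - points_a[a][1] for a, b in best_inliers) / n)
    return shift, best_inliers


# Hypothetical key-points (pixels) in two frames; the last detection in frame B is wrong.
points_a = [(10, 10), (20, 12), (30, 15), (25, 30), (15, 28)]
points_b = [(13, 9), (23, 11), (33, 14), (28, 29), (60, 60)]
# Hypothetical 2-element appearance descriptors for each key-point.
desc_a = [(0.10, 0.90), (0.40, 0.20), (0.80, 0.50), (0.30, 0.70), (0.60, 0.10)]
desc_b = [(0.12, 0.88), (0.41, 0.22), (0.79, 0.52), (0.31, 0.69), (0.58, 0.12)]

matches = nearest_neighbour_matches(desc_a, desc_b)
shift, inliers = ransac_translation(points_a, points_b, matches)
print("estimated shift:", shift, "inlier matches:", len(inliers))
```

The mis-detected fifth point is matched by its descriptor but falls outside the consensus set, so it does not distort the estimated shift. A production pipeline would typically use a richer motion model and real feature descriptors.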
smaatSpeech Recorder App
To validate the automatic speech recognition system that will be developed in Project 4, speech movements and acoustic data from typically developing children and children with an SSD will be required. To facilitate acquisition of the data independently, an online speech recorder will be developed and trialled. The basis of the recorder will be a computer with a camera and microphone running the app. Features of the system proposed for development include:
  • Selectable recording of video and audio, or audio alone
  • Selectable camera or cameras
  • Selectable microphone
  • Presentation (sequential) of words and phrases from established word lists
  • Recorded video and audio, or audio alone, that can be utilised by the Online Probe Word List and Scoring Form
smaatSpeech Scorer App
The probe words developed by Hayden and validated by Namasivayam et al. (see Project 2) will also be used in Projects 3 and 4. To facilitate their use, an online scoring form was developed. This form allows the user to review recorded video and/or audio and score the observed movements and phonetic complexity of the 40 words and 4 phrases. The screen display for each word enables S-LPs to tick boxes, enter phonetic text, and make explanatory or notable comments. This information is then used to generate the score for each word and phrase, and the form displays total scores and relative scores for each category of words and phrases. The S-LP can then select a priority for speech intervention. All aspects of the form are saved in a .json file. This file is created and updated as the S-LP progresses through the form; it can be saved at any stage and reloaded for review purposes. It can also be imported into a spreadsheet for further examination.
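The save, reload and spreadsheet-export workflow described for the scoring form could look roughly like the sketch below. The actual structure of the smaatSpeech .json file is not documented here, so every field name, category label and score in the example is a hypothetical placeholder.

```python
import csv
import io
import json

# Hypothetical form state; the real smaatSpeech file layout may differ.
form = {
    "participant_id": "P001",
    "items": [
        {"word": "item_01", "category": "stage_a", "score": 3, "max_score": 4, "comment": ""},
        {"word": "item_02", "category": "stage_a", "score": 3, "max_score": 4, "comment": "unclear audio"},
        {"word": "item_03", "category": "stage_b", "score": 1, "max_score": 4, "comment": ""},
    ],
}


def category_totals(form_state):
    """Sum scores per category and express each as a fraction of the maximum."""
    sums = {}
    for item in form_state["items"]:
        scored, possible = sums.get(item["category"], (0, 0))
        sums[item["category"]] = (scored + item["score"], possible + item["max_score"])
    return {cat: {"total": s, "max": m, "relative": s / m}
            for cat, (s, m) in sums.items()}


def to_csv(form_state):
    """Flatten the scored items into CSV text that a spreadsheet can import."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["word", "category", "score", "max_score", "comment"])
    writer.writeheader()
    writer.writerows(form_state["items"])
    return buffer.getvalue()


saved = json.dumps(form, indent=2)   # what would be written to the .json file
reloaded = json.loads(saved)         # reloading for review restores the same state
print(category_totals(reloaded))
print(to_csv(reloaded))
```

Because the whole form state lives in one JSON document, a partially completed form can be saved and resumed without loss, and the same data can be flattened to rows for a spreadsheet.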
Participant Consent Form
Downloadable PDF
Video Calibration Grid
A calibration grid required in the smaatSpeech Recorder App
smaatSpeech User Manual
Dr Roslyn Ward
Senior Research Fellow, School of Occupational Therapy, Social Work and Speech Pathology, Faculty of Health Sciences, Curtin University

Geoff Strauss
Adjunct Teaching and Research Fellow, School of Physiotherapy and Exercise Science, Faculty of Health Sciences, Curtin University

Dr Amar El-Sallam
Adjunct Senior Research Fellow, School of Occupational Therapy, Social Work and Speech Pathology, Faculty of Health Sciences, Curtin University

Dr Petra Helmholz
Associate Professor, School of Earth and Planetary Sciences, Faculty of Science and Engineering, Curtin University

Dr Aravind K. Namasivayam
Oral Dynamics Laboratory, Department of Speech & Language Pathology, University of Toronto, Toronto Rehabilitation Institute

Dr Sonny Pham
Senior Lecturer, School of Electrical Engineering, Computing and Mathematical Sciences, Curtin University

Dr Neville Hennessey
Senior Lecturer, School of Occupational Therapy, Social Work and Speech Pathology, Faculty of Health Sciences, Curtin University

Dr Richard L. Palmer
Research Associate, School of Earth and Planetary Sciences, Faculty of Science and Engineering, Curtin University

Dr Katherine C. Hustad
Professor, Department of Communicative Disorders, University of Wisconsin-Madison

Dr Yuriko Kishida
Research Coordinator, Telethon Speech and Hearing
Adjunct Fellow, Macquarie University

Paul Davey
Senior Research Officer, School of Physiotherapy and Exercise Science, Faculty of Health Sciences, Curtin University

James Strauss
Web Developer

Dr Gareth Baynam
Clinical Professor, Faculty of Medicine and Health Sciences, University of Western Australia

Dr Catherine Elliott
Professor, School of Occupational Therapy, Social Work and Speech Pathology, Faculty of Health Sciences, Curtin University

Linda Orton
PhD Student, School of Occupational Therapy, Social Work and Speech Pathology, Faculty of Health Sciences, Curtin University


  • Palmer R, Helmholz P, Ward R, Strauss G, (2022) IntArchPhRS
  • Namasivayam AK, Huynh A, Granata F, Law V, van Lieshout P (2020) PROMPT intervention for children with severe speech motor delay: A randomized control trial. Pediatric Research, 89(3):613-621
  • Namasivayam AK, Huynh A, Bali R, Granata F, Law V, Rampersaud D, Hard J, Ward R, Helms-Park R, van Lieshout P, Hayden D. (2020) Development and validation of a Probe Word list to assess speech motor skills in children. American Journal of Speech-Language Pathology 30(2):622-648.
  • Hustad KC, Mahr TJ, Broman AT, Rathouz PJ. (2020) Longitudinal growth in single-word intelligibility among children With cerebral palsy from 24 to 96 months of age: Effects of speech-language profile group membership on outcomes. Journal of Speech, Language and Hearing Research, 63(1):32-48.
  • Palmer RL, Helmholz P, Baynam G. (2020) Cliniface: Phenotypic visualisation and analysis using non-rigid registration of 3d facial images. IntArchPhRS Proceedings of the ISPRS Congress (in press).
  • Helmholz P, Palmer RL, Baynam G. (2020) Scanning a new landscape - The new tool unlocking facial clues to rare diseases with spatial techniques. Position Magazine, February/March, 2020. pp. 22-24.
  • Namasivayam AK, Coleman D, O'Dwyer A, van Lieshout P (2020) Speech sound disorders in children: An articulatory phonology perspective. Frontiers in Psychology 10, Article 2998: 22 pages.
  • Hayden D, Namasivayam AK, Ward R, Eigen J, Clark A. (2019, Submitted on Invitation). The PROMPT approach: Theory, Evidence, Use and Application. In L. Williams, S. McLeod, & R. McCauley (Eds.), Interventions for Speech Sound Disorders. Second Edition, Baltimore, Maryland: Brookes Publishing.
  • Hustad KC, Sakash A, Natzke PEM, Broman AT, Rathouz PJ. (2019) Longitudinal growth in single-word intelligibility among children With cerebral palsy from 24 to 96 months of age: Predicting later outcomes from early speech production. Journal of Speech Language Hearing Research, 62(6):1599-1613.
  • Namasivayam AK, Bali R, Ward R, Tieu KD, Yan T, Hayden D, van Lieshout P (2018) Measuring and training speech-language pathologists' orofacial cueing: a pilot demonstration. Journal of Healthcare Engineering Article ID 4323046, 10 pages.
  • Bandini A, Namasivayam AK, Yunusova Y. (2017). Video-based tracking of jaw movements during speech: Preliminary results and future directions. In INTERSPEECH (pp. 689-693).
  • Namasivayam AK, Ward R, Bali R, Davey P, Strauss GR, Claessen M, Hayden D, van Lieshout P. Exploring quantifiable measures for the evaluation of SLP intervention fidelity. Poster presented at the 7th International Conference on Speech Motor Control, Groningen, The Netherlands.
  • Hayden D, Namasivayam AK, Ward R (2015) The assessment of fidelity in a motor speech-treatment approach. Speech Language and Hearing 18(1): 30-38.
  • Hustad KC, Oakes A, Allison K. (2015) Variability and diagnostic accuracy of speech intelligibility scores in children. Journal of Speech Language and Hearing Research. 17.
  • Lee J, Hustad KC, Weismer G. (2014) Predicting speech intelligibility with a multiple speech subsystems approach in children with cerebral palsy. Journal of Speech Language and Hearing Research. 57(5):1666-78.
  • Hayden D, Namasivayam A, Hard J, van Lieshout P (2014) Probe wordlist for the assessment of treatment progress and generalization in children with motor speech disorders.
  • Ward R, Leitao S, Strauss GR (2014) An evaluation of the effectiveness of PROMPT therapy in improving speech production accuracy in six children with cerebral palsy. International Journal of Speech-Language Pathology 16(4): 355-371.
  • Ward R, Strauss GR, Leitao S (2013) Kinematic changes in jaw and lip control of children with cerebral palsy following participation in a motor-speech (PROMPT) intervention. International Journal of Speech-Language Pathology 15(2): 136-155.


  • Helmholz P, Ward R, Palmer R, Baynam G, Strauss G, El-Sallam A, Namasivayam A, Pham S, Hennessey N, Hustad K, Kishida Y, Davey P, Elliott C, Orton L, Lichti D. Speech-Movement and Acoustic Analysis Tracker (SMAAT) Poster presented at the XXIVth ISPRS International Society for Photogrammetry and Remote Sensing Congress, 06 - 11 June 2022, Nice, France
  • Ward R, Palmer RL, Hennessey N, Orton L, Davey P, Strauss G, Helmholz P, Hayden D, Namasivayam A (2022) Spatiotemporal profiling of facial movements from video for motor-speech control assessment. In 8th International Conference on Motor Speech Control, Vol.: Groningen, The Netherlands, (submitted).
  • Ward R, Palmer R, Hennessey N, Orton L, Davey P, Strauss G, Helmholz P, Hayden D, Namasivayam A. The use of objective articulatory kinematic measures to support clinical decision making in the diagnosis of motor speech disorders: A pilot study. Poster presented at the 8th International Conference on Speech Motor Control, August 24th - 27th, 2022, Groningen, the Netherlands.