Ongoing Research Projects with Earlham Students

Aging Analysis on Veteran Dataset

This project extends our previous work investigating whether paralinguistic vocal attributes improve estimates of age and mortality risk in older adults. We obtain interviews of male US veterans from the Veterans History Project database managed by the Library of Congress. We diarize the audio recordings, measure vocal features, and match each veteran to mortality information. Veterans are randomly split into training and testing subsets, which we use to estimate vocal age and years of life remaining. Computational analyses produced vocal age estimates that were correlated with age and that predicted time until death when age was held constant. We are further extending this work by performing language analysis on the transcripts derived from the audio recordings. To enrich our analysis and improve its accuracy, we obtained data from the National Death Index, which provides the specific causes of death among the veterans. This lets us relate the features extracted from both the audio recordings and their transcripts more directly to the prevalence of certain diseases. The study aims to uncover nuanced linguistic patterns and potential indicators of mental and physical health conditions that might not be immediately evident.
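The two statistical ideas in this summary, a random split by veteran and a correlation computed with age held constant, can be illustrated with a minimal sketch. It is not the project's actual pipeline: the function names `split_speakers` and `partial_corr`, the 20% test fraction, and the use of a first-order partial correlation are all assumptions made for illustration.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """Correlation of x and y with z held constant (first-order partial).

    Hypothetical use: x = vocal age estimate, y = years of life
    remaining, z = chronological age.
    """
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


def split_speakers(speaker_ids, test_fraction=0.2, seed=0):
    """Randomly split speaker IDs into (train, test) lists.

    Splitting by speaker rather than by recording keeps each person's
    audio in exactly one subset. The 0.2 fraction is an assumption.
    """
    ids = list(speaker_ids)
    random.Random(seed).shuffle(ids)
    n_test = max(1, round(len(ids) * test_fraction))
    return ids[n_test:], ids[:n_test]
```

A fixed seed makes the split reproducible, which matters when the same held-out veterans must be reused across the audio and transcript analyses.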

Read our previous publication in the Journal of Gerontology here.

Word Definitions from Large Language Models

Dictionary definitions have historically been the arbiter of what words mean, but this primacy has come under threat from recent progress in NLP, including word embeddings and generative models like ChatGPT. We present an exploratory study of the degree of alignment between word definitions from classical dictionaries and these newer computational artifacts. Specifically, we compare definitions from three published dictionaries to those generated from variants of ChatGPT. We show that (i) definitions from different traditional dictionaries exhibit more surface-form similarity than do model-generated definitions, (ii) the ChatGPT definitions are highly accurate, comparable to those of traditional dictionaries, and (iii) ChatGPT-based embedding definitions retain their accuracy even on low-frequency words, much better than GloVe and fastText word embeddings.
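To make "surface-form similarity" concrete, here is a minimal sketch that scores a set of definitions of the same word by token overlap. The summary does not say which similarity measure the study uses, so the Jaccard index here is a stand-in, and the function names are hypothetical.

```python
import re
from itertools import combinations


def tokens(text):
    """Lowercase a definition and return its set of alphabetic tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a, b):
    """Token-set overlap of two definitions, from 0.0 to 1.0."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0  # two empty definitions are treated as identical
    return len(ta & tb) / len(ta | tb)


def mean_pairwise_similarity(definitions):
    """Average Jaccard similarity over all pairs of definitions.

    Intended for comparing, say, three dictionary definitions of one
    word against three model-generated definitions of the same word.
    """
    pairs = list(combinations(definitions, 2))
    if not pairs:
        raise ValueError("need at least two definitions")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)
```

A measure like this captures only shared wording, not shared meaning, which is why surface-form similarity and accuracy are reported as separate findings.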

Read the current state of our manuscript here.

Prosody Analysis of Audiobooks

Recent advances in text-to-speech have made it possible to generate natural-sounding audio from text. However, audiobook narrations involve dramatic vocalizations and intonations by the reader, with greater reliance on emotions, dialogues, and descriptions in the narrative. Using our dataset of 93 aligned book-audiobook pairs, we present improved models for predicting prosody attributes (pitch, volume, and rate of speech) from narrative text using language modeling. Our predicted prosody attributes correlate much better with human audiobook readings than results from a state-of-the-art commercial TTS system: our predicted pitch shows a higher correlation with human reading for 22 out of the 24 books, while our predicted volume attribute proves more similar to human reading for 23 out of the 24 books. Finally, we present a human evaluation study to quantify the extent to which people prefer prosody-enhanced audiobook readings over readings from commercial text-to-speech systems.
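A per-book comparison like "higher correlation for 22 out of the 24 books" can be computed along the following lines. This is only a sketch under assumptions: the data layout (one dict per book, with `human`, `ours`, and `baseline` value sequences per attribute) and the function name `count_closer_to_human` are hypothetical, and Pearson correlation is assumed as the measure.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def count_closer_to_human(books, attribute):
    """Count books where our prediction tracks the human reading better.

    `books` is a list of dicts; books[i][attribute] holds three aligned
    sequences keyed 'human', 'ours', and 'baseline' (hypothetical layout).
    """
    wins = 0
    for book in books:
        series = book[attribute]
        ours = pearson(series["ours"], series["human"])
        baseline = pearson(series["baseline"], series["human"])
        if ours > baseline:
            wins += 1
    return wins
```

Called once per attribute (for example `"pitch"` and then `"volume"`), this yields one win count per attribute over the evaluated books.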

Read the current state of our manuscript here.