Ongoing Research Projects with Earlham Students

Aging Analysis on Veteran Dataset

An extension of our previous work investigating whether paralinguistic vocal attributes improve estimates of age and mortality risk in older adults. We obtain interviews of male US veterans from the Veterans History Project database managed by the Library of Congress, diarize the audio recordings, measure vocal features, and link in mortality information for each speaker. Veterans are randomly split into training and testing subsets to produce estimates of vocal age and years of life remaining. Our computational analyses produced vocal age estimates that correlate with chronological age and, with age held constant, predict time until death. We are extending this work by performing language analysis on transcripts derived from the audio recordings. To enrich the analysis, we obtained data from the National Death Index, which records specific causes of death among the veterans; this lets us relate features extracted from the audio recordings and their transcripts more directly to the prevalence of particular diseases. The study aims to uncover nuanced linguistic patterns and potential indicators of mental and physical health conditions that might not be immediately evident.
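
As a rough illustration of the pipeline's final stage, the sketch below fits a regression model on synthetic speaker features and checks whether the resulting "vocal age" tracks chronological age and, with age regressed out, remaining years of life. The features, data, and model choice are placeholders, not the project's actual configuration.

```python
# Hedged sketch: vocal-age regression on synthetic per-speaker features.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 20))                              # acoustic features
age = 60 + 10 * X[:, 0] + rng.normal(scale=3, size=n)     # chronological age
years_left = 30 - 0.3 * age - 2 * X[:, 1] + rng.normal(scale=2, size=n)

# Split by speaker so no veteran appears in both subsets.
X_tr, X_te, age_tr, age_te, yl_tr, yl_te = train_test_split(
    X, age, years_left, test_size=0.3, random_state=0)

model = Ridge().fit(X_tr, age_tr)
vocal_age = model.predict(X_te)

# Does estimated vocal age track chronological age?
print("r(vocal age, age):", pearsonr(vocal_age, age_te)[0])

# With age held constant: does sounding "older than your age" predict
# fewer years of life remaining? Regress age out of both quantities.
resid_vocal = vocal_age - np.poly1d(np.polyfit(age_te, vocal_age, 1))(age_te)
resid_years = yl_te - np.poly1d(np.polyfit(age_te, yl_te, 1))(age_te)
print("r(residual vocal age, residual years left):",
      pearsonr(resid_vocal, resid_years)[0])
```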

Read our previous publication in the Journal of Gerontology here.

LLM Event Forecasting

Statisticians and psychologists have long studied how humans make decisions and which practices lead to better judgment. Large language models (LLMs) acquire broad knowledge from training on large corpora, and so might be expected to make better-than-chance predictions on a wide range of questions, perhaps even better than human forecasters. This project studies how LLMs can forecast real-world events and evaluates their accuracy on a self-curated dataset of questions drawn from forecasting sites such as Good Judgment Open.
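
For a sense of how such forecasts can be scored, here is a minimal harness built around the Brier score, the standard accuracy metric for probabilistic forecasts of binary events. The `ask_model` function is a hypothetical stand-in for whatever LLM call the project uses, and the questions below are illustrative.

```python
# Hedged sketch: scoring an LLM forecaster with the Brier score.
from typing import Callable

def brier_score(prob: float, outcome: int) -> float:
    """Squared error between a forecast probability and the 0/1 outcome.
    An uninformed 50% forecast scores exactly 0.25."""
    return (prob - outcome) ** 2

def evaluate(questions, ask_model: Callable[[str], float]) -> float:
    """Average Brier score over (question, resolved outcome) pairs."""
    scores = [brier_score(ask_model(q), y) for q, y in questions]
    return sum(scores) / len(scores)

# Toy resolved questions; a constant 50% forecaster scores 0.25,
# so anything below that beats random guessing.
resolved = [("Will candidate X win the election?", 1),
            ("Will the S&P 500 close the year higher?", 0)]
print(evaluate(resolved, lambda q: 0.5))   # -> 0.25
```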

Evaluating Bias in Facial Expression Recognition Systems

Large-scale image datasets for Facial Expression Recognition (FER) are often built using web scraping and crowdsourced annotations. While these methods let researchers gather millions of in-the-wild images quickly and cost-effectively, they also introduce bias. In this project, we investigate bias in FER systems by evaluating model performance across demographic groups.
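
A minimal sketch of this kind of evaluation, assuming each test image carries a demographic label: compute accuracy per group and report the gap between the best- and worst-served groups. The group names and predictions below are illustrative.

```python
# Hedged sketch: per-group accuracy as one measure of FER bias.
from collections import defaultdict

def accuracy_by_group(records):
    """records: iterable of (group, true_label, predicted_label)."""
    correct, total = defaultdict(int), defaultdict(int)
    for group, y_true, y_pred in records:
        total[group] += 1
        correct[group] += int(y_true == y_pred)
    return {g: correct[g] / total[g] for g in total}

preds = [("group_a", "happy", "happy"), ("group_a", "sad", "sad"),
         ("group_b", "happy", "neutral"), ("group_b", "sad", "sad")]
acc = accuracy_by_group(preds)
print(acc)                                    # accuracy per group
print(max(acc.values()) - min(acc.values()))  # accuracy gap across groups
```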

(Completed project under submission) Word Definitions from Large Language Models

Dictionary definitions have historically been the arbiter of what words mean, but this primacy has come under threat from recent progress in NLP, including word embeddings and generative models like ChatGPT. We present an exploratory study of the degree of alignment between word definitions from classical dictionaries and these newer computational artifacts. Specifically, we compare definitions from three published dictionaries to those generated by variants of ChatGPT. We show that (i) definitions from different traditional dictionaries exhibit more surface-form similarity than do model-generated definitions, (ii) ChatGPT definitions are highly accurate, comparable to those of traditional dictionaries, and (iii) ChatGPT-based embedding definitions retain their accuracy even on low-frequency words, much better than GloVe and FastText word embeddings.
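
As one illustration of how two definitions of the same word might be compared, the sketch below measures surface-form overlap using Python's standard library and shows where an embedding-based cosine similarity would slot in. The `embed` encoder is a hypothetical placeholder, not the encoder used in the study, and the definitions are made up.

```python
# Hedged sketch: surface-form and vector-space definition comparison.
import math
from difflib import SequenceMatcher

def surface_similarity(a: str, b: str) -> float:
    """Character-level overlap in [0, 1] between two definitions."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) *
                  math.sqrt(sum(y * y for y in v)))

dict_def = "a domesticated carnivorous mammal kept as a pet or for work"
llm_def = "a domesticated canine animal commonly kept as a companion"
print(surface_similarity(dict_def, llm_def))
# With a real sentence encoder, the vector-space analogue would be:
# print(cosine(embed(dict_def), embed(llm_def)))
```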

Read the current state of our manuscript here.

(Completed project under submission) Prosody Analysis of Audiobooks

Recent advances in text-to-speech (TTS) have made it possible to generate natural-sounding audio from text. However, audiobook narration involves dramatic vocalization and intonation by the reader, with greater reliance on emotion, dialogue, and description in the narrative. Using our dataset of 93 aligned book-audiobook pairs, we present improved models for predicting prosody attributes (pitch, volume, and rate of speech) from narrative text using language modeling. Our predicted prosody attributes correlate far better with human audiobook readings than results from a state-of-the-art commercial TTS system: our predicted pitch shows a higher correlation with the human reading for 22 of the 24 books evaluated, while our predicted volume is closer to the human reading for 23 of the 24. Finally, we present a human evaluation study to quantify the extent to which people prefer prosody-enhanced audiobook readings over commercial text-to-speech systems.
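
The per-book comparison can be sketched as follows: for each book, correlate a model's predicted prosody attribute with the values measured from the human reading, and count how often one model beats another. The data below is synthetic and stands in for aligned per-word pitch sequences.

```python
# Hedged sketch: counting per-book correlation wins against a baseline TTS.
import numpy as np
from scipy.stats import pearsonr

def books_where_model_wins(human, model_a, model_b):
    """Count books where model A's predictions correlate better with
    the human reading than model B's (e.g., a commercial TTS system)."""
    wins = 0
    for h, a, b in zip(human, model_a, model_b):
        if pearsonr(h, a)[0] > pearsonr(h, b)[0]:
            wins += 1
    return wins

rng = np.random.default_rng(1)
human = [rng.normal(size=200) for _ in range(24)]            # per-word pitch
ours = [h + rng.normal(scale=0.5, size=200) for h in human]  # closer to human
tts = [h + rng.normal(scale=1.5, size=200) for h in human]   # noisier baseline
print(books_where_model_wins(human, ours, tts), "of 24 books")
```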

Read the current state of our manuscript here.