Addressing sampling-frequency mismatch between speech data sets in a forensic voice comparison

Hanie Mehdinezhad, Bernard J. Guillemin, Balamurali B T

Wednesday, December 14th, 2022, Special Session 11am – 12.30pm

Abstract

Sampling-frequency (𝑓𝑠) mismatch between Suspect, Offender, and Background speech data sets in a Forensic Voice Comparison (FVC) is discussed, and approaches to correct for it are presented. The Bayesian Likelihood-Ratio (LR) framework is used to express the results of an FVC, and a Gaussian Mixture Model-Universal Background Model (GMM-UBM) is used to calculate LR values. As appropriate, experiments have been conducted on both tokenized and stream data using Mel-Frequency Cepstral Coefficients (MFCCs) as the speech features. The results show that the best approach to correct for 𝑓𝑠-mismatch between speech data sets is to down-sample the data set or sets at the higher 𝑓𝑠 to match the data set or sets at the lower 𝑓𝑠.
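
As an illustration of the down-sampling step only, the following is a minimal sketch, not the authors' implementation. It assumes SciPy's polyphase resampler as the down-sampling method and integer sampling frequencies in Hz; the function name `downsample_to_match`, the 16 kHz and 8 kHz rates, and the synthetic tone are hypothetical choices made for the example.

```python
from math import gcd

import numpy as np
from scipy.signal import resample_poly


def downsample_to_match(signal, fs_high, fs_low):
    """Down-sample `signal` from fs_high to fs_low (integer rates in Hz).

    Hypothetical helper: the abstract does not specify which resampling
    method was used, so polyphase resampling is an assumption here.
    """
    if fs_low > fs_high:
        raise ValueError("fs_low must not exceed fs_high")
    g = gcd(fs_high, fs_low)
    # resample_poly low-pass filters the signal (anti-aliasing) as part of
    # the rate change, then keeps up/down of the original samples.
    return resample_poly(signal, up=fs_low // g, down=fs_high // g)


# Hypothetical data: a 1 s, 440 Hz tone sampled at 16 kHz, matched to 8 kHz.
fs_high, fs_low = 16000, 8000
t = np.arange(fs_high) / fs_high
tone = np.sin(2 * np.pi * 440 * t)
matched = downsample_to_match(tone, fs_high, fs_low)
```

Under this sketch, every data set recorded at the higher 𝑓𝑠 would be passed through the same function before feature extraction, so that MFCCs for all data sets are computed over the same bandwidth.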