 Geoff Morrison, IFL, Aston
Accounting for a six year time difference between questioned and known speaker recordings in a forensic voice comparison case

We describe a statistical modeling procedure that was used to account for a mismatch in recording time intervals in a particular case. In that case, six years separated the questioned-speaker and known-speaker recordings, whereas in the sample of the relevant population used to train and test the forensic voice comparison system, the multiple recordings of each speaker were made only hours to days apart. Had no adjustment been made for this mismatch between the case recordings and the training and test data, the result would have been biased and misleading. Although developed for a particular case, the procedure has potential for wider application, since relatively large time intervals between questioned-speaker and known-speaker recordings are not uncommon in casework.

Material & methods:
Earlier research had indicated that, as the time interval between the recordings in same-speaker test pairs increased, the mean of the resulting same-speaker scores decreased toward the mean of the different-speaker scores generated by the same system. We used an i-vector PLDA system to generate scores for pairs of recordings from the Multisession Audio Research Project (MARP) database, which includes multiple recordings of each speaker made at intervals of approximately two months over a period of approximately three years, plus follow-up recordings made approximately seven years after the end of the original data collection. These scores included different-speaker scores and same-speaker scores. The same-speaker scores were generated from pairs of recordings covering a range of time intervals between the two members of each pair, so that each speaker had multiple same-speaker scores at different time intervals. We used these scores to train a statistical model that predicts the relative shift in the same-speaker score mean, compared to the different-speaker score mean, from one time interval to another. We then calculated scores for the case-relevant training and test data using the same i-vector PLDA system. We calculated the difference between the same-speaker score mean and the different-speaker score mean for the case-relevant data, and used this difference to convert the relative shift into an absolute shift applicable to the case-relevant same-speaker scores. The predicted absolute shift from one day to six years was calculated and applied to the case-relevant same-speaker training and test scores. The shifted scores were then used to train and test a logistic-regression score-to-likelihood-ratio conversion (calibration) model. This model was then used to calculate a likelihood ratio for the score from the comparison of the questioned-speaker and known-speaker recordings in the case.
In the presentation, we will give details of the statistical modeling procedure and the results of tests of its effectiveness over a range of time intervals.
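The shift-and-calibrate steps described above can be illustrated with a minimal sketch. It is not the authors' implementation, and several things in it are assumptions: the relative shift is treated as a fraction of the separation between the same-speaker and different-speaker score means, the value 0.3 is hypothetical (in the procedure it would come from the model trained on MARP scores), the toy scores are invented, and a plain gradient-descent fit with equal weight per class stands in for whatever logistic-regression routine was actually used.

```python
import math
from statistics import mean


def absolute_shift(ss_scores, ds_scores, relative_shift):
    """Convert a relative shift into an absolute shift on the case-relevant scale.

    Assumption: the relative shift is a fraction of the difference between the
    same-speaker (ss) and different-speaker (ds) score means.
    """
    return relative_shift * (mean(ss_scores) - mean(ds_scores))


def shift_same_speaker(ss_scores, ds_scores, relative_shift):
    """Move the same-speaker scores toward the different-speaker scores."""
    delta = absolute_shift(ss_scores, ds_scores, relative_shift)
    return [s - delta for s in ss_scores]


def train_calibration(ss_scores, ds_scores, steps=5000, rate=0.1):
    """Fit ln(LR) = a + b * score by logistic regression.

    Each class carries half of the total weight (an assumption), so the fitted
    log odds can be read as a log likelihood ratio.
    """
    a, b = 0.0, 1.0
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for scores, target in ((ss_scores, 1.0), (ds_scores, 0.0)):
            weight = 0.5 / len(scores)
            for s in scores:
                p = 1.0 / (1.0 + math.exp(-(a + b * s)))
                grad_a += weight * (p - target)
                grad_b += weight * (p - target) * s
        a -= rate * grad_a
        b -= rate * grad_b
    return a, b


def log10_lr(score, a, b):
    """Convert a score to a base-10 log likelihood ratio."""
    return (a + b * score) / math.log(10)


# Toy case-relevant scores (invented for illustration only).
ss = [2.0, 3.5, 1.0, 4.0, 2.5]
ds = [-3.0, -1.5, 0.5, -2.0, -4.0, 1.5]

relative_shift = 0.3  # hypothetical model prediction for one day to six years
ss_shifted = shift_same_speaker(ss, ds, relative_shift)

a, b = train_calibration(ss_shifted, ds)
case_score = 1.2  # hypothetical questioned-versus-known score
print(f"log10 LR for the case score: {log10_lr(case_score, a, b):.2f}")
```

Only the same-speaker scores are shifted; the different-speaker scores are left as they are, which follows the description above.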
Published version
Morrison, G.S., Kelly, F. (2019). A statistical procedure to adjust for time-interval mismatch in forensic voice comparison. Speech Communication, 112, 15–21. https://doi.org/10.1016/j.specom.2019.07.001


Centre for Forensic Linguistics, Aston University, Birmingham, UK, 2018