ICC   25427
INSTITUTO DE INVESTIGACION EN CIENCIAS DE LA COMPUTACION
Executing Unit - UE
Conferences and scientific meetings
Title:
Optimizing a Speaker Embedding Extractor Through Backend-Driven Regularization
Author(s):
MITCHELL MCLAREN; LUCIANA FERRER
Meeting:
Conference; Interspeech 2019; 2019
Organizing institution:
ISCA
Abstract:
State-of-the-art speaker verification systems use deep neural networks (DNNs) to extract highly discriminant representations of the samples, commonly called speaker embeddings. The networks are trained to minimize the cross-entropy between the estimated posteriors and the speaker labels. The pre-activations from one of the last layers in that network are used as embeddings. These sample-level vectors are then used as input to a backend that generates the final scores. The most successful backend for speaker verification to date is the probabilistic linear discriminant analysis (PLDA) backend. The full process consists of a linear discriminant analysis (LDA) projection of the embeddings, followed by mean and length normalization, ending with PLDA for score computation. While this procedure works very well compared to other approaches, it seems to be inherently suboptimal since the embedding extractor is not directly trained to optimize the performance of the embeddings when using the PLDA backend for scoring. In this work, we propose one way to encourage the DNN to generate embeddings that are optimized for use in the PLDA backend, by adding a secondary objective designed to measure the performance of such a backend within the network. We show modest but consistent gains across several speaker recognition datasets.
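
The conventional backend described in the abstract (LDA projection, mean and length normalization, PLDA scoring) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation and does not include their proposed secondary objective. It assumes a simplified two-covariance form of PLDA, and every function name here (`class_scatter`, `fit_lda`, `fit_backend`, `transform`, `plda_llr`) is hypothetical, introduced only for illustration.

```python
import numpy as np


def class_scatter(X, y):
    """Return (within, between) covariance estimates of X grouped by labels y."""
    mu = X.mean(axis=0)
    d = X.shape[1]
    Sw = np.zeros((d, d))
    Sb = np.zeros((d, d))
    for label in np.unique(y):
        Xc = X[y == label]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        Sb += len(Xc) * np.outer(mc - mu, mc - mu)
    return Sw / len(X), Sb / len(X)


def fit_lda(X, y, out_dim):
    """LDA projection: whiten the within-class scatter, then keep the
    directions with the largest between-class variance."""
    Sw, Sb = class_scatter(X, y)
    L = np.linalg.cholesky(Sw + 1e-6 * np.eye(Sw.shape[0]))
    Linv = np.linalg.inv(L)
    evals, evecs = np.linalg.eigh(Linv @ Sb @ Linv.T)
    top = evecs[:, np.argsort(evals)[::-1][:out_dim]]
    return Linv.T @ top  # shape (d, out_dim)


def length_norm(X):
    """Scale each row to unit Euclidean norm."""
    return X / np.linalg.norm(X, axis=-1, keepdims=True)


def fit_backend(X, y, out_dim):
    """Fit LDA, the post-LDA mean, and two-covariance PLDA parameters."""
    A = fit_lda(X, y, out_dim)
    Z = X @ A
    mean = Z.mean(axis=0)
    Z = length_norm(Z - mean)
    W, B = class_scatter(Z, y)
    # Small ridge keeps W invertible (an assumption of this sketch).
    return {"A": A, "mean": mean, "W": W + 1e-6 * np.eye(out_dim), "B": B}


def transform(backend, X):
    """Apply LDA projection, mean normalization, and length normalization."""
    return length_norm(X @ backend["A"] - backend["mean"])


def gauss_logpdf(v, cov):
    """Log density of a zero-mean Gaussian with covariance cov at v."""
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (v @ np.linalg.solve(cov, v) + logdet + len(v) * np.log(2 * np.pi))


def plda_llr(backend, x1, x2):
    """Log-likelihood ratio of 'same speaker' versus 'different speakers'
    for two transformed embeddings under the two-covariance model."""
    B, W = backend["B"], backend["W"]
    T = B + W
    zeros = np.zeros_like(B)
    v = np.concatenate([x1, x2])
    same = np.block([[T, B], [B, T]])          # shared speaker factor
    diff = np.block([[T, zeros], [zeros, T]])  # independent speakers
    return gauss_logpdf(v, same) - gauss_logpdf(v, diff)
```

In this sketch the embedding extractor is treated as fixed: `fit_backend` is run on embeddings from labeled training speech, and `plda_llr` then scores pairs of transformed test embeddings. That separation between extractor training and backend fitting is the mismatch the abstract sets out to reduce.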