Grouping Speakers with ECAPA-TDNN and ONNX Runtime
A developer ran ECAPA-TDNN speaker embeddings through ONNX Runtime to group the utterances in a 14-second conversation by speaker. The model consistently produced 192-dimensional embeddings, and a sequential threshold algorithm grouped the utterances without being told the number of speakers in advance. On CPU, ONNX Runtime processed the audio faster than real time.
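The summary does not include the author's code. As a rough illustration of how sequential threshold grouping can work, here is a minimal Python sketch. It assumes cosine similarity, a running mean (centroid) per speaker, and a threshold value picked by hand. The function names, the centroid update, and the thresholds are hypothetical, not the article's method. Real ECAPA-TDNN embeddings would be 192-dimensional. The toy vectors below are 3-dimensional for readability.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def group_by_speaker(embeddings: List[Sequence[float]], threshold: float = 0.6) -> List[int]:
    """Assign a speaker label to each utterance embedding, in arrival order.

    Hypothetical sketch: each embedding is compared with the running mean of
    every speaker seen so far. If the best match reaches `threshold`, the
    utterance joins that speaker. Otherwise a new speaker is opened. The
    number of speakers is never supplied.
    """
    centroids: List[List[float]] = []  # running mean embedding per speaker
    counts: List[int] = []             # number of utterances per speaker
    labels: List[int] = []
    for emb in embeddings:
        best_idx, best_sim = -1, -1.0
        for idx, centroid in enumerate(centroids):
            sim = cosine_similarity(emb, centroid)
            if sim > best_sim:
                best_idx, best_sim = idx, sim
        if best_idx >= 0 and best_sim >= threshold:
            # Fold the new embedding into the matched speaker's running mean.
            n = counts[best_idx]
            centroids[best_idx] = [
                (c * n + e) / (n + 1) for c, e in zip(centroids[best_idx], emb)
            ]
            counts[best_idx] = n + 1
            labels.append(best_idx)
        else:
            # No existing speaker is similar enough, so start a new one.
            centroids.append(list(emb))
            counts.append(1)
            labels.append(len(centroids) - 1)
    return labels


if __name__ == "__main__":
    # Toy 3-d stand-ins for 192-d utterance embeddings from two speakers.
    utterances = [
        [1.0, 0.1, 0.0],
        [0.0, 1.0, 0.1],
        [0.9, 0.2, 0.0],
        [0.1, 0.9, 0.0],
    ]
    print(group_by_speaker(utterances, threshold=0.8))  # [0, 1, 0, 1]
```

The approach is simple because it uses one fixed threshold and makes greedy, order-dependent assignments. Those same choices can make it fragile when there are more speakers or when noise pulls embeddings of the same voice apart.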
Meridian48 take
A practical demonstration of speaker diarization with open-source models, though the simple threshold approach may struggle with more speakers or noisy audio.
Read the full reporting
Grouping Utterances by Speaker with ECAPA-TDNN and ONNX Runtime
DEV Community
speaker-embedding · onnx-runtime