Improving multichannel speech enhancement through accurate room-acoustic simulations

Georg Götz¹, Alessia Milo¹, Steinar Guðjónsson¹, Daniel Gert Nielsen¹, Jesper Pedersen¹, Finnur Pind¹

¹Treble Technologies, Reykjavík, Iceland

ABSTRACT

Room-acoustic simulations are widely used to augment training data for deep-learning-based speech enhancement. While most pipelines rely on simplified geometrical acoustics, wave-based approaches offer greater physical accuracy. In this work, we examine how simulation fidelity affects multichannel speech enhancement performance. To this end, we train SpatialNet on datasets augmented with different room-acoustic simulation methods and evaluate the resulting models on measured data. We compare lower-fidelity datasets based on geometrical acoustics with a high-fidelity dataset generated with advanced acoustic modelling, namely a hybrid combination of wave-based and geometrical acoustics simulations. Training on the high-fidelity dataset yields up to a 38% relative reduction in median word error rate compared to the lower-fidelity alternatives. These results show that augmentation with high-fidelity room-acoustic simulations directly translates into improved multichannel speech enhancement performance.
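The augmentation step described above (regardless of how the room impulse responses are simulated) typically amounts to convolving dry speech with a multichannel room impulse response and adding noise at a target SNR. The following is a minimal illustrative sketch of that step, not the authors' actual pipeline; the function name, array shapes, and the white-noise model are assumptions for illustration.

```python
import numpy as np

def augment_multichannel(clean, rirs, snr_db=20.0, rng=None):
    """Simulate a multichannel recording from dry speech.

    clean : (n_samples,) single-channel dry speech signal
    rirs  : (n_mics, rir_len) simulated room impulse responses,
            one per microphone (hypothetical shapes for this sketch)
    Returns a (n_mics, n_samples) noisy reverberant mixture.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    # Convolve the dry signal with each channel's RIR and truncate
    # to the original length (reverberant multichannel mixture).
    reverberant = np.stack([np.convolve(clean, h)[: len(clean)] for h in rirs])
    # Scale white noise so the mixture sits at the requested SNR.
    sig_pow = np.mean(reverberant ** 2)
    noise = rng.standard_normal(reverberant.shape)
    noise *= np.sqrt(sig_pow / (np.mean(noise ** 2) * 10 ** (snr_db / 10)))
    return reverberant + noise
```

In a training pipeline, the same dry utterance can be paired with RIRs from different simulators (geometrical, wave-based, or hybrid), which is what makes the fidelity comparison in the paper possible.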
