3D audio and blind source separation

Acoustic blind source separation (BSS) is the problem of automatically detecting, tracking and separating multiple audio sources from the observed mixtures. While humans can focus on a speaker of interest among several active speakers, the problem remains an open research topic for machines.

This project researched and developed a novel approach to real-time blind moving source separation (BMSS) using intensity vector direction (IVD) statistics. It comprised four components: a mechanism for accurately approximating the room impulse response, a method for quickly detecting speaker activity, a model for reliably tracking speaker movements, and an algorithm for separating multiple speakers on demand from complex spatial audio mixtures. A real-time demonstration was developed with the proposed system pipeline, allowing users to listen to active speakers in any desired combination. The system's advantage was that it used a small coincident microphone array to separate any number of moving sources from first-order Ambisonics signals, under the assumption that the source signals are W-disjoint orthogonal, that is, that at most one source dominates each time-frequency bin. Being nearly closed-form, the proposed system did not require iterative convergence or initialisation of parameters.
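The core idea of direction-based separation can be illustrated with a minimal Python sketch. It is not the project's implementation: it assumes an anechoic scene, static sources with known azimuths, horizontal-only first-order Ambisonics (W, X, Y) in the traditional B-format convention, and a hard nearest-direction mask in place of the IVD statistics used by the project. The function names (`encode_bformat`, `intensity_direction`, `separate`) and the sign convention of the intensity vector are assumptions made for illustration.

```python
import math

def encode_bformat(s, azimuth):
    """Encode one complex time-frequency value s as a horizontal plane wave.

    Assumed convention: W = s / sqrt(2), X = s * cos(az), Y = s * sin(az).
    """
    return (s / math.sqrt(2), s * math.cos(azimuth), s * math.sin(azimuth))

def intensity_direction(w, x, y):
    """Direction (radians) of the active intensity vector for one bin.

    With the encoding above, this angle equals the source azimuth
    whenever a single source occupies the bin.
    """
    ix = (w.conjugate() * x).real
    iy = (w.conjugate() * y).real
    return math.atan2(iy, ix)

def angular_distance(a, b):
    """Smallest absolute difference between two angles, in radians."""
    return abs(math.atan2(math.sin(a - b), math.cos(a - b)))

def separate(bins, target_azimuths):
    """Assign every bin to the nearest target direction (binary mask).

    bins is a list of (W, X, Y) tuples; the result holds one list of
    complex values per target, with zeros where the bin went elsewhere.
    """
    outputs = [[0j] * len(bins) for _ in target_azimuths]
    for k, (w, x, y) in enumerate(bins):
        theta = intensity_direction(w, x, y)
        nearest = min(
            range(len(target_azimuths)),
            key=lambda i: angular_distance(theta, target_azimuths[i]),
        )
        # Undo the 1/sqrt(2) gain on W to recover the source value.
        outputs[nearest][k] = w * math.sqrt(2)
    return outputs

# Two hypothetical speakers at 30 and 200 degrees, active in disjoint bins.
az_a, az_b = math.radians(30), math.radians(200)
bins = [
    encode_bformat(1 + 2j, az_a),
    encode_bformat(0.5 - 1j, az_b),
    encode_bformat(-2 + 0.5j, az_a),
]
speaker_a, speaker_b = separate(bins, [az_a, az_b])
```

Because each bin is assumed to contain a single dominant source, the per-bin direction estimate is closed-form and needs no iteration. In a reverberant room with moving speakers, the hard mask would be replaced by a statistical model of the direction estimates, and the target azimuths would come from a tracker rather than being fixed.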

The developed system is applicable to speaker identification and speech recognition, as well as to speaker activity detection, speech enhancement, high-quality hearing aids, safety and security, and teleconferencing.

The project was funded by MulSys Ltd UK and the EU FP7 research programme. It was completed successfully in December 2016.