Blind source separation in real time

As a world-leading team in audio and speech signal processing, the Institute for Digital Technologies (IDT) is proud of its long and outstanding record of contributions to many national and international research, development and standardisation activities.

From the ETSI GSM Full Rate and Half Rate speech and channel coding for mobile communication systems in its early days to the ETSI AMR speech and channel coding systems used universally in the latest 3G/4G smartphones and advanced mobile networks, the IDT has always been at the forefront of fundamental and applied research in audio and speech signal processing.

The IDT has developed a new real-time blind source separation (BSS) system that can be used in many consumer products and applications.

General description of blind source separation

Source separation addresses the problem of extracting one or more individual sources from a mixture of many sound sources, whilst “blind” means that the mixing conditions and the acoustic scene are unknown to the separation system.

A very good example of the BSS problem is a cocktail party, where many people are talking simultaneously against background noise and music, yet a listener can selectively attend to what one person in the crowd is saying. The human auditory system is used to this kind of scenario and is very good at picking out and understanding a particular source from the mixture. Machines, however, find this problem rather difficult. Many different solutions and algorithms exist for machine-based blind source separation, ranging from independent component analysis to time-frequency masking and computational auditory scene analysis.

How it works

The Institute for Digital Technologies' blind source separation system uses an array of dynamic spatial filters to separate the sound sources with low computational complexity, which makes it especially suitable for real-time implementation on many different hardware and software platforms, including smartphones and portable devices.

The mixed sound signals from the various sources are captured in real time using a commercially available tetrahedral microphone. The signals are then processed and converted into an overall pressure component W and three pressure-gradient components along three directions (X for back-front, Y for right-left and Z for the vertical). Through localisation, all energy components along a specified direction (intensity vectors) can be obtained, so the sound waveform arriving from a particular direction can be reconstructed from these components according to the user's requirements. To obtain the desired source signals while suppressing signals from all other directions, dynamic spatial filters are applied before the separated sources are output.
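The processing chain described above can be illustrated with a minimal sketch. It is not the IDT implementation: the capsule order, the unequalised sum-and-difference conversion, the Gaussian-shaped directional mask, its 20-degree width and all function names are assumptions made for illustration, and only the horizontal plane of a single frame is considered.

```python
import numpy as np

def a_to_b_format(flu, frd, bld, bru):
    """Combine four tetrahedral capsule signals into W, X, Y, Z.

    Assumed capsule order: front-left-up, front-right-down,
    back-left-down, back-right-up. Capsule equalisation is omitted.
    """
    w = 0.5 * (flu + frd + bld + bru)  # overall pressure
    x = 0.5 * (flu + frd - bld - bru)  # back-front gradient (front positive)
    y = 0.5 * (flu - frd + bld - bru)  # right-left gradient (left positive)
    z = 0.5 * (flu - frd - bld + bru)  # vertical gradient (up positive)
    return w, x, y, z

def intensity_azimuth(w, x, y):
    """Azimuth (radians) of the intensity vector in each frequency bin of one frame."""
    W, X, Y = (np.fft.rfft(c) for c in (w, x, y))
    ix = np.real(np.conj(W) * X)
    iy = np.real(np.conj(W) * Y)
    return np.arctan2(iy, ix)

def directional_mask(azimuth, target, width):
    """Soft weight per bin: close to 1 near the target azimuth, close to 0 elsewhere."""
    diff = np.angle(np.exp(1j * (azimuth - target)))  # difference wrapped to [-pi, pi]
    return np.exp(-0.5 * (diff / width) ** 2)

def separate_frame(w, x, y, target, width=np.deg2rad(20.0)):
    """Keep the part of the pressure signal W that arrives from the target azimuth."""
    mask = directional_mask(intensity_azimuth(w, x, y), target, width)
    return np.fft.irfft(mask * np.fft.rfft(w), n=len(w))

# Synthetic check: two tones arriving from different azimuths in the horizontal plane.
n = 1024
t = np.arange(n)
s1 = np.sin(2 * np.pi * 32 * t / n)  # tone exactly on FFT bin 32
s2 = np.sin(2 * np.pi * 96 * t / n)  # tone exactly on FFT bin 96
az1, az2 = np.deg2rad(30.0), np.deg2rad(-100.0)

# Ideal W, X, Y, Z for the two plane waves, then the capsule signals they imply.
w_true = s1 + s2
x_true = s1 * np.cos(az1) + s2 * np.cos(az2)
y_true = s1 * np.sin(az1) + s2 * np.sin(az2)
z_true = np.zeros(n)
flu = 0.5 * (w_true + x_true + y_true + z_true)
frd = 0.5 * (w_true + x_true - y_true - z_true)
bld = 0.5 * (w_true - x_true + y_true - z_true)
bru = 0.5 * (w_true - x_true - y_true + z_true)

w, x, y, z = a_to_b_format(flu, frd, bld, bru)
recovered = separate_frame(w, x, y, target=az1)  # should resemble s1
```

A practical system would process overlapping short-time frames, include elevation via Z, and let the mask follow the sources over time; the fixed single-frame mask here only shows how direction estimates per frequency bin can drive a spatial filter.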

Performance

The BSS system has been evaluated by subjective tests and by measuring the signal-to-interference ratio (SIR) of the desired separated sound with respect to the other, undesired sounds present. All tests have shown that the separated sound signals achieve high SIRs with good subjective scores. The following clips are examples of the mixed and separated sounds obtained in these tests, where the original mixed signal is a mixture of four different sources: male speech, female speech, guitar and cello music.

The Institute for Digital Technologies' BSS system has several advantages over other source separation solutions and algorithms. Firstly, it uses a small microphone array that can be further miniaturised to button size. Secondly, it is a real-time solution with much lower computational complexity than other solutions and can be implemented on handheld devices such as smartphones. Furthermore, it achieves a higher SIR improvement, which gives a clearer separated result. Finally, the user can select a single source or any combination of sources to listen to from a mixture of sound sources, or can choose to suppress one or more sources.
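For readers unfamiliar with the metric, the SIR and SIR improvement mentioned above can be computed as in the sketch below. It assumes the target and interference components of a signal are available separately (for example, by passing each source through the system on its own), which the evaluation described here does not specify; the function names are illustrative.

```python
import numpy as np

def sir_db(target_part, interference_part):
    """Signal-to-interference ratio in dB from the two components of a signal."""
    p_target = np.sum(np.square(target_part))
    p_interference = np.sum(np.square(interference_part))
    return 10.0 * np.log10(p_target / p_interference)

def sir_improvement_db(in_target, in_interference, out_target, out_interference):
    """Output SIR minus input SIR, in dB."""
    return sir_db(out_target, out_interference) - sir_db(in_target, in_interference)
```

A positive improvement means the separation stage has attenuated the interfering sources relative to the desired one.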

Applications and impacts

Real-time blind source separation has many applications and real-world impacts:

  • Assisted independent living: to improve the quality of sound produced by a hearing-aid device
  • Mobile phone users and industry: to suppress noise and interference on mobile devices, potentially benefiting millions of users
  • Fast response security and surveillance: to point automatically to sounds of interest and listen to them
  • Situation awareness in critical environments: to detect, identify and understand what is happening in the surroundings in noisy and critical situations, e.g. battlefields