A distance conferencing system normally includes one or more microphones and speakers in a room, connected to a POTS or VoIP telephone, or a video codec. The communication method doesn’t matter – a common problem occurs with all of them: echo.
The microphone(s) pick up the voice of the near end, and the speaker amplifies the far end. The trouble starts when the voice of the far end, played back through the speaker in the room, is captured by the microphone and transmitted back to the far end. At that moment, the far end hears their own voice as echo.
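The echo path described above can be sketched in a few lines: the microphone captures the near-end talker plus a delayed, attenuated copy of whatever the loudspeaker plays. The delay and attenuation values below are illustrative, not from the article.

```python
# Toy model of the acoustic echo path: the room microphone hears the
# near-end voice plus a delayed, attenuated copy of the far-end signal
# that the loudspeaker is playing.

def mic_signal(near_end, far_end, delay_samples=8, attenuation=0.4):
    """Return what the room microphone captures, sample by sample."""
    mic = []
    for i in range(len(near_end)):
        # the loudspeaker's sound takes `delay_samples` to reach the mic
        echo = attenuation * far_end[i - delay_samples] if i >= delay_samples else 0.0
        mic.append(near_end[i] + echo)
    return mic

near = [0.0] * 16                  # near end is silent
far = [1.0] + [0.0] * 15           # far end produces a single click
captured = mic_signal(near, far)   # the click reappears later, attenuated
```

With a silent near end, everything the microphone captures is echo – exactly the signal the far end will hear as their own voice.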
If you have ever experienced that, you will know that it is extremely difficult to maintain a coherent train of thought. Our brains struggle to function when we hear our own voice delayed – and the longer the delay between the near end and far end (often called “round-trip delay”), the harder it becomes to have a productive conversation.
Half-duplex communication was one of the first solutions to the problem, and it is still used in some scenarios. This simple technique mutes the near-end microphone while the far end speaks, eliminating the path through which echo occurs. But a half-duplex solution means the conference is no longer a conversation but a series of monologues, as the far end won’t hear the near end until they have finished talking. Interjections are lost, so this isn’t a true conversation.
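A minimal sketch of that switching logic, assuming a simple energy threshold decides when the far end is "speaking" (the threshold value is illustrative):

```python
# Half-duplex gating: whenever the far end is active, the near-end
# microphone frame is muted entirely. No echo can get through, but
# neither can interjections from the near end.

def half_duplex_gate(mic_frame, far_frame, threshold=0.01):
    """Return the near-end frame to transmit, muted if the far end is talking."""
    far_energy = sum(s * s for s in far_frame) / len(far_frame)
    if far_energy > threshold:
        return [0.0] * len(mic_frame)  # near end silenced while far end speaks
    return mic_frame
```

The all-or-nothing behavior is what makes the result feel like alternating monologues rather than a conversation.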
Acoustic echo cancellation (AEC) offers a solution to this, so let’s take a peek under the hood at the anatomy of AEC. Designing high-performance AEC processing requires a huge investment in engineering, and in many cases the details of the technology are proprietary to the company developing it. But it is generally agreed that most designs on the market rely on the following components:
Reference Signal
The AEC processor needs to know what audio should be removed from the microphone. That’s accomplished by providing a “reference” signal; most often, the far-end voice. That way, the AEC circuit knows which signal is unwanted and, if the process works correctly, the far end will only hear the near-end conversation, and not their own echo.
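The signal plumbing this implies can be sketched as follows: the far-end audio is tapped both as the loudspeaker feed and as the reference input of the canceller, so the AEC knows which signal to remove. The names and the perfect echo model here are illustrative assumptions, not a real AEC API.

```python
# Sketch of the reference-signal dataflow: the AEC receives both the
# microphone signal and the far-end "reference", estimates the echo
# from the reference, and subtracts it before transmission.

def aec_process(mic, reference, estimate_echo):
    """Subtract an echo estimate (derived from the reference) from the mic signal."""
    echo_estimate = estimate_echo(reference)
    return [m - e for m, e in zip(mic, echo_estimate)]

# If the echo model is perfect (here: the echo path is a simple 0.5x
# attenuation), the far end hears only the near-end speech.
reference = [1.0, 0.0, 1.0, 0.0]          # far-end voice
near_speech = [0.0, 0.2, 0.0, 0.2]        # near-end voice
mic = [n + 0.5 * r for n, r in zip(near_speech, reference)]
sent = aec_process(mic, reference, lambda ref: [0.5 * r for r in ref])
```

In practice the echo path is unknown and time-varying, which is why the estimate must be produced by an adaptive filter rather than a fixed model.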
Adaptive Filter
The adaptive filter is where most of the AEC complexity lies. In simple terms, the AEC models an out-of-polarity signal intended to cancel out the far-end signal picked up by the microphone. Adaptive filters can be implemented in several ways, differing in whether adaptation happens in the time or frequency domain, in the update calculation used, and in the type of filter being applied.
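One common time-domain approach is the normalized least-mean-squares (NLMS) algorithm; real AEC designs are far more elaborate, but a minimal sketch shows the adaptation loop. The tap count, step size, and toy echo path below are illustrative.

```python
import random

# Minimal time-domain NLMS adaptive filter: it learns an FIR model of
# the echo path from the reference signal and subtracts its estimate
# from the microphone signal, leaving only the residual.

def nlms_cancel(reference, mic, taps=4, mu=0.5, eps=1e-8):
    """Adaptively estimate the echo from `reference` and subtract it from `mic`."""
    w = [0.0] * taps                   # filter weights, updated every sample
    out = []
    for n in range(len(mic)):
        # most recent `taps` reference samples (zero-padded at the start)
        x = [reference[n - k] if n - k >= 0 else 0.0 for k in range(taps)]
        y = sum(wk * xk for wk, xk in zip(w, x))     # echo estimate
        e = mic[n] - y                               # residual after cancellation
        norm = sum(xk * xk for xk in x) + eps
        w = [wk + mu * e * xk / norm for wk, xk in zip(w, x)]  # NLMS update
        out.append(e)
    return out

# Simulated scenario: far-end noise through a 2-tap "room", no near-end talker.
random.seed(0)
far = [random.uniform(-1, 1) for _ in range(2000)]
echo = [0.6 * far[n] + 0.3 * (far[n - 1] if n else 0.0) for n in range(len(far))]
residual = nlms_cancel(far, echo)
# once converged, the residual is close to zero: the filter has learned the echo path
```

With no near-end speech in the simulation, the residual shrinking toward zero shows the filter converging on the echo path; any near-end speech present would pass through in the residual.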
Non Linear Processing (NLP)
NLP is roughly equivalent to a sophisticated ducking technology, used to suppress any residual echo that the adaptive filter did not cancel. While often required, some AEC implementations rely too heavily on the NLP, which results in over-suppression and/or distortion of near-end speech for both primary (close to mic) and secondary (far from mic) talkers. Excessive use of NLP can destroy the “ambience” of the near end as transmitted to the far end.
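A crude sketch of that ducking behavior, assuming a frame is suppressed when its residual still correlates strongly with the far-end reference (the threshold and floor gain are illustrative, not from any product):

```python
# Ducking-style NLP: if the residual in a frame still resembles the
# far-end reference, attenuate the whole frame toward a floor gain.

def nlp_suppress(residual_frame, reference_frame, threshold=0.5, floor_gain=0.1):
    """Attenuate a residual frame that still correlates with the far-end reference."""
    num = sum(r * x for r, x in zip(residual_frame, reference_frame))
    den = (sum(r * r for r in residual_frame) ** 0.5) * \
          (sum(x * x for x in reference_frame) ** 0.5)
    correlation = abs(num) / den if den else 0.0
    if correlation > threshold:
        # heavy-handed: any near-end speech in this frame is crushed too,
        # which is exactly how over-reliance on NLP distorts talkers
        return [floor_gain * r for r in residual_frame]
    return residual_frame
```

The frame-wide gain reduction is what makes aggressive NLP sound unnatural: near-end speech and room ambience are attenuated along with the residual echo.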
Some AEC circuits may include additional processing, such as filters (typically a high-pass), noise reduction, or even automatic gain control (AGC), all designed to improve audio quality. As you choose your solution, it is worth comparing what each one offers.
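Two of those stages are simple enough to sketch: a one-pole high-pass filter that removes DC and low-frequency rumble, and a frame-based AGC that scales toward a target level. The coefficient, target level, and gain cap are illustrative values.

```python
# Sketches of two common auxiliary stages in an AEC chain.

def high_pass(signal, alpha=0.95):
    """One-pole high-pass: y[n] = alpha * (y[n-1] + x[n] - x[n-1]).
    A constant (DC) input decays toward zero at the output."""
    out, y, prev_x = [], 0.0, 0.0
    for x in signal:
        y = alpha * (y + x - prev_x)
        prev_x = x
        out.append(y)
    return out

def agc(frame, target_rms=0.1, max_gain=10.0):
    """Crude AGC: scale a frame toward a target RMS level, capping the gain."""
    rms = (sum(s * s for s in frame) / len(frame)) ** 0.5
    gain = min(target_rms / rms, max_gain) if rms > 0 else 1.0
    return [gain * s for s in frame]
```

Production implementations add attack/release smoothing and noise gating, but the basic roles – removing rumble before the canceller, leveling speech after it – are the same.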