Google AI can pick out a single speaker in a crowd

16 April 2018 - 11:00 | Interesting information

Google researchers have developed a deep-learning audio-visual model that can isolate one speaker's voice in a cacophony of noise.

The 'cocktail party effect' -- the ability to tune out all the other voices in a crowd and focus on a single person's voice -- comes easily to humans but not to machines.

It's an obstacle to an application of the Google Glass smart glasses that I personally would like to see developed one day: a real-time speech-recognition and live-transcription system to support hearing-aid wearers.

Apparently, voice separation is a hard nut to crack, but Google's AI researchers may have part of the answer to my Glass dream in the form of a deep-learning audio-visual model that can isolate speech from a mixture of sounds.

The scenario they present involves two speakers standing side by side, jabbering simultaneously. The technique hasn't been proven in a real-world crowd, but it does work on a video with two speakers on a single audio track.

itc.ua
