Google AI can pick out a single speaker in a crowd

16 April 2018 - 11:00 | Interesting information

Google researchers have developed a deep-learning audio-visual model that can isolate one speaker's voice in a cacophony of noise.

The 'cocktail party effect' -- the ability to tune out all the other voices in a crowd and focus on a single person's voice -- comes easily to humans but not to machines.

It's an obstacle to an application of the Google Glass smart glasses that I personally would like to see developed one day: a real-time speech-recognition and live-transcription system to support hearing-aid wearers.

Apparently voice separation is a hard nut to crack, but Google's AI researchers may have part of the answer to my Glass dream in the form of a deep-learning audio-visual model that can isolate speech from a mixture of sounds.

The scenario they present is two speakers standing side by side, jabbering simultaneously. The technique hasn't been proven in a real-world crowd, but it does work on a video with two speakers on a single audio track.

itc.ua
