Google AI can pick out a single speaker in a crowd
16.04.2018 11:00

Google researchers have developed a deep-learning audio-visual model that can isolate one speaker's voice in a cacophony of noise.

The 'cocktail party effect' -- the ability to tune out the other voices in a crowd and focus on a single person's voice -- comes easily to humans but not to machines.

It's an obstacle to an application of the Google Glass smart glasses that I personally would like to see developed one day: a real-time speech-recognition and live-transcription system to support hearing-aid wearers.

Apparently, voice separation is a hard nut to crack, but Google's AI researchers may have part of the answer to my Glass dream in the form of a deep-learning audio-visual model that can isolate speech from a mixture of sounds.

The scenario they present is two speakers standing side by side and talking simultaneously. The technique hasn't been proven in a real-world crowd, but it does work on a video with two speakers on a single audio track.

itc.ua 
