Speech, the ability that characterizes human beings and enables communication, is no longer a distinguishing attribute of people: it has extended to the world of computers. Nowadays, it is possible to tell a mobile phone to call somebody or to hear a bank account balance read aloud by a device. However, researchers are still working to improve voice recognition systems.
The first attempts to create machines capable of imitating human communication emerged in the second half of the eighteenth century and aimed at successful interaction between people and machines. Later on, scientists realized that, before a machine could understand speech, it first had to recognize it.
Jorge Gurlekian, CONICET principal researcher at the Laboratorio de Investigaciones Sensoriales del Instituto de Inmunología, Genética y Metabolismo (INIGEM, CONICET-UBA) [Sensory Research Laboratory of the Institute of Immunology, Genetics, and Metabolism], and an interdisciplinary research team are working on the development of a voice recognition system. According to the researcher, there is still a lot of work to do: what speech involves seems simple, but it represents a significant challenge for machines to understand.
Automatic speech recognition (ASR), also called automatic voice recognition, is a discipline of artificial intelligence whose objective is to enable spoken communication between humans and computers. “We usually use oral language without noticing the quantity and complexity of the processes involved in a conversation. Nonetheless, many of those processes pose considerable difficulties for computer systems”, the researcher explains. He adds that they have to overcome a major limitation, because speech involves not only a “what” but also a “how”: silences, pauses and intonation are key to effective communication.
For this reason, Gurlekian states that information is not only transmitted through words: the way in which a sentence is expressed, its intonation and other factors enrich the discourse, but they tend to confuse the computer. “When we talk, the main challenge is to identify what is voice and what is noise. For a machine, it is not easy to know which sounds it should concentrate on. When we know somebody, we adapt to the person’s pitch, tone and volume automatically, without asking him or her to speak for some time. Apart from that, it is very hard for a computer to distinguish similar phrases”.
Considering all those factors, researchers face a great challenge: to create a system that recognizes the speech of any person, a task at which humans themselves are incredibly good.
The computer knows what I say
Gurlekian uses radio and television recordings to train an automatic system to learn words in the actual conditions in which they are said. This learning process takes place through the formation of an acoustic model for each phoneme, conditioned on the phonemes that precede and follow it. “The structure of language is represented by the most probable sequence of words in the discourse. This information, together with a dictionary of possible pronunciations for each word (for instance, the word ciudad (city) can be pronounced ciudad or ciudá), is part of the language model”, the researcher explains. Furthermore, he adds that “the database that was created includes allophonic variations as well as dialectal prosodic variations produced in each region of the country, intonation, musicality, and rhythm”.
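As a rough illustration of the two ideas in this paragraph, the sketch below stores alternative pronunciations for a word and labels each phoneme with its neighbours, a scheme usually called triphone modelling. The phoneme symbols, the `sil` boundary marker and the function name are assumptions made for this example; the article does not describe the team's actual inventory or software.

```python
# Hypothetical pronunciation dictionary: one word, several phoneme sequences.
# The symbols are illustrative, not the phoneme set used by the researchers.
PRONUNCIATIONS = {
    "ciudad": [
        ["s", "j", "u", "d", "a", "d"],  # "ciudad", final d pronounced
        ["s", "j", "u", "d", "a"],       # "ciudá", final d dropped
    ],
}


def context_dependent_units(phonemes, boundary="sil"):
    """Label each phoneme as previous-current+next (a triphone label)."""
    padded = [boundary] + list(phonemes) + [boundary]
    return [
        f"{padded[i - 1]}-{padded[i]}+{padded[i + 1]}"
        for i in range(1, len(padded) - 1)
    ]


if __name__ == "__main__":
    for variant in PRONUNCIATIONS["ciudad"]:
        print(context_dependent_units(variant))
```

Under this labelling the `a` of ciudad and the `a` of ciudá become different units, because one is followed by `d` and the other by silence, so each can be given its own acoustic model.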
The voice recognition system uses a classification process for patterns that it stores in dictionaries. During a dictation, if the words used are not in its vocabulary, the software will look for other, phonetically similar words that are available. This results in errors and highlights the need to train the programme to achieve more accurate recognition.
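The fallback to a phonetically similar word can be pictured with a small sketch that compares phoneme sequences by edit distance and returns the closest entry in the vocabulary. This is only an illustration: real recognizers normally arrive at the nearest in-vocabulary word implicitly while decoding, and the toy lexicon, its one-letter-per-phoneme transcriptions and the function names are all assumptions.

```python
def edit_distance(a, b):
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,             # drop a phoneme from a
                curr[j - 1] + 1,         # insert a phoneme from b
                prev[j - 1] + (x != y),  # substitute, free if equal
            ))
        prev = curr
    return prev[-1]


def closest_word(phonemes, lexicon):
    """Return the vocabulary word whose pronunciation is nearest."""
    return min(lexicon, key=lambda w: edit_distance(phonemes, lexicon[w]))


# Hypothetical three-word vocabulary, one symbol per phoneme.
LEXICON = {"casa": list("kasa"), "mesa": list("mesa"), "paso": list("paso")}

if __name__ == "__main__":
    # "kaza" is not in the vocabulary, so the nearest entry is returned.
    print(closest_word(list("kaza"), LEXICON))
```

The sketch also shows where the errors come from: the out-of-vocabulary input is always mapped to some known word, whether or not that is what the speaker said.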
“These systems are based on the development of probabilistic models for each acoustic unit of language, statistical models for the words that the user will be able to use and pronunciation models that indicate how the acoustic units are connected to form words. The performance of these recognition systems will depend on the quality of the recordings used to undertake the task, the kind of speech and the characteristics of each speaker. With professional broadcasters and a dedicated recording environment, the recognition rates measured in the laboratory exceed 97 per cent”, Gurlekian states.
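In the textbook formulation of statistical speech recognition, these models are combined by choosing the word sequence with the highest product of acoustic probability and language-model probability, usually computed as a sum of logarithms. The sketch below assumes a toy bigram table and made-up acoustic scores; none of the numbers or names come from the research described here.

```python
import math

# Hypothetical bigram probabilities P(word | previous word).
# "<s>" marks the start of the sentence.
BIGRAM = {
    ("<s>", "la"): 0.20,
    ("la", "ciudad"): 0.05,
    ("la", "sociedad"): 0.02,
}
UNSEEN = 1e-6  # floor for word pairs missing from the toy table


def language_log_prob(words):
    """Sum of log bigram probabilities for a word sequence."""
    history = ["<s>"] + words[:-1]
    return sum(
        math.log(BIGRAM.get(pair, UNSEEN)) for pair in zip(history, words)
    )


def best_hypothesis(candidates):
    """candidates: list of (words, acoustic_log_prob) pairs."""
    return max(candidates, key=lambda c: c[1] + language_log_prob(c[0]))[0]


if __name__ == "__main__":
    # Made-up acoustic scores: "sociedad" fits the audio slightly better.
    candidates = [
        (["la", "ciudad"], -12.0),
        (["la", "sociedad"], -11.8),
    ]
    print(best_hypothesis(candidates))
```

In this toy case the acoustic score alone slightly favours "la sociedad", but the language model makes "la ciudad" the more probable sequence overall, which is how word statistics help to separate similar-sounding phrases.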
Speech recognition technology has many possible applications, such as device control, speech-to-text dictation and search within sound archives. Furthermore, it can facilitate communication for people with disabilities and promote the development of voice-based security measures, among many other possibilities.
- By Jimena Naser
- About the research:
- Jorge Gurlekian. Principal researcher. INIGEM.
- Evin Diego. Associate researcher. INIGEM.
- Humberto Torres. Assistant researcher. INIGEM.
- Cossio Christian. Fellow. INIGEM.
- Miguel Martínez. UBA.
- Pedro Univaso. UBA