28/07/2015 | Exact and Natural Sciences
Can computers talk?
CONICET researcher explains how speech processing systems can help to improve human communication
Agustín Gravano, assistant researcher at CONICET. Photo: CONICET Photography.

Can machines think? That is what the English mathematician Alan Turing, widely regarded as the father of computer science, asked in 1950 in his article “Computing Machinery and Intelligence”. To answer the question, he proposed the well-known test that bears his name, in which a human evaluator judges whether a machine can behave like a human being.

In 1997, the technology company IBM set out to find whether a computer could beat the world chess champion, Garry Kasparov. To that end, it developed Deep Blue, a computer system that managed to defeat the Russian player in a six-game match. That landmark represented a point of arrival: a computer had beaten a human being at a task that had seemed impossible for a machine.

Whether in science fiction or in reality, more and more tasks are performed by machines. Agustín Gravano, CONICET assistant researcher at the Departamento de Computación de la Facultad de Ciencias Exactas y Naturales de la Universidad de Buenos Aires (UBA) [Computer Science Department of the Faculty of Exact and Natural Sciences of the University of Buenos Aires], studies, from a computer science perspective, the extraordinary and complex coordination that allows people to talk to one another.

It is increasingly common for computer systems to perform tasks that humans can do, such as playing chess, talking or recognizing objects. This capacity of machines to imitate human reasoning is known as Artificial Intelligence (AI).

Gravano explains that one of the many topics that can be addressed from AI is speech, especially the systems that enable oral communication between computers and people. “Among all the things humans can do is the use of natural language: English, Spanish or Swahili, any language that arose naturally, as opposed to those created for a specific purpose. We use the former spontaneously, without being aware of their complexity, because they developed along with us, and that is why they feel natural to us. For a computer, however, they are complex. We have not yet managed to make computers handle language reasonably well, either to understand messages or to produce acceptable ones”, he states.

The researcher adds that understanding words or recognizing intentions, emotions, double entendres or subtleties are sophisticated tasks for a machine. Although there are speech recognition applications for cell phones, their error rate is very high. It is tolerable only when people want to speed up the dictation of a message or an instruction, not when they want to hold a conversation or need what was said to be recognized accurately. These systems have a success rate that makes them commercially viable, but they still have a long way to go.

Speech processing is in its early stages: systems still cannot fully understand what was said and how it was said. That “how” is the subject of prosody, the branch of linguistics that studies the phonic elements of an utterance such as accent, tone and intonation. The study of prosody is conducted by the Grupo de Procesamiento del Habla del Departamento de Computación de la Facultad de Ciencias Exactas y Naturales [Speech Processing Group of the Computer Science Department of the Faculty of Exact and Natural Sciences] of the UBA, of which Gravano is a member.

“I joined a line of research that began in the nineties. We conduct statistical studies to describe the mechanisms and tools people have to change meaning, subtly or drastically, according to how something is said. We use techniques from machine learning, one of the major branches of AI, which creates programs that learn behaviour from examples”, he comments.

The research team works with recordings of conversations made in a chamber that is acoustically treated to avoid echoes and noise that would impair the subsequent analysis. The researchers extract characteristics of the audio signal, such as the volume or intensity of the voice, and use them to train algorithms that learn different combinations of those characteristics. The aim is to have a system capable of recognizing patterns that indicate different ways of talking. For example, if a person says a word with a high pitch and great intensity, the system should detect that the person wants to emphasise it.
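As a rough illustration of that idea, and not the group's actual method or data, the following Python sketch extracts two features (pitch and intensity) from toy signals and trains a very simple classifier on labelled examples. Everything in it is an assumption made for the example: the synthetic tones that stand in for spoken words, the 8000 Hz sample rate, the zero-crossing pitch estimate, the nearest-centroid classifier, and all function names.

```python
import math

SAMPLE_RATE = 8000  # Hz; assumed value for this toy example


def synth_word(pitch_hz, amplitude, duration_s=0.1):
    """Generate a pure tone standing in for one spoken word (toy data)."""
    n = int(SAMPLE_RATE * duration_s)
    return [amplitude * math.sin(2 * math.pi * pitch_hz * i / SAMPLE_RATE)
            for i in range(n)]


def intensity(samples):
    """Root-mean-square amplitude, a simple proxy for loudness."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def pitch(samples):
    """Rough pitch estimate in Hz from the rate of upward zero crossings."""
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if a < 0 <= b)
    return crossings * SAMPLE_RATE / len(samples)


def features(samples):
    """Feature vector for one word: (pitch, intensity)."""
    return (pitch(samples), intensity(samples))


def train_centroids(examples):
    """Learn one average (scaled) feature vector per label.

    examples is a list of (feature_tuple, label) pairs.
    """
    dims = len(examples[0][0])
    # Scale each feature by its largest training value so that pitch
    # (hundreds of Hz) does not dominate intensity (below 1.0).
    scale = [max(abs(f[d]) for f, _ in examples) or 1.0 for d in range(dims)]
    sums, counts = {}, {}
    for f, label in examples:
        acc = sums.setdefault(label, [0.0] * dims)
        for d in range(dims):
            acc[d] += f[d] / scale[d]
        counts[label] = counts.get(label, 0) + 1
    centroids = {label: [v / counts[label] for v in acc]
                 for label, acc in sums.items()}
    return scale, centroids


def predict(model, f):
    """Return the label whose centroid is closest to the scaled features."""
    scale, centroids = model
    scaled = [f[d] / scale[d] for d in range(len(scale))]
    return min(
        centroids,
        key=lambda label: sum((x - c) ** 2
                              for x, c in zip(scaled, centroids[label])),
    )


# Hand-labelled toy examples: low and soft versus high and loud.
training = [
    (features(synth_word(110, 0.20)), "neutral"),
    (features(synth_word(120, 0.25)), "neutral"),
    (features(synth_word(130, 0.30)), "neutral"),
    (features(synth_word(210, 0.70)), "emphasised"),
    (features(synth_word(220, 0.80)), "emphasised"),
    (features(synth_word(240, 0.90)), "emphasised"),
]
model = train_centroids(training)

# Classify two unseen "words".
print(predict(model, features(synth_word(230, 0.85))))
print(predict(model, features(synth_word(115, 0.22))))
```

Real speech would need far more robust pitch tracking and many more features and examples; the sketch only shows how labelled examples of pitch and intensity can be turned into a rule that separates emphasised words from neutral ones.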

“This could be applied to all languages and cultures. From computer science, we aim to find patterns that represent a common factor. It does not matter where a person is from, or what his or her culture or socioeconomic level is, because we all use more or less the same protocols to communicate. At this stage of development of AI, our objective is to cover a wide range of the population with the least possible data”, the researcher states.

Finally, Gravano explains that this knowledge can be used by technology developers: those who create automatic translators, those who build speech recognition systems for performing searches or operating a system by voice, and those who aim to optimize customer service. In this regard, the scientist says it could be a good analysis tool for call centers, since the machine could infer the main complaints and their causes, and that information could later be analysed by a person.

“It is important to consider how to combine people and systems because this will help to improve communication. Language processing is still far from passing a Turing test, which is why we have a lot of work to do. One might think that nowadays everything is solved with cell phones, but with a dialogue system it is easy to realize that it is not a person who is talking”, he concludes.

These and other AI-related topics will be addressed by researchers from CONICET and other organizations at the International Joint Conference on Artificial Intelligence.

For more information: http://ijcai-15.org/

  • By Cecilia Leone.