By Pascale Fung
I recently wrote an article for Scientific American called ‘Robots with Heart’. In the piece, I described our work on incorporating an ‘empathy module’ into robots so that they can better serve the emotional and physical needs of humans. While many readers offered ideas on how we might apply these empathetic robots to medical or other applications, some objected to the very idea of making robots recognise and empathise with human emotions. One reader opined that, as emotions are what make humans human, we really should not build robots with that very human trait and have them take over the care-giving jobs that humans do so well. On the other hand, others are so enthusiastic about the idea that they ask me, “If robots are intelligent and can feel, will they one day have a conscience?”
Perhaps it is important for us to understand what we mean by robot intelligence and feeling. To do that, we first need to understand how and why humans feel.
What is the role of emotion in the evolution of our species? Research has shown that humans bond with other humans by establishing a rapport. The survival of a species depends on that bonding, and much of this bonding is enabled by emotion. We also signal our intent with emotion.
Our feelings and emotions are triggered by stimuli, either external or internal (such as a memory), and manifest themselves as physical signs – pulse rate, perspiration, facial expressions, gesture, and tone of voice. We might cry or laugh, shudder in disgust, or shrink in defeat. Unlike our language, many of these emotions are expressed spontaneously and automatically, without any conscious control. We learn to recognize emotions in other human beings from birth. Babies are soothed by the gentle humming of a lullaby even before they are born. They respond to the smiling face of a parent at birth and are certainly capable of expressing their own emotions from day one.
Industrial robots build our cars and our smartphones. Rehabilitation robots help people walk again. Machine teaching assistants can answer student questions. Software programs can write legal documents. They can even grade your essays. Software systems can write stories for newspapers. An Artificial Intelligence (AI) program just beat a human at Go, often described as the most complex board game. IBM Watson beat human champions at Jeopardy. Machines can even paint well enough to fool humans into believing the result is the work of a professional human artist. Machines can compose music. Robots can obviously be built to be stronger, faster, and smarter than humans in specific areas. But do they need to feel like we do?
In early 2016, our team announced the first known system that can recognize a dozen human emotions from tone of speech in real time. Prior to this work, recognizing emotions from tone of voice incurred a processing delay due to a procedure called ‘feature engineering’, a delay that is unnatural in human-robot communication. To understand how we achieved this, we need to understand machine learning.
Every robot runs on a hardware platform driven by software algorithms. An algorithm is designed by humans to tell the machine what to do: how to respond to certain stimuli, for example, or how to answer a question, or how to navigate around a room. Much like an architect designing a house, an AI engineer looks at the whole picture of the task the machine is supposed to achieve, and builds software ‘blocks’ to make it achieve that task. What is called programming is simply the implementation of the code that realizes these blocks. One of the most important blocks is machine learning – algorithms that enable machines to learn and simulate human-like responses, such as making a chess move or answering a question.
Machine learning is what has fueled the real breakthroughs in artificial intelligence. Instead of being programmed to respond in certain, predictable ways, machines are programmed to learn from large numbers of real-world examples of stimuli and responses. If a machine looks at a vast number of cat pictures labelled ‘cat’, it can use any one of the many machine learning algorithms to learn to recognize a cat in a picture it has never seen. If a machine looks at a vast number of web pages and their translations, it can learn to approximate translation in the manner of Google Translate.
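To make the idea of learning from labelled examples concrete, here is a minimal sketch in Python. It uses a nearest-centroid classifier, chosen only because it is one of the simplest learning algorithms, and it is not the method behind any system mentioned in this article. The two numeric features and the toy data are hypothetical stand-ins for what a real system would measure from images.

```python
import math
from collections import defaultdict


def train_centroids(examples):
    """Learn one average feature vector (centroid) per label."""
    sums = {}
    counts = defaultdict(int)
    for features, label in examples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    return {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }


def predict(centroids, features):
    """Return the label whose centroid is closest to the unseen example."""
    return min(centroids, key=lambda label: math.dist(centroids[label], features))


# Hypothetical toy data: [ear pointiness, whisker length] with a human-given label.
training = [
    ([0.9, 0.8], "cat"),
    ([0.8, 0.9], "cat"),
    ([0.2, 0.1], "not cat"),
    ([0.1, 0.3], "not cat"),
]

model = train_centroids(training)
print(predict(model, [0.85, 0.7]))  # an example the model has never seen
```

Nothing in the program says what a cat looks like. The rule for telling the two labels apart comes entirely from the labelled examples, which is why more and better examples tend to matter so much.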
A critical part of machine learning is learning how to represent the characteristics, called features, of the physical input. A cat is represented by its contours, edges, and facial and body features. Speech input is represented by the frequency components of the audio. Emotions in speech are represented not just by the pitch, but also by the chroma, the tempo, and the speed of the voice. Classical machine learning first needs to perform feature engineering to extract these characteristics. For tone of voice, feature engineering typically extracts 1,000 to 2,500 characteristics from the input audio, and this slows down the whole emotion recognition process. These thousands of features are carefully designed by humans, and each of them requires processing time.
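As a rough illustration of what hand-designed features look like, the Python sketch below computes three simple ones from a short audio frame: average energy, zero-crossing rate, and the dominant frequency found with a naive discrete Fourier transform. These three are assumptions chosen for brevity, the function names are hypothetical, and the test signal is a synthetic 200 Hz tone rather than real speech. A real feature set for emotion recognition would contain far more features than this.

```python
import math


def frame_energy(samples):
    """Average squared amplitude, a rough measure of loudness."""
    return sum(s * s for s in samples) / len(samples)


def zero_crossing_rate(samples):
    """Fraction of adjacent sample pairs where the signal changes sign."""
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0)
    )
    return crossings / (len(samples) - 1)


def dominant_frequency(samples, sample_rate):
    """Frequency (Hz) of the strongest component, via a naive DFT."""
    n = len(samples)
    best_bin, best_power = 0, 0.0
    for k in range(1, n // 2):
        re = sum(s * math.cos(2 * math.pi * k * i / n) for i, s in enumerate(samples))
        im = sum(s * math.sin(2 * math.pi * k * i / n) for i, s in enumerate(samples))
        power = re * re + im * im
        if power > best_power:
            best_bin, best_power = k, power
    return best_bin * sample_rate / n


# Synthetic stand-in for a speech frame: a pure 200 Hz tone sampled at 8 kHz.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 200 * i / sample_rate) for i in range(400)]

features = [
    frame_energy(tone),
    zero_crossing_rate(tone),
    dominant_frequency(tone, sample_rate),
]
print(features)
```

Each feature is a separate, human-written computation over the audio. Multiply that by a thousand or more features and the processing delay described above becomes easy to picture.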
Recent breakthroughs in neural networks, also known as deep learning, enabled by both faster machines and massive amounts of data for learning, have led to vast improvements in machine learning. To start with, some deep learning methods, such as convolutional neural networks (CNNs), can automatically learn the characteristics during the learning process, without an explicit, time-consuming feature engineering step or human design. This is perhaps the most important contribution of deep learning to the field of AI.
Coming back to our system for recognizing emotion from tone of voice, what we did was replace feature engineering and classifier learning with a simple convolutional neural net, which learns just as well as, if not better than, classical machine learning approaches, and is much faster, because it does not require an explicit and slow feature engineering process. Similarly, facial expression recognition can be done in real time with a CNN.
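For readers curious about what a convolutional layer actually computes, here is a minimal Python sketch of the forward pass of one 1-D convolutional layer applied directly to raw samples: slide small filters along the signal, keep the positive responses, and pool each filter's strongest response into one number. It is an illustration only, not our system's architecture. The filter weights and the short waveform are hypothetical and hand-picked, whereas a trained network would learn its weights from labelled data and would stack many such layers.

```python
def conv1d(signal, kernel):
    """Slide the kernel along the signal (valid cross-correlation, as in CNN layers)."""
    width = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(width))
        for i in range(len(signal) - width + 1)
    ]


def relu(values):
    """Keep positive responses and zero out the rest."""
    return [max(0.0, v) for v in values]


def global_max_pool(values):
    """Summarize a filter's response by its strongest activation."""
    return max(values)


# Hypothetical, hand-picked filters; a trained network learns these from data.
filters = {
    "rising_edge": [-1.0, 0.0, 1.0],
    "smooth_level": [1 / 3, 1 / 3, 1 / 3],
}

# Hypothetical raw waveform samples, used as input without any feature engineering.
raw_waveform = [0.0, 0.1, 0.5, 0.9, 1.0, 0.9, 0.5, 0.1, 0.0]

feature_vector = {
    name: global_max_pool(relu(conv1d(raw_waveform, kernel)))
    for name, kernel in filters.items()
}
print(feature_vector)
```

The point of the sketch is that the filters play the role that hand-designed features play in classical approaches. Because training adjusts the filter weights themselves, no separate feature design step is needed.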
In addition, researchers are working to enable robots to express emotions – changing the pitch of their machine voices, and using dozens to hundreds of tiny motors to control synthetic facial muscles. The androids Sophia and Erica are two examples of humanoid robots with facial expressions.
Human-robot bonding and the Fourth Industrial Revolution
The Fourth Industrial Revolution is upon us. And it seems that technology is poised to replace humans in many areas. Skills that took years, maybe decades, to acquire seem to become obsolete overnight. Most people were not aware of the pace of progress in AI and robotics before the current torrent of publicity, and are extrapolating from what they see today to predict that robots will take over in 30 years, or 50 years. There is a lot of anxiety in society about whether and when robots will ‘take over’ from humans.
The truth is, this kind of prediction has been around for a long time. It was made during all previous industrial revolutions, when people feared steam engines or computers would render humans redundant. What has always happened is that people simply learned different skills to manage these machines, and more.
Nevertheless, with more applications of AI and robots, a new kind of relationship between humans and machines needs to evolve. For humans to be less fearful of, and to trust, a walking, talking, gesturing and weight-carrying robot, we need to have mutual empathy with the robot. What sets robots apart from mere electronic appliances is their advanced machine intelligence – and emotions. Understanding the cry of a baby, or the painful groan in a patient’s voice, is critical for home care robots. For robots to be truly intelligent, they need to ‘have a heart’.
Still, will robots be conscious?
If a robot develops analytical skills, learning ability, communication, and even emotional intelligence, will it have a conscience? Will it be sentient? Can it dream?
The above-mentioned neural networks, unlike other machine learning algorithms, remind people more of our own brains. Neural networks can even generate random, dream-like images, leading some to believe that even robots can dream.
The real question is: do we understand what makes us humans sentient? Is it just the combination of our sensory perception and the thinking process? Or is there more to it? AI researchers cannot answer this question, but we do believe that to make ‘good’ robots we have to teach them values – a set of decision-making rules that follow our ethical and moral norms. With the expansion in robot intelligence, teaching values to machines will become as important as teaching them to human children. Our next challenge will be to enable machines to learn such values automatically – once they have the prerequisite emotional recognition and communication skills.