How should we design a digital assistant? As natural as possible? As human as possible?
This is probably the question that I have discussed most frequently with colleagues and experts over the past few years.
Because even though it may seem simple to answer at first glance, this question is extremely complex: it affects numerous components of a digital assistant's design.
In fact, it is not even easy to cleanly separate the terms "natural" and "human", as they partly overlap.
Many studies suggest that people want the interactions with a digital assistant to be as natural as possible.
– Response time –
For example, they expect the time it takes the assistant to answer a question not to exceed the latency of a normal human response.
Why?
First, people usually write or speak with a digital assistant because they want to achieve a specific goal, as quickly as possible. Unnecessary waiting time is therefore viewed very negatively in studies.
At the same time, there is an interesting pattern in human communication: negative responses to invitations, requests, or offers, for example, are more likely to be given with a delay. Conversation analysts refer to such responses as "dispreferred" (Bögels et al., 2015).
This pattern could also contribute to the negative evaluation of delays in conversations with assistants.
So, in summary, does this mean that people want an assistant's response behavior to be as natural and human as possible?
Yes and no.
Yes, the latency should not be longer than that of a human response, or, if the system requires a longer processing time, the wait should be bridged in a meaningful way, for example with an explanation ("Just a moment, I need to check this briefly in my knowledge database.").
At the same time, however, people react negatively when the response times of a digital assistant are artificially lengthened to make it seem more human-like. This is particularly true for experienced users, who know that a digital assistant can respond more quickly (Gnewuch et al., 2022).
This effect certainly has several causes. Firstly, people simply do not like to be deceived. In addition, there is also the so-called uncanny valley effect, which can have a negative impact on many components of an assistant.
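To make this recommendation a little more concrete, here is a minimal sketch of the bridging pattern in Python. The callables handle_request and send_message and the 1.5-second threshold are my own assumptions for illustration; they are not taken from the studies cited above.

```python
import asyncio

# Assumed cutoff: roughly the upper end of a natural pause in human conversation.
FILLER_THRESHOLD_S = 1.5

async def answer_with_bridging(handle_request, send_message, user_utterance):
    """Send the answer as soon as it is ready and never add artificial delay.

    If processing takes longer than a human-like pause, bridge the wait with a
    short, transparent filler message instead of leaving the user hanging.
    """
    task = asyncio.create_task(handle_request(user_utterance))
    try:
        # Wait up to the threshold; shield() keeps the task running if we time out.
        answer = await asyncio.wait_for(asyncio.shield(task), timeout=FILLER_THRESHOLD_S)
    except asyncio.TimeoutError:
        await send_message("Just a moment, I need to check this briefly in my knowledge database.")
        answer = await task
    await send_message(answer)
```

The important point is that the filler only appears when processing genuinely takes longer than a human-like pause; the assistant never waits on purpose.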
– Uncanny valley –
What is the uncanny valley? In 1970, Masahiro Mori, a robotics professor at the Tokyo Institute of Technology, wrote a paper about how he imagined people’s reactions to robots that looked and behaved almost like humans. Specifically, he hypothesized that a person’s reaction to a human-like robot would abruptly switch from empathy to disgust when it approached but did not achieve a lifelike appearance.
That is, as robots become more human-like in appearance and movement, people's emotional reactions to them become more positive, up to a certain point at which those reactions quickly turn negative.
This hypothesis has been studied by many scientists, especially in relation to the appearance and non-verbal behaviors of robots.
However, recent studies also shed light on highly interesting relationships affecting the design of digital assistants.
These concern, for example, the text-to-speech (TTS) quality of voice output, visual representation through digital avatars, and even the evaluation of machines' decisions as moral or immoral.
– TTS of speech output –
Text-to-speech (TTS) technology synthesizes speech from text. Although TTS does not yet replicate the quality of recorded human speech, it has improved a great deal in recent years.
Neural TTS has recently gained attention as a method for creating realistic-sounding synthetic voices, in contrast to standard concatenative synthesis. It uses deep neural networks as a post-filtering step to create models that more accurately mimic human voices.
In a recent study, Do and colleagues (2022) conducted an experiment in which they analyzed how people evaluated standard TTS, neural TTS, and human speech. They found that a virtual human was perceived as significantly less trustworthy when neural TTS was used instead of human speech, while there was no significant difference between human speech and standard TTS.
This means that it may be a better choice to use a standard, "lower-quality" TTS voice that is clearly distinguishable from a human recording than an optimized TTS voice that comes close to a human but not quite close enough, and may therefore fall into the uncanny valley.
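As a small illustration of how this design decision can look in practice, here is a minimal sketch using Amazon Polly's boto3 client, which offers both a standard and a neural engine through a single parameter. The voice, region, and helper function are my own assumptions for illustration and are not part of Do and colleagues' study.

```python
import boto3

# Hypothetical configuration: voice, region, and the default engine choice are
# assumptions for illustration, not recommendations from the cited study.
polly = boto3.client("polly", region_name="eu-central-1")

def synthesize(text: str, use_neural: bool = False) -> bytes:
    """Synthesize speech with Amazon Polly.

    Defaults to the standard engine, which is further from a human recording
    and may therefore be the safer choice with respect to the uncanny valley.
    """
    response = polly.synthesize_speech(
        Text=text,
        OutputFormat="mp3",
        VoiceId="Joanna",
        Engine="neural" if use_neural else "standard",
    )
    return response["AudioStream"].read()
```

Because the engine choice is a single parameter, it is also easy to test both variants with real users before committing to the more human-like voice.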
– Visual representation with an avatar –
Many would like to give their digital assistant a more human face or appearance to make it more approachable. But in this context, too, one should be aware of the uncanny valley effect.
Ciechanowski and colleagues (2019), for example, conducted an experiment to understand how people evaluate interaction with a chatbot that has an animated avatar compared to one without any visual representation (a simple text chatbot).
They found that participants experienced weaker uncanny effects ("weirdness" or discomfort) and less negative affect when cooperating with the simpler text chatbot than with the more complex, animated-avatar chatbot. The simple chatbot also induced less intense psychophysiological reactions.
– Moral uncanny valley –
Artificial intelligence is playing a role in more and more areas of daily life. In the process, machines also increasingly have to make decisions that people evaluate morally.
At the same time, little is known about how the appearance of machines influences the moral evaluation of their decisions.
Laakasuo and colleagues (2021) have conducted exciting experiments on this, looking at the interplay of the uncanny valley effect and moral psychology.
They investigated whether people evaluate identical moral decisions made by robots differently depending on the robot’s appearance. Participants evaluated either deontological (“rule-based”) or utilitarian (“consequence-based”) moral decisions made by different robots.
Their results provide preliminary evidence that people evaluate moral decisions made by robots that resemble humans as less moral than the same moral decisions made by humans or non-human robots: a moral uncanny valley effect.
– To sum up –
Should we try to design digital assistants to be as human as possible? Quite clearly no.
If I’ve learned anything from the many studies I’ve conducted over the past few years, it’s that people interacting with digital assistants are primarily interested in achieving their goals quickly and in having a very transparent experience.
Transparency is a highly relevant topic for dialog design in its own right and could fill an entire article. What matters in this context is simply that the assistant should be transparent about the fact that it is not human.
The art of design, then, is to avoid making the assistant appear human while still creating an interaction that is as natural as possible for the user. After all, people still prefer natural speech in an interaction, for example, over abbreviated "robot speech."
Disclaimer: Please note that this article is not a conclusive scientific consideration of these issues. As in any other field of psychological research, there are always studies that reach different conclusions, also depending on the measured variables, their operationalization and their weighting for the interpretation of the results.
Bögels, S., Kendrick, K. H., & Levinson, S. C. (2015). Never say no… How the brain interprets the pregnant pause in conversation. PLoS ONE, 10(12), e0145474.
Ciechanowski, L., Przegalinska, A., Magnuski, M., & Gloor, P. (2019). In the shades of the uncanny valley: An experimental study of human–chatbot interaction. Future Generation Computer Systems, 92, 539-548.
Do, T. D., McMahan, R. P., & Wisniewski, P. J. (2022). A New Uncanny Valley? The Effects of Speech Fidelity and Human Listener Gender on Social Perceptions of a Virtual-Human Speaker. In CHI Conference on Human Factors in Computing Systems (pp. 1-11).
Gnewuch, U., Morana, S., Adam, M. T., & Maedche, A. (2022). Opposing Effects of Response Time in Human–Chatbot Interaction. Business & Information Systems Engineering, 1-19.
Laakasuo, M., Palomäki, J., & Köbis, N. (2021). Moral Uncanny Valley: A robot’s appearance moderates how its decisions are judged. International Journal of Social Robotics, 13(7), 1679-1688.
Mori, M. (1970). The uncanny valley. Energy, 7(4), 33–35.