What makes good conversational AI and how does it interact with the human psyche?

Welcome to “The Psychology of Conversational AI”. On this website you will find exciting content on Conversational AI and Generative AI Best Practices – focusing on a great user experience and real business value.

Watch videos on the topic or explore articles in the blog.

What is conversational AI?
Conversational artificial intelligence (AI) refers to technologies, such as digital assistants, that users can talk to. They use large volumes of data, machine learning, and natural language processing to help imitate human interactions, recognizing speech and text inputs and translating their meanings across various languages.

You can easily achieve 80% productivity gains with generative AI for your business.
Or damage your reputation and get sued.

You have heard a lot about generative AI.
You believe it could be useful for your business.
But…
…there are lawsuits because of copyright violations.
…there are chatbots in the headlines that have gotten out of control and are damaging their company’s reputation.
…there is the upcoming EU AI Act, with fines of up to €30 million.

You also heard of conversational AI.
But you’ve had bad experiences with chatbots in the past, and you’re wondering:
What is this really good for?
How do generative AI and conversational AI play together?
And what does that mean for my company?

Watch my latest video to get some answers while having some fun: “Maximizing Potential while Mitigating Risks: A Psychologist’s Guide to Best Practices in Generative AI for Business” – Part 1

Presentation and panel at the IAA 2023 “How AI accelerates Mobility for the Future”

Panel at the 3rd edition of The European Chatbot & Conversational AI Summit 2023 on the topic “Creating an ideal customer experience” for digital assistants.

“We are live with episode #6 of the special editions of our Botcasters podcast, recorded at – and in cooperation with – The European Chatbot & Conversational AI Summit in Edinburgh, featuring Dr. Lisa Precht!

Are you interested in the fascinating world of conversational AI and its impact on our daily lives?

Then check out this special Botcasters/European Chatbot & Conversational AI Summit podcast episode featuring Dr. Lisa Precht, Head of Customer Care and Conversation Design at IBM. She talks, among other things, about her experience as the Global Lead of Voice Control at BMW and her extensive research on driver distraction.

Lisa reveals some of the most interesting findings from the largest naturalistic driving study ever done in the world, including how cognitive distraction affects us differently than visual and manual distraction. She also explains how these learnings can be applied to the design of conversational AI, including chatbots and voice assistants, and how to keep users engaged and focused on achieving their goals.

If you’re curious about the future of conversational AI and want to learn how to design the best experiences for your users, then don’t miss out on this informative and engaging conversation.”

Excerpt about mental load from this podcast (in German).

Excerpt about the uncanny valley from this podcast (in German).

Excerpt about error handling from this podcast (in German).

Live talk in German – “Roboter, Chatbots, KI & Co. – Wie profitieren wir Menschen davon?” (“Robots, chatbots, AI & co. – how do we humans benefit from them?”)

Podcast in German: “Google LaMDA und was Menschen von einer KI erwarten mit Dr. Lisa Precht (IBM)” (“Google LaMDA and what people expect from an AI, with Dr. Lisa Precht (IBM)”).

Discover the latest articles.

  • Many articles and books discuss the danger of artificial intelligence. They postulate that an artificial intelligence comparable to human intelligence will be achieved in the foreseeable future. They go so far as to claim that this AI will then create ever more intelligent versions of itself, until we are confronted with a “superintelligence”.

    Some argue that we need to protect ourselves from and prepare for this inevitable development.

    And of course, dealing with AI, like dealing with any data set, is associated with risks if one does not consider some essential points (more on this later). Nevertheless, it is important for me to emphasize at this point that the hypotheses described above have no scientific basis.

    In his book “The Myth of Artificial Intelligence: Why Computers Can’t Think the Way We Do”, Larson (2021) summarizes the current situation well:

    “The myth of artificial intelligence is that its arrival is inevitable, and only a matter of time – that we have already embarked on the path that will lead to human-level AI, and then superintelligence. We have not. The path exists only in our imaginations. Yet the inevitability of AI is so ingrained in popular discussion – promoted by media pundits, thought leaders like Elon Musk, and even many AI scientists (though certainly not all) – that arguing against it is often taken as a form of Luddism, or at the very least a shortsighted view of the future of technology and a dangerous failure to prepare for a world of intelligent machines.” (…)

    “All evidence suggests that human and machine intelligence are radically different. [However,] the myth of AI insists that the differences are only temporary, and that more powerful systems will eventually erase them” (Larson, 2021, p. 1).

    As a psychologist who studied the complex functioning of the human brain intensively during her training, and who understands the technical possibilities at least in their basic features from working with technical experts, I have always wondered about statements comparing artificial to human intelligence.

    Even today, we have only a rudimentary understanding of our intelligence and of the overall functioning of the human brain. There is not necessarily agreement among scientists about many theories of human intelligence. There is still a lot we are only beginning to understand, and many things we thought we already understood have since been disproved.

    As Eysenck stated in 1988 (p. 1):

     “There has perhaps been more controversy concerning the nature and existence of intelligence than of any other psychological concept.”

    This statement has lost none of its relevance to this day.

    We are not even close to completely understanding the functioning of the human brain, nor are we able to reproduce it artificially.

    – Artificial intelligence and human intelligence –

    One problem in the current debate on artificial intelligence is that the ability to “solve problems” is compared with, and in part even equated with, human intelligence as a whole.

    To believe that all of human thought could be understood, in effect, as the “breaking” of “codes” – the solving of puzzles and the playing of games like chess or Go – is a very simplified view of intelligence.

    Even though problem solving is certainly an important part of human intelligence, this view is too narrow and does not do justice to the complex biochemical processes that make up the functioning of our brain.

    Indeed, analogical problem-solving performance correlates highly with IQ. This correlation led Lovett and Forbus (2017, p. 60) to argue that “Analogy is perhaps the cornerstone of human intelligence”. More precisely, there are close links between analogical problem solving and fluid intelligence, which “refers to the ability to reason through and solve novel problems” (Shipstead et al., 2016, p. 771).

    However, Schlinger (2003), for example, argues in his paper “The myth of intelligence” that there is no general intelligence underlying all skills, and that a concept of intelligence as anything more than a label for various behaviors in their contexts is a myth. A truly scientific understanding of the behaviors said to reflect intelligence can come only from a functional analysis of those behaviors in the contexts in which they are observed. Schlinger (2003) also argues that the conceptualization of general intelligence was based on the logical errors of reification and circular reasoning.

    In line with this, some psychologists have gone beyond the concept of a unitary general intelligence and have suggested many different types of intelligence, some operating autonomously, from three (Sternberg, 1984) to seven (Gardner, 1983).

    Sternberg’s (1984) triarchic theory includes three types of intelligence – analytical, creative, and practical. Gardner (1983) postulated no fewer than seven intelligences, including linguistic and musical intelligence, both of which are aural-auditory; logical-mathematical and spatial intelligence, which are visual; bodily-kinesthetic intelligence; and the personal intelligences (interpersonal and intrapersonal).

    Daniel Goleman popularized the phrase “Emotional Intelligence” with the publication of his book by the same title in 1995.

    Wigglesworth (2004) introduced four different types of intelligences as physical, cognitive, emotional, and spiritual intelligences.

    A report on intelligence issued by the Task Force established by the American Psychological Association concluded: “Because there are many ways to be intelligent, there are also many conceptualizations of intelligence” (Neisser et al., 1996, p. 95).

    These examples of theories of human intelligence are not intended to be, nor can they be, a comprehensive or even conclusive consideration of this topic at this point. Nevertheless, they illustrate the complexity of this topic and that the biochemical processes that constitute our thinking go far beyond rational problem-solving processes. In fact, one must even ask whether human thinking always proceeds rationally.

    – Are humans even rational? –

    This question is too complex to answer here, but the following explanation from Eysenck and Keane (2020, pp. 702 – 703) illustrates an important facet of the problem:

    “Historically, an important approach (championed by Piaget, Wason and many others) claimed rational thought is governed by logic. It follows that deductive reasoning (which many have thought requires logical thinking) is very relevant for assessing human rationality.

    Sadly, most people perform poorly on complex deductive-reasoning tasks. Thus, humans are irrational if we define rationality as logical reasoning.

    The above approach exemplifies normativism. Normativism “is the idea that human thinking reflects a normative system against which it should be measured and judged” (Elqayam & Evans, 2011, p. 233). For example, human thinking is “correct” only if it conforms to classical logic.

    Logic or deductive reasoning does not provide a suitable normative system for evaluating human thinking. Why is that? As Sternberg (2011, p. 270) pointed out, “Few problems of consequence in our lives had a deductive or even any meaningful kind of ‘correct’ solution. Try to think of three, or even one!””

    – How does AI work and what is the difference to human intelligence? –

    Game playing has been a source of constant inspiration for the development of advanced AI techniques. However, games are simplifications of life that reward simplified views of intelligence (Larson, 2021). A chess program plays chess, but does rather poorly at driving a car.

    Treating intelligence as problem solving gives us narrow applications. If machines could learn to become general, we would witness a transition from specific applications to general thinking beings – we would have AI (Larson, 2021).

    But, as Larson (2021, pp. 28 – 29) explains further: “What we now know, however, argues strongly against the learning approach suggested early on by Turing. To accomplish their goals, what are now called machine learning systems must each learn something specific. Researchers call this giving the machine a “bias”. (…) A bias in machine learning means that the system is designed and tuned to learn something. But this is, of course, just the problem of producing narrow problem-solving applications. (This is why, for example, the deep learning systems used by Facebook to recognize human faces haven’t also learned to calculate your taxes.)

    Even worse, researchers have realized that giving a machine learning system a bias to learn a particular application or task means it will do more poorly on other tasks. There is an inverse correlation between a machine’s success in learning some one thing, and its success in learning some other thing. (…)

    But bias is actually necessary in machine learning – it’s part of learning itself.

    A well-known theorem called the “no free lunch” theorem proves exactly what we anecdotally witness when designing and building learning systems. The theorem states that any bias-free learning system will perform no better than chance when applied to arbitrary problems.”

    “Success and narrowness are two sides of the same coin. This fact alone casts serious doubt on any expectation of a smooth progression from today’s AI to tomorrow’s human-level AI. People who assume that extensions of modern machine learning methods like deep learning will somehow “train up”, or learn to be intelligent like humans, do not understand the fundamental limitations that are already known” (Larson, 2021, p. 30).
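
    To make this narrowness concrete, here is a minimal sketch of my own (an illustration, not Larson’s example and not a proof of the no-free-lunch theorem), using scikit-learn: a classifier tuned to recognize handwritten digits does well on exactly that task, while its predictions are at chance level when scored against an arbitrary, unrelated labeling of the same images.

    ```python
    # Toy illustration of narrow machine learning (my own sketch):
    # a model tuned for one task gains nothing on an arbitrary other task.
    import numpy as np
    from sklearn.datasets import load_digits
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    X, y = load_digits(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    model = LogisticRegression(max_iter=5000)
    model.fit(X_train, y_train)

    # High accuracy on the task the model was designed ("biased") to learn.
    print("Digit recognition:", round(model.score(X_test, y_test), 2))

    # Chance-level accuracy (~0.10 for ten classes) when the same predictions are
    # scored against an arbitrary, unrelated labeling of the test images.
    rng = np.random.default_rng(0)
    arbitrary_labels = rng.integers(0, 10, size=len(y_test))
    print("Arbitrary task:", round(model.score(X_test, arbitrary_labels), 2))
    ```

    The point is only directional: the “bias” that lets the model learn digits is precisely what keeps it from being good at anything else for free.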

    These explanations by Larson are, of course, only a small part, albeit an essential part, of the complex question of whether AI can be compared to human intelligence. For a more detailed consideration, I recommend Larson’s book.

    Nevertheless, these statements, together with the state of research on human intelligence, show how unrealistic horror scenarios involving a superintelligence are, and how far we still are from understanding all the processes and interrelationships of human intelligence, let alone being able to reproduce them.

    Quite independently of whether we are threatened by a superintelligence, careless handling of AI is nevertheless associated with risks.

    – Actual risk of AI –

    For anyone who has ever worked in research, this problem is well known: If we collect data sloppily, based on the wrong sample, or using the wrong methodological approaches, then these errors naturally carry over into the evaluation, analysis, and interpretation of the data, and the study is worthless – and in the worst cases, misleading and dangerous.

    The same applies to AI, of course. Only high-quality data, free of bias and other sources of error, can lead to valuable AI solutions. And only if we set up AI models in such a way that we can explain them and they remain transparent to us can we know that the data is really being used as we intend.

    Real danger comes from non-transparent and non-explainable AI that is based on biased data sets. For example, racism and sexism that have been consciously or unconsciously transferred from human thinking into the data carry over into the decisions and recommendations of the AI.

    AI must be transparent and explainable. Companies must be clear about who trains their AI systems, what data was used in training and, most importantly, what went into their algorithms’ recommendations.

    Years of collecting biased data means that AI will be biased. Ethical AI starts with a thorough examination of the datasets used.
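
    A first, very basic step of such an examination can even be automated. The sketch below is my own illustration (the file name and the “gender”/“approved” columns are hypothetical): it compares outcome rates across demographic groups in a training dataset. A large gap is not proof of bias, but it is a clear signal that the data needs closer inspection before any model is trained on it.

    ```python
    # Minimal sketch of a dataset check before training (hypothetical file and columns).
    import pandas as pd

    df = pd.read_csv("training_data.csv")  # assumed to contain "gender" and "approved" columns

    # Positive-outcome rate per group: large gaps between groups are a warning sign
    # that historical bias may be baked into the labels.
    rates = df.groupby("gender")["approved"].mean()
    print(rates)

    # Flag the dataset for manual review if the gap between groups exceeds a threshold.
    if rates.max() - rates.min() > 0.2:
        print("Warning: outcome rates differ strongly across groups - review the data.")
    ```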

    A recent example of what happens when we don’t do this is described in this article: Medical Robots conform to racism and sexism due to biased AI, proves experiment (stealthoptional.com)

    The study (Hundt et al., 2022) provides evidence that medical robots can show bias in operational procedures. Much like surgeons deciding whom to operate on based on expected success rates, a robot will pick between people simply after looking at their faces.

    The article quotes from the study: “A robot operating with a popular Internet-based artificial intelligence system consistently gravitates to men over women, white people over people of color, and jumps to conclusions about peoples’ jobs after a glance at their face.”

    “The robot has learned toxic stereotypes through flawed neural network models,” Hundt explained in the article. “We’re at risk of creating a generation of racist and sexist robots but people and organizations have decided it’s OK to create these products without addressing the issues.”

    One of the reasons AI bias is making its way into more and more medical robots is the data sets companies are willing to use. As in any industry, many companies are looking for the cheapest R&D possible.

    So, in conclusion, we can say that AI itself is not a threat. We are not on the verge of superintelligence.

    The danger of AI lies in the wrong way of dealing with it. Now it is up to us to demand and implement a transparent and ethical approach to data and AI.

    Elqayam, S., & Evans, J. St. B. T. (2011). Subtracting “ought” from “is”: Descriptivism versus normativism in the study of human thinking. Behavioral and Brain Sciences, 34, 233–248.

    Eysenck, H. J. (1988). The concept of “intelligence”: Useful or useless? Intelligence, 12(1), 1–16.

    Eysenck, M. W., & Keane, M. T. (2020). Cognitive psychology: A student’s handbook. Psychology Press.

    Gardner, H. (1983). Frames of mind: The theory of multiple intelligences. New York: Basic Books.

    Goleman, D. (1996). Emotional intelligence: Why it can matter more than IQ. Bloomsbury Publishing.

    Hundt, A., Agnew, W., Zeng, V., Kacianka, S., & Gombolay, M. (2022). Robots Enact Malignant Stereotypes. In 2022 ACM Conference on Fairness, Accountability, and Transparency (pp. 743-756).

    Larson, E. J. (2021). The myth of artificial intelligence: Why computers can’t think the way we do. Harvard University Press.

    Lovett, A. & Forbus, K. (2017). Modelling visual problem solving as analogical reasoning. Psychological Bulletin, 124, 60–90.

    Neisser, U., Boodoo, G., Bouchard Jr., T. J., Boykin, A. W., Brody, N., Ceci, S. J., … & Urbina, S. (1996). Intelligence: Knowns and unknowns. American Psychologist, 51(2), 77–101.

    Schlinger, H. D. (2003). The myth of intelligence. Psychological Record, 53(1), 15–32.

    Shipstead, Z., Harrison, T.L. & Engle, R.W. (2016). Working memory capacity and fluid intelligence: Maintenance and disengagement. Perspectives on Psychological Science, 11, 771–799.

    Sternberg, R. J. (1984). Toward a triarchic theory of human intelligence. Behavioral and Brain Sciences, 7(2), 269–287.

    Sternberg, R.J. (2011). Understanding reasoning: Let’s describe what we really think about. Behavioral and Brain Sciences, 34, 269–270.

    Wigglesworth, C. (2004). Spiritual intelligence and why it matters. Kosmos Journal, spring/summer.

  • Should we define a character for a digital instance?

    Or are we thereby promoting the uncanny valley effect?

    Or, in the worst case, does this encourage deception of users, who might then interpret a consciousness into conversational AI?

    Only recently, a Google employee again clearly demonstrated that people tend to read human characteristics into artificial systems – #lamda (Google engineer claims LaMDA AI is sentient | Live Science).

    In fact, many find the idea of defining a character for, say, a simple chatbot designed to assist with mundane tasks rather strange. And yes, sometimes character definitions go very far and create a complete backstory for the assistant, including where it grew up and how much it earns… One can indeed find this strange.

    However, from a design perspective, it is crucial to think about the character of the assistant.

    When we communicate with a digital assistant, whether we want to or not, we interpret a character into what is said or written. This attribution of a human personality or human characteristics to something non-human, such as an animal or an object, is called anthropomorphism.

    Anthropomorphism is the ability of humans to attribute human motivations, beliefs, and feelings to nonhuman beings. Researchers have found anthropomorphism to be a normal phenomenon in human-computer interaction (Reeves & Nass, 1996; Cohen et al., 2004; Lee, 2010).

    According to the “Computers Are Social Actors” (CASA) research paradigm, despite their knowledge that computers do not warrant social treatment, people nevertheless tend to have the same social expectations of computers and exhibit the same responses to computers as they do to human interaction partners (Lee, 2010).

    This means that not thinking about the character of the assistant does not result in the assistant simply being perceived neutrally, i.e., as having no character.

    It is just not necessarily the character that you might have had in mind for it.

    In addition, we need a character definition in order to be able to formulate all dialogs and voice outputs according to a consistent logic.

    For example, should the assistant be an expert on a certain topic, or can it sometimes not have an answer ready and search a database or forward the question to a human in the form of a ticket? Is it important that the assistant is distant, or should it express a lot of empathy?

    Designing the assistant’s dialogs and voice outputs coherently is important because people are very sensitive to inconsistencies in conversations. When we communicate with a digital assistant and its dialog logic or voice outputs are not consistently coherent, we feel as if we are dealing with a kind of “split personality” and find this unpleasant.
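
    In practice, such a character definition does not have to be elaborate. The sketch below is only an illustration (all fields and wording are examples, not a recommended persona): a small, shared specification that every prompt, error message and fallback can be checked against, so the assistant stays consistent with one character.

    ```python
    # Illustrative character definition for a digital assistant
    # (all fields and values are examples only, not a recommended persona).
    PERSONA = {
        "name": "Billing Assistant",
        "role": "expert for billing and invoice questions",
        "tone": "friendly, concise, moderately empathetic",
        "humor": "none",  # e.g. avoid jokes in payment contexts
        "fallback": "create a support ticket and hand the question over to a human agent",
    }

    def fallback_prompt(persona: dict) -> str:
        """Formulate the out-of-scope answer in line with the defined character."""
        return (
            "I don't have an answer to that yet. "
            f"I will {persona['fallback']} so you get help quickly."
        )

    print(fallback_prompt(PERSONA))
    ```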

    So, character is important and should always be defined.

    Which character traits we should define for an assistant really depends on its particular goal, i.e., which tasks it should assist users with and what kind of character would be useful in doing so.

    But how far we go with the design and implementation of the character can have a positive or negative effect on the perception of the end users.

    If the assistant appears human to a certain extent, for example, if it expresses empathy, uses natural language or humor, then this increases the trust of the users (Smestad, 2018).

    At a certain point, however, this tips back into the “uncanny valley” (Mori, 1970) and the effect turns negative (read more about the uncanny valley effect here: Impact of natural and/or human design of conversational AI. – The Psychology of Conversational AI (psyconai.com)).

    Furthermore, even if we define essential character traits for the assistant, the assistant must at no time give the impression that it is a human being or a being with consciousness. There are many ways in dialog design to create this important transparency, starting with the first prompt in which the assistant introduces itself. For a more complex assistant, it is also worthwhile to provide factual answers to questions about consciousness and the like.
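
    One way to build in this transparency is sketched below; the intent names and wording are purely illustrative. The assistant introduces itself as a digital assistant in its first prompt and gives factual, non-evasive answers whenever users ask whether it is human or conscious.

    ```python
    # Illustrative transparency handling (intent names and wording are examples only).
    INTRO_PROMPT = (
        "Hi, I'm a digital assistant for billing questions. "
        "I'm not a person, but I can help you with invoices and payments."
    )

    FACTUAL_ANSWERS = {
        "are_you_human": "No, I'm a piece of software - a digital assistant, not a person.",
        "are_you_conscious": "No. I generate answers from data and rules; I have no feelings or awareness.",
    }

    def answer(intent: str) -> str:
        # Factual answers to questions about humanity or consciousness take priority
        # over any character-driven small talk.
        return FACTUAL_ANSWERS.get(intent, "Let me check that for you.")

    print(INTRO_PROMPT)
    print(answer("are_you_human"))
    ```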

    So, my recommendation would be not to design a digital assistant to be as human as possible, even if we do have to define some character traits. However, the challenge then is to design the interaction with a digital assistant in a “natural” way.

    This means, for example, that we should adopt as many rules of human communication as possible in the dialog design and formulate outputs in natural language. Even though we have to avoid the uncanny valley, users also view it negatively when an assistant does not fulfill their expectations of human conversation.

    Cohen, M. H., Cohen, M. H., Giangola, J. P., & Balogh, J. (2004). Voice user interface design. Addison-Wesley Professional.

    Lee, E. J. (2010). The more humanlike, the better? How speech type and users’ cognitive style affect social responses to computers. Computers in Human Behavior, 26(4), 665–672.

    Mori, M. (1970). Bukimi no tani [The uncanny valley]. Energy, 7(4), 33–35.

    Reeves, B., & Nass, C. (1996). The media equation: How people treat computers, television, and new media like real people. Cambridge University Press.

    Smestad, T. L. (2018). Personality Matters! Improving The User Experience of Chatbot Interfaces-Personality provides a stable pattern to guide the design and behaviour of conversational agents (Master’s thesis, NTNU).

  • Voice control is natural, intuitive, fast and efficient – provided that the conversational AI is well designed.

    But the great charm of voice control is also that we can use it when we are involved in a parallel task and cannot use our hands, or do not want to avert our gaze.

    In fact, I’m often asked if using voice control, or having a conversation in general, isn’t a critical additional distraction in safety-critical situations. A typical example of such a situation is voice control while driving.

    Of course, I dealt with this question very intensively during my time as conversation design lead at BMW.

    So what happens in our brain when we perform several activities simultaneously?

    The capacity of the so-called working memory, i.e. the part of the memory that allows us to store and manipulate information in the short term, is very limited (more on this topic here).

    But what does this mean for the use of voice control as a “secondary task” while we are involved in a safety-critical “primary task”?

    When we drive a vehicle, we feel that it is easier for us to make a parallel call, for example, than to write a message on our smartphone. This is because driving and texting are primarily visual tasks, while talking on the phone is primarily an auditory task.

    Nevertheless, all these tasks naturally cause mental workload.

    According to the theory of multiple resources (Wickens, 1989), tasks using different resources interfere less with each other than tasks using the same resources.

    It assumes that total cognitive capacity is composed of different individual capacities that are independent of each other.

    Visual and auditory channels use separate resources, both in the senses (eyes versus ears) and in the brain itself (auditory versus visual cortex). Driving and talking also require different kinds of processing methods (= “codes”, spatial versus verbal).

    Because of this “code separation”, driving, a visual-spatial manual task, can be “time-shared” with conversation, a verbal task, with little dual-task decrement.

    Sounds totally confusing?!

    Yes, it is. This theory is complex, so I will spare you the further dimensions of the multiple resource model of time-sharing and workload here.
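
    Still, the core idea can be made concrete. The following toy model is my own simplification (not Wickens’ formal model): if each task is described by the resources it occupies, the interference between two tasks can be roughly approximated by how many resources they share.

    ```python
    # Toy illustration of the multiple resource idea (my own simplification,
    # not Wickens' formal model): tasks that occupy the same perceptual channels
    # and processing codes interfere more than tasks that do not.
    TASKS = {
        "driving":       {"visual", "spatial", "manual"},
        "texting":       {"visual", "spatial", "manual"},
        "voice_control": {"auditory", "verbal", "vocal"},
    }

    def shared_resources(task_a: str, task_b: str) -> set:
        """Resources two tasks compete for; a larger overlap means more interference."""
        return TASKS[task_a] & TASKS[task_b]

    print(shared_resources("driving", "texting"))        # high overlap -> strong interference
    print(shared_resources("driving", "voice_control"))  # empty set -> little interference
    ```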

    Are there also interesting studies on the subject that are less theoretical and demonstrate the principle?

    Absolutely, glad you asked! 😉

    There is a highly interesting dataset of real driving data – The second Strategic Highway Research Program (SHRP 2). It is the largest study of its kind, including data from 50 million vehicle miles and 5.4 million trips. Data from instrumented vehicles was collected from more than 3,500 participants during a 3-year period in the U.S.

    Recorded data consisted of driving parameters such as speed, acceleration, and braking, all vehicle controls, forward radar, and lane position. In addition, video views forward, to the rear, on the driver’s face, and on the dashboard were captured.

    Based on this exceptional data set, consisting of everyday driving situations, many different studies have been conducted. These came to the following results, among others:

    ✓ Visually distracting secondary tasks, such as texting or eating, are associated with driving errors and accidents.

    ✓ Cognitive distractions, such as talking on the phone or with a passenger, do not have a detrimental effect on driving performance. (Not even when this cognitive distraction occurs in combination with the strong emotion of anger, such as during an argument!)

    ✓ Cognitive distraction is often observed when drivers are tired and even has a protective effect against accidents! (Drivers deliberately distract themselves, for example by making phone calls, to keep themselves awake.)

    So, to sum up, voice control is an ideal operating modality for situations where the eyes and hands are already occupied, especially when driving.

    Wickens, C. D. (1989). Processing resources in attention. In D. Holding (Ed.), Human skills (pp. 77–105). New York: Wiley.

  • How long may a voice output be, and how much information should it contain at most?

    This is not an easy question to answer, because it depends on a number of different variables.

    Let’s take a brief look at the human brain. To be able to follow a conversation, we primarily use a part of the memory called working memory.

    This is a structure for short-term storage and manipulation of information. Its two key characteristics are a very limited capacity as well as a fragility of storage.

    But what does limited mean in this context?

    One way this has been studied is by dictating a series of random numbers to subjects and then asking them to repeat them in the correct order.

    It was found that people can remember 7 ± 2 units, where these units can be numbers, letters, or words.

    Miller coined the “magical number 7 plus / minus 2” in 1956, according to which the working memory can hold 7 ± 2 “chunks”. Such a chunk can be, for example, a meaningful combination of letters, such as #IBM or a whole sentence, e.g., “I like to eat cookies”.

    In order to access information in working memory, we need to “keep it active”, but we tend not to do that in a normal conversation. That is why, when you are telling a story and jumping from thought to thought, you suddenly have to ask yourself: “Where were we?”

    So, what does all of this mean?

    Does it mean that every voice output can include up to 7 chunks of information?

    The short answer is: ideally not. This is because people do not actively try to memorize parts of a conversation. Since our working memory is very fragile, it makes sense to keep speech output as short as possible and to limit it to the essentials.

    Nevertheless, dialog concepts that contain longer prompts can work well.

    Why is that?

    Because the dialog and prompt design must strongly depend on what the goal of the user’s respective intent is and in what kind of situation or context the intent is expressed. In numerous studies, I have seen how goal-oriented customers are when it comes to voice prompts and how they evaluate them. Especially with such a natural operating modality as speech, a particularly fast and efficient goal achievement is expected. However, if the customer’s goal is, for example, a better understanding of a function and we design a step-by-step tutorial, then much longer prompts can also work well, because the customer expects to get more information.
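
    One practical way to work with this, sketched below, is to give voice prompts a word budget that depends on the type of intent and to flag prompts that exceed it for review. The budgets in the sketch are rough heuristics of my own, not validated values.

    ```python
    # Heuristic length check for voice prompts (the word budgets are illustrative
    # assumptions, not validated values).
    PROMPT_BUDGETS = {
        "task": 20,      # quick, goal-oriented answers: keep it very short
        "tutorial": 60,  # step-by-step explanations: longer prompts can work
    }

    def check_prompt(text: str, intent_type: str = "task") -> bool:
        """Return True if the prompt stays within the rough budget for its intent type."""
        words = len(text.split())
        budget = PROMPT_BUDGETS[intent_type]
        if words > budget:
            print(f"Review: {words} words exceed the {budget}-word budget for '{intent_type}' prompts.")
            return False
        print(f"OK: {words} words within the budget for '{intent_type}' prompts.")
        return True

    check_prompt("Your next service appointment is on Friday at 9 am.", "task")
    ```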

    However, longer prompts in particular should always be validated in a user study.

    When I design a dialog, I often imagine how a human being, for example a helpful passenger while driving or a service agent, would formulate the statement. We humans are so experienced in using language, and depend so much on functioning communication, that we usually come up with very appropriate “voice outputs”. 😉

    More on the topic in this video.

  • How should we design a digital assistant? As natural as possible? As human as possible?

    This is probably the question that I have discussed most frequently with colleagues and experts over the past few years.

    Because even if it seems simple to answer for many at first glance, this question is extremely complex, as it affects numerous components of the design of a digital assistant.

    In fact, it is also not so easy to cleanly separate the terms “natural” versus “human”, as they partly overlap.

    Many studies suggest that people want the interactions with a digital assistant to be as natural as possible.

    – Response time –

    For example, they expect the time it takes the assistant to answer a question not to exceed the latency of a normal human response.

    Why?

    First, people usually write or speak with a digital assistant because they want to achieve a specific goal, as quickly as possible. Unnecessary waiting time is therefore viewed very negatively in studies.

    At the same time, there is an interesting pattern in human communication: negative responses to invitations, requests or offers, for example, are more likely to be given with a delay. Conversation analysts refer to such responses as “dispreferred” (Bögels et al., 2015).

    This pattern could also contribute to the negative evaluation of delays in conversations with assistants.

    So, in summary, do we have to say here that people want an assistant’s response behavior to be as natural and human as possible?

    Yes and no.

    Yes, the latency should not be longer than that of a response from a human; or, if the system requires a longer processing time, it should be bridged in a meaningful way, for example with an explanation (“Just a moment, I need to check this briefly in my knowledge database.”).

    At the same time, however, people evaluate it negatively when we artificially lengthen response times of a digital assistant to make them seem more human-like. This is particularly true for experienced users, as they know that a digital assistant can respond more quickly (Gnewuch et al., 2022).
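
    In the dialog logic, these two findings can be combined with a simple rule, sketched below (the threshold and the wording are illustrative): bridge genuinely long processing times with an explanation, and never add artificial delay to fast answers.

    ```python
    # Illustrative response-time handling: bridge real delays, never add fake ones.
    import threading
    import time

    BRIDGE_AFTER_SECONDS = 2.0  # rough threshold; a real system would tune and test this

    def respond(fetch_answer, send):
        """fetch_answer: callable returning the answer text; send: callable delivering a message."""
        # If the backend takes too long, send a bridging explanation in the meantime.
        bridge = threading.Timer(
            BRIDGE_AFTER_SECONDS,
            send,
            args=("Just a moment, I need to check this briefly in my knowledge database.",),
        )
        bridge.start()
        answer = fetch_answer()
        bridge.cancel()  # fast answers go out immediately, with no artificial delay
        send(answer)

    # Example: a slow backend call (3 s) triggers the bridging message first.
    respond(lambda: (time.sleep(3), "Your invoice was sent on May 2.")[1], print)
    ```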

    This effect certainly has several causes. Firstly, people simply do not like to be deceived. In addition, there is also the so-called uncanny valley effect, which can have a negative impact on many components of an assistant.

    – Uncanny valley –

    What is the uncanny valley? In 1970, Masahiro Mori, a robotics professor at the Tokyo Institute of Technology, wrote a paper about how he imagined people’s reactions to robots that looked and behaved almost like humans. Specifically, he hypothesized that a person’s reaction to a human-like robot would abruptly switch from empathy to disgust when it approached but did not achieve a lifelike appearance.

    That is, as robots become more human-like in appearance and movement, people’s emotional reactions to them become more positive. This trend continues until a certain point is reached, at which point the emotional reactions quickly become negative.

    This hypothesis has been studied by many scientists, especially in relation to the appearance and non-verbal behaviors of robots.

    However, recent studies also shed light on highly interesting relationships affecting the design of digital assistants.

    These concern, for example, the TTS of speech outputs, digital visual representation, and even the evaluation of machines’ decisions as moral or immoral.

    – TTS of speech output –

    Text-to-speech (TTS) technology synthesizes speech from text. Although TTS does not yet replicate the quality of recorded human speech, it has improved a great deal in recent years.

    Neural TTS has recently gained attention as a method of creating realistic-sounding synthetic voices, in contrast to standard concatenative synthetic voices. It uses deep neural networks as a postfiltering step to create models that more accurately mimic human voices.

    In a recent study, Do and colleagues (2022) conducted an experiment in which they analyzed how people evaluated standard TTS, neural TTS, and human speech. They found that the virtual human was perceived as significantly less trustworthy when neural TTS was used compared to human speech, while there was no significant difference between human speech and standard TTS.

    This means it may be a better choice to use a standard, “lower-quality” TTS voice, which is clearly distinguishable from a human recording, than an optimized TTS voice that comes close to a human but not quite close enough, and may therefore fall into the uncanny valley.

    – Visual representation with an avatar –

    Many would like to give their digital assistant a more human face or appearance, to make it more approachable. But also in this context, one should be aware of the uncanny valley effect.

    Ciechanowski and colleagues (2019), for example, conducted an experiment to understand how people would evaluate interaction with a chatbot with an animated avatar and without a visual representation (simple text chatbot).

    They found that participants experienced weaker uncanny effects (“weirdness” or discomfort) and less negative affect when cooperating with the simpler text chatbot than with the more complex, animated avatar chatbot. The simple chatbot also induced less intense psychophysiological reactions.

    – Moral uncanny valley –

    Artificial intelligence is playing a role in more and more areas of daily life. In the process, machines also increasingly have to make decisions that people evaluate morally.

    At the same time, little is known about how the appearance of machines influences the moral evaluation of their decisions.

    Laakasuo and colleagues (2021) have conducted exciting experiments on this, looking at the interplay of the uncanny valley effect and moral psychology.

    They investigated whether people evaluate identical moral decisions made by robots differently depending on the robot’s appearance. Participants evaluated either deontological (“rule-based”) or utilitarian (“consequence-based”) moral decisions made by different robots.

    Their results provide preliminary evidence that people evaluate moral decisions made by robots that resemble humans as less moral than the same moral decisions made by humans or non-human robots: a moral uncanny valley effect.

    – To sum up –

    Should we try to design digital assistants to be as human as possible? Quite clearly no.

    If I’ve learned anything from the many studies I’ve conducted over the past few years, it’s that people interacting with digital assistants are primarily interested in achieving their goals quickly and in having a very transparent experience.

    Transparency is, in sum, a very relevant topic for dialog design and could fill an entire article in itself. What is important in this context is simply that the assistant should be transparent about the fact that it is not human.

    The art in design, then, is to make the assistant not appear human while still creating an interaction that is as natural as possible for the user. After all, people still prefer natural speech in interaction, for example, compared to abbreviated “robot speech.”

    Disclaimer: Please note that this article is not a conclusive scientific consideration of these issues. As in any other field of psychological research, there are always studies that reach different conclusions, also depending on the measured variables, their operationalization and their weighting for the interpretation of the results.

    Bögels, S., Kendrick, K. H., & Levinson, S. C. (2015). Never say no… How the brain interprets the pregnant pause in conversation. PLoS ONE, 10(12), e0145474.

    Ciechanowski, L., Przegalinska, A., Magnuski, M., & Gloor, P. (2019). In the shades of the uncanny valley: An experimental study of human–chatbot interaction. Future Generation Computer Systems, 92, 539–548.

    Do, T. D., McMahan, R. P., & Wisniewski, P. J. (2022). A New Uncanny Valley? The Effects of Speech Fidelity and Human Listener Gender on Social Perceptions of a Virtual-Human Speaker. In CHI Conference on Human Factors in Computing Systems (pp. 1-11).

    Gnewuch, U., Morana, S., Adam, M. T., & Maedche, A. (2022). Opposing Effects of Response Time in Human–Chatbot Interaction. Business & Information Systems Engineering, 1-19.

    Laakasuo, M., Palomäki, J., & Köbis, N. (2021). Moral uncanny valley: A robot’s appearance moderates how its decisions are judged. International Journal of Social Robotics, 13(7), 1679–1688.

    Mori, M. (1970). The uncanny valley. Energy, 7(4), 33–35.
