How long can a voice output be, and how much information should it contain at most?
This is not an easy question to answer, because it depends on a number of different variables.
Let’s take a brief look at the human brain. To follow a conversation, we primarily use a part of memory called working memory.
This is a structure for the short-term storage and manipulation of information. Its two key characteristics are a very limited capacity and fragile storage.
But what does limited mean in this context?
One way this has been studied is by dictating a series of random numbers to subjects and then asking them to repeat the numbers in the correct order.
It was found that people can remember 7 ± 2 units, where a unit can be a number, a letter or a word.
Miller coined the phrase “the magical number seven, plus or minus two” in 1956, according to which working memory can hold 7 ± 2 “chunks”. Such a chunk can be, for example, a meaningful combination of letters, such as IBM, or a whole sentence, e.g., “I like to eat cookies”.
To access information in working memory, we need to “keep it active”, but we tend not to do that in a normal conversation. That is why, when you are telling a story and jump from thought to thought, you suddenly have to ask yourself: “Where were we?”
So, what does all of this mean?
Does it mean that every voice output can include up to 7 chunks of information?
The short answer is: ideally not. People do not try to memorize parts of a conversation, and since our working memory is very fragile, it makes sense to keep speech output as short as possible and to limit it to the essentials.
Nevertheless, dialog concepts that contain longer prompts can work well.
Why is that?
Because dialog and prompt design must depend strongly on the goal behind the user’s intent and on the situation or context in which that intent is expressed. In numerous studies, I have seen how goal-oriented customers are when they use and evaluate voice prompts. Especially with a modality as natural as speech, they expect to reach their goal particularly quickly and efficiently. However, if the customer’s goal is, for example, to better understand a function, and we design a step-by-step tutorial, then much longer prompts can also work well, because the customer expects to get more information.
Longer prompts in particular, however, should always be validated in a study.
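The idea that the acceptable prompt length depends on the user’s goal can be sketched as a small check that flags prompts for review. This is only a rough illustration: the intent types, the word budgets and the function names are assumptions made up for this example, not values from any study, and word count is just a crude stand-in for how much information a prompt carries.

```python
import re

# Hypothetical word budgets per intent type. The numbers are illustrative
# assumptions, not recommendations from the article or from research.
WORD_BUDGET = {"task": 20, "tutorial": 60}


def count_words(prompt: str) -> int:
    """Count word-like tokens (letters, digits, apostrophes) in a prompt."""
    return len(re.findall(r"[A-Za-z0-9']+", prompt))


def check_prompt(prompt: str, intent_type: str) -> dict:
    """Compare a prompt's word count with the budget for its intent type."""
    words = count_words(prompt)
    budget = WORD_BUDGET[intent_type]
    # Prompts over budget are not rejected, only marked for a closer look.
    return {"words": words, "budget": budget, "needs_review": words > budget}


result = check_prompt("Navigation started. Turn left in 200 meters.", "task")
print(result)
```

A check like this can only point out candidates for a closer look. Whether a long prompt actually works still has to be tested with real users.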
When I design a dialog, I often imagine how a human being, for example a helpful passenger in the car or an agent, would formulate the statement. We humans are so experienced in using language and depend so much on functioning communication that we usually formulate very appropriate “voice outputs”. 😉
More on the topic in this video.