A Model for Generating Socially-Appropriate Deictic Behaviors towards People

Pointing behaviors are essential in enabling social robots to communicate about a particular object, person, or space. Yet, pointing to a person can be considered rude in many cultures, and as robots collaborate with humans in increasingly diverse environments, they will need to effectively refer to people in a socially-appropriate way. We confirmed in an empirical study that although people would point precisely to an object to indicate where it is, they were reluctant to do so when pointing to another person. We propose a model for selecting utterances and pointing behaviors towards people in terms of a balance between understandability and social appropriateness. Calibrating our proposed model based on empirical human behavior, we developed a system able to autonomously select among six deictic behaviors and execute them on a humanoid robot. We evaluated the system in an experiment in a shopping mall, and the results show that the robot’s deictic behavior was perceived by both the listener and the referent as more polite, more natural, and better overall when using our model as compared with a model considering understandability alone.

Natural and humanlike human-robot interaction is receiving increasing attention as robots become more common in museums [2-4], classrooms [5], and elderly care facilities [6, 7]. In order to facilitate natural and intuitive communication, humanlike spoken, locomotive [8], and gestural behaviors are being developed for robots, and one important area of focus is deictic gestures, such as pointing. Several studies in human-robot interaction have focused on generating human-like multimodal referring acts using both speech and gesture for objects [9-13] and space [14, 15].

Our study focuses on a method for generating behaviors for a robot to point to a person. There are important differences in the way someone gestures towards objects and the way someone gestures towards a fellow person. When pointing to people, it is often considered more appropriate to gesture casually to them rather than using a very obvious pointing gesture, i.e. with an extended index finger. However, in most situations there would be no reason not to use a clear and precise pointing gesture when identifying an object.
As social human-robot interactions become more complex, it will be important to consider the social appropriateness of a pointing gesture within the context of the conversation. For example, if an elder-care provider is consulting with another practitioner about the health condition of a particular senior person, the provider would probably point out that person discreetly, using a subtle pointing gesture, in order to reduce the risk of the referent becoming aware and to avoid causing the referent anxiety. In such a scenario, if a robot directly singled out the individual when discussing a sensitive topic (i.e., a “closed” conversation), the robot would probably be perceived as socially-inappropriate. It would be more appropriate for the robot to discreetly identify the referent, even if it meant being less clear to its listener about the referent’s identity. However, if the conversation were not of a sensitive nature, and the topic being discussed were neutral or positive (i.e., an “open” conversation), the social consequences would be less severe, and it might be acceptable for the robot to be more obvious about identifying the referent.
Existing models for generating deictic behaviors in robots are typically designed for referring to objects, and thus do not consider this element of social appropriateness. In this study, we present a model for generating socially-appropriate deictic behaviors for pointing to people.
First, we present an empirical study of human pointing behavior, in which we confirm that people usually do not use precise pointing gestures towards another person, that is, they typically do not point directly at the person with the index finger, and that this phenomenon becomes even more pronounced in private, or “closed,” conversations.
We then propose a generative model for deictic behaviors, based on the idea of a balance between understandability and social appropriateness: more precise pointing gestures can increase understandability, but they can also be socially inappropriate. Based on this concept and the data from our human behavior observations, we have developed a model enabling a robot to reproduce human deictic behavior towards people.
Finally, we describe our implementation of this model in a real robot system and present results from an experiment conducted with a robot in a shopping mall, showing that people evaluated the robot’s behaviors as more natural and polite when social appropriateness was considered in behavior selection.
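As an illustration of the balance between understandability and social appropriateness described above, the following minimal Python sketch selects the behavior with the highest net score. It is not the calibrated model from this work: the behavior names, the numeric scores, the linear trade-off, and the weights for "open" and "closed" conversations are all hypothetical values chosen only to show the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Behavior:
    name: str
    understandability: float  # hypothetical score in [0, 1]; higher is clearer
    social_cost: float        # hypothetical score in [0, 1]; higher is ruder


# Illustrative candidates only; these are NOT the six behaviors used in the paper.
CANDIDATES = [
    Behavior("index_finger_point", understandability=0.9, social_cost=0.8),
    Behavior("casual_open_hand_gesture", understandability=0.7, social_cost=0.3),
    Behavior("verbal_description_only", understandability=0.4, social_cost=0.0),
]

# Hypothetical weights: social cost matters more in a "closed" conversation.
UNDERSTANDABILITY_ONLY = 0.0
OPEN_WEIGHT = 0.5
CLOSED_WEIGHT = 1.5


def net_score(behavior: Behavior, cost_weight: float) -> float:
    """Assumed linear trade-off: clarity minus weighted social cost."""
    return behavior.understandability - cost_weight * behavior.social_cost


def select_behavior(candidates: list[Behavior], cost_weight: float) -> Behavior:
    """Return the candidate with the highest net score."""
    return max(candidates, key=lambda b: net_score(b, cost_weight))


if __name__ == "__main__":
    for label, weight in [
        ("understandability only", UNDERSTANDABILITY_ONLY),
        ("open conversation", OPEN_WEIGHT),
        ("closed conversation", CLOSED_WEIGHT),
    ]:
        print(label, "->", select_behavior(CANDIDATES, weight).name)
```

With these made-up numbers, ignoring social cost selects the precise index finger point, while increasing the weight on social cost shifts the choice first to a casual gesture and then to a purely verbal reference, which mirrors the qualitative trend described for open and closed conversations.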

Studies of Human Pointing Behavior
According to Kendon, the intention of precise pointing is to single out an object which is to be attended to as a particular individual object [16]. He categorized this type of pointing as Index Finger Extended, for which not only the index finger, but almost any extensible body part or held object can be used. The idea that index finger pointing singles out a particular entity is well established in the human science literature, and it provides a useful basis for our categorization.
Some studies have examined the use of reference terms for people. In such studies, the focus was mainly on generating a referring expression (e.g., “This is the coach”) to single out someone as an individual person [17-19]. Accordingly, we also consider verbal descriptive terms as part of our model for generating deictic behavior.

Human-Robot Interaction
Many approaches to generating robot behavior first look at how humans behave as the basis of behavior design. For example, Semel et al. developed and verified a control system for humanoid bipedal locomotion that was biologically based on human gait cycles [8]. However, the mechanisms that drive people to act in a certain way may not be obvious to them. Hence, various studies use data-driven methods to extract the underlying mechanisms that govern human behavior, such as recognizing emotional states through ECG data [20], or identifying features that uniquely define individuals through EEG data [21]. In our work, we first observe human deictic behaviors through data collection, and then we incorporate the main factors that were identified in our analysis into our model.
Similar to Kendon’s work on index finger pointing to single out an object, studies have attempted to model the idea of pointing as a way to resolve ambiguity. Bangerter et al. focused on the use of full pointing (arm fully extended) and partial pointing (elbow bent) by varying the number of pictures in an array to manipulate the ambiguity of a reference [22]. We will combine this idea of resolving ambiguity with an additional politeness factor that applies when pointing to people.
Some studies in human-robot interaction have focused on generating human-like multimodal referring acts using both speech and gesture for objects [9-12], and space [14, 15]. Brooks and Breazeal [23] describe a framework for multimodally referring to objects using a combination of deictic gesture, speech, and spatial knowledge. Schultz et al. focused on spatial reference for a robot using perspective taking [24]. In these studies, the robot points to a static object in the environment and produces an appropriate deictic behavior that indicates where the target is. We will also study multimodal behaviors in human-robot interaction, but with a focus on the social aspects of pointing to people.


Phoebe Liu, Dylan F. Glas, Takayuki Kanda, Member, IEEE, Hiroshi Ishiguro, Norihiro Hagita, Senior Member, IEEE



1 This paper is an extended version of our conference paper [1] with integrated technical details, additional discussions, expanded explanations, and supplementary analysis of the experiment.



International Journal of Social Robotics (preprint)
The final publication is available at Springer via

