(BEING CONTINUED FROM 11/07/16)
We collected data from observations of real human deictic behavior so we could generate a model enabling a robot to point naturally to people. Since pointing to objects has been explored extensively in other research, we chose to focus on ways in which pointing behaviors vary when pointing to people. In particular, we were interested in examining three factors:
Object vs. person: As discussed in the introduction, we expected that people would point precisely to objects but less precisely to people.
Open vs. closed: We expected that people would use less obvious gestures in “closed” conversation, e.g. talking about someone in a negative way, than in “open” conversation.
Known vs. unknown: We wondered whether people’s behavior would be different if they already knew the referent, such as in the case where saying their name would be enough to identify the referent without ambiguity.
We conducted the data collection in a shopping mall, as shown in Fig. 1(a), with 17 paid participants (11 female, 6 male, mean age 23.7 years). We asked the participants to role-play as customers in the shopping mall. An experimenter asked for the participant’s opinions about products or other visitors in the mall, and the participant answered freely using deictic behaviors. The participants were not explicitly instructed to use deictic behaviors, but were instead instructed to “indicate” who the referent was.
We measured the behavior of the participants under 5 scenarios, chosen to measure the factors described above. The scenarios were defined as follows:
Object: Referring to a product in the shopping mall that does not belong to either the participant or the confederate (e.g. “Which of these cellphones do you think looks better?”).
Open/Known: Referring to a mutual friend (one of two other acquaintances) in an open conversation (e.g. “With which of our friends did you take the same bus to the mall?”).
Open/Unknown: Referring to a random, unknown customer in an open conversation (e.g. “Which person did you see at the train station yesterday?”).
Closed/Known: Referring to a mutual friend (one of two other acquaintances) in a closed conversation, such as gossiping negatively (e.g. “Which of our friends do you think has no fashion sense?”).
Closed/Unknown: Referring to a random, unknown customer in a closed conversation (e.g. “Which person do you think looks unfriendly?”).
Each scenario consisted of 6 pre-determined questions, which were counter-balanced. Before the experiment, we had a short ice-breaker session to familiarize the participant with two additional experimenters, who were role-playing as the acquaintances in the “known” scenarios. The two acquaintances stood at different locations for each question. In the “unknown” scenarios, the participants were instructed to refer to actual customers in the shopping mall.
Fig. 1. (a) The shopping mall in which the data collection was performed
We recorded video of each participant’s behavior. Because we expected that the positions of surrounding people might affect the speaker’s deictic behavior (i.e., identifying a referent among many customers would be more difficult than when only a few customers were present), we also used a human tracking system based on 2D laser range finders (LRF) to capture the positions of the people in the environment. Fig. 1(b) shows the map of the environment in which the data collection was conducted.
Fig. 1. (b) Map of the data collection environment
The degree of crowding could not be explicitly controlled since the experiment was conducted in a shopping mall. However, all trials were conducted under similar conditions during weekday mornings and afternoons, with an average of 10.46 people present in the environment across all trials.
For each question, the speaker’s pointing type and use of a verbal descriptive term were coded and categorized from the recorded videos, as explained below.
Categorization of Pointing Types
We classified pointing gestures into three categories (see Fig. 2): “gaze only”, “casual pointing”, and “precise pointing”. “Gaze only” was coded when the speaker only gazed in the direction of the referent, without using any other pointing gesture. “Casual pointing” was coded when the arm was only partially extended; this category also corresponds to the “Open Hand Neutral”, “Open Hand Prone”, and “Open Hand Oblique” pointing gestures as defined by Kendon. “Precise pointing” was coded when the speaker’s arm and index finger were fully extended, based on Kendon’s definition.
There was a range of variation in the amount of extension of the upper arm and the forearm among participants, so for simplicity, we categorized the pointing type as precise pointing only when the arm and the index finger were fully extended. All other pointing was coded as casual pointing.
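The coding rule described above can be summarized as a small decision function. This is only an illustrative sketch of the rule as stated in the text; the coding in the study was done by a human annotator from video, and the function name and boolean inputs (`used_arm_gesture`, `arm_fully_extended`, `index_fully_extended`) are hypothetical names introduced here. It assumes every coded behavior already includes gaze toward the referent.

```python
from enum import Enum


class PointingType(Enum):
    GAZE_ONLY = "gaze only"
    CASUAL = "casual pointing"
    PRECISE = "precise pointing"


def classify_pointing(used_arm_gesture: bool,
                      arm_fully_extended: bool,
                      index_fully_extended: bool) -> PointingType:
    """Apply the three-way coding rule to one annotated reference behavior.

    All three inputs are hypothetical annotation flags, not part of the
    original study's tooling.
    """
    # No arm or hand gesture at all: the speaker only gazed at the referent.
    if not used_arm_gesture:
        return PointingType.GAZE_ONLY
    # Precise pointing requires BOTH the arm and the index finger fully extended.
    if arm_fully_extended and index_fully_extended:
        return PointingType.PRECISE
    # Every other pointing gesture is coded as casual pointing.
    return PointingType.CASUAL
```

Treating precise pointing as the only case with both conditions met mirrors the simplification in the text: any partial extension falls into the casual category.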
Categorization of Descriptive Terms
We analyzed the video to identify whether people used a verbal descriptive term. Here, a “descriptive term” is defined as an utterance aside from the referent’s name that uniquely singles out the referent from other people, e.g. based on relative location (“the person in front of the coffee shop”) or a visible feature (“the person in the blue shirt”).
If only the referent’s name was used, it was classified as “name only”. If the participant used only a general deictic reference term (“that person”), it was classified as “no descriptive term”, since terms like “this” or “that” may not uniquely single out the referent among surrounding people.
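As a rough illustration, the verbal categories above can be expressed as a decision rule over two annotation flags. The function and flag names are hypothetical and introduced only for this sketch. Whether an utterance uniquely singles out the referent is a judgment made by the human annotator, so it is taken here as an input rather than computed.

```python
def classify_verbal_reference(used_name: bool,
                              used_distinguishing_phrase: bool) -> str:
    """Return the verbal category for one annotated reference behavior.

    used_name: the speaker said the referent's name (hypothetical flag).
    used_distinguishing_phrase: the speaker used an utterance other than the
        name that uniquely singles out the referent, e.g. by relative
        location or a visible feature (hypothetical flag).
    """
    # A distinguishing phrase counts as a descriptive term whether or not
    # the name was also spoken.
    if used_distinguishing_phrase:
        return "descriptive term"
    # Only the name was used.
    if used_name:
        return "name only"
    # General deictic terms such as "that person" fall through to here.
    return "no descriptive term"
```

For example, “the person in the blue shirt” would be annotated with `used_distinguishing_phrase=True`, while “that person” alone would have both flags set to `False`.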
Results and Analysis
For each of the 5 scenarios, a total of 102 reference behaviors were observed (6 questions for each of the 17 participants). Using the recorded videos, an experimenter annotated the pointing behavior and whether a descriptive term was used by the participant in each trial. These annotations were tabulated to produce Table I. The experimenter also recorded the referent’s position at the time the speaker made the reference behavior, as well as how long it took the speaker to make it. We noticed that, in addition to deictic pointing behaviors, some speakers also used other techniques of representation, such as acting out putting on a jacket to describe a referent wearing a jacket. These types of gestures were observed only a few times among the participants and were not a universal phenomenon. In this paper, we set these special cases aside and focus only on deictic language and referential gestures.
The relative frequencies of behaviors for each scenario are shown in Table I, with the most frequently used behaviors in each scenario highlighted in bold.
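The tabulation step can be sketched as follows: each scenario contributes 6 × 17 = 102 annotated behaviors, and the relative frequency of a behavior is its count divided by the number of annotations in that scenario. The helper names below are hypothetical, and the labels in the usage note are placeholder values for illustration only, not data from the study.

```python
from collections import Counter

N_PARTICIPANTS = 17
QUESTIONS_PER_SCENARIO = 6
# Number of reference behaviors observed per scenario.
BEHAVIORS_PER_SCENARIO = N_PARTICIPANTS * QUESTIONS_PER_SCENARIO


def relative_frequencies(labels):
    """Map each behavior label to its share of the annotations in `labels`."""
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in counts.items()}


def most_frequent(freqs):
    """Return the label with the highest relative frequency."""
    return max(freqs, key=freqs.get)
```

With a placeholder list such as `["precise pointing"] * 3 + ["casual pointing"] * 2 + ["gaze only"]`, `relative_frequencies` assigns one half to precise pointing, and `most_frequent` selects it as the entry that would be highlighted.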
(TO BE CONTINUED)
Phoebe Liu, Dylan F. Glas, Takayuki Kanda, Member, IEEE, Hiroshi Ishiguro, Norihiro Hagita, Senior Member, IEEE