Humans as Robots

An earlier version of the wearable system incorporated an array of LEDs to perform active segmentation of objects that the wearer held up for visual inspection. The following text describes an application of that system in which the wearable behaves like a creature with its own agenda: learning from a cooperative wearer.

Human intelligence relies on a wealth of commonsense knowledge acquired over a lifetime of experience. To achieve the long-term goal of human-level artificial intelligence, researchers must find ways to endow machines with this kind of commonsense.

Humanoid robots could serve as a direct approach to the acquisition of this type of competence, since a sufficiently sophisticated humanoid robot would be able to experience much of the world in the same way as humans. Currently, however, humanoid robots have very limited experience with the world due to obstacles ranging from mechanical design to social constraints on the use of autonomous robots.

Wearable computing systems have the potential to measure a great deal of the sensory input and physical output of a person as he or she experiences everyday activities. Much can be learned through passive observation of these measurements. However, if we can also find ways for the wearable system to strongly influence the behavior of the person wearing the system, many learning tasks can be made easier.

Figure: diagram of the combined humanoid platform (subsumption architecture)

We designed Duo, a wearable creature that works with a cooperative human to learn about everyday objects. The wearable learns about the world by watching the wearer, and sometimes making requests of him, as he goes through his daily activities. By using the same sensory input as the person and co-opting his output behaviors, the wearable creature serves as the top layer of control in a subsumption architecture, with the human serving as a powerful mechanical and computational infrastructure. A diagram of the subsumption architecture for this human/wearable application is shown above.

As an initial exploration of this class of wearable applications, we created a system that attempts to learn about the objects with which the wearer interacts during everyday activities. The creature uses a camera to see what the wearer sees, and orientation sensors to estimate the kinematic configuration of the wearer's head and dominant arm. The creature also serves as a high-level controller that attempts to co-opt the wearer's behaviors by requesting actions through headphones. For example, the creature was able to request that the wearer look at an object he was manipulating, so that it could see the object better and segment it using the LED array. Likewise, Duo could ask the wearer to keep his head still in order to make perception easier. We hypothesize that a broad array of actions useful for learning could be successfully prompted by speech from Duo. In the future, an application such as this could ask the wearer to repeat an action by saying "Do that again!", which should help the creature segment the activity into meaningful parts. More generally, by requesting actions the wearable creature could test hypotheses it has formed about actions and their effects in the world.



Figure: example segmentations from Duo
This figure shows Duo's segmentations of two common manipulable objects. When Duo detects that the wearer has reached for an object, it uses speech through the headphones to ask the person to look at the object. When the person holds up the object to look at it, Duo flashes the LEDs to produce the segmentations shown in this figure. The first column shows Duo's view before the LED flash, and the second column shows the view during the flash. The third column shows the difference between the flashed and non-flashed images. The fourth column shows the object-and-hand mask produced by thresholding this difference. The final column shows the masks applied to the images, segmenting the objects together with the hands that hold them.
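The differencing pipeline in this figure can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Duo's actual implementation: images are taken to be equal-sized grayscale arrays stored as nested lists, the absolute per-pixel difference stands in for whatever difference measure Duo used, and the threshold of 40 and the helper names (difference_image, threshold_mask, apply_mask, segment) are hypothetical. Applying the mask to the non-flashed view is also a choice made here.

```python
# Minimal sketch of LED-flash active segmentation by frame differencing.
# Assumptions (hypothetical, not from Duo): grayscale images stored as
# equal-sized nested lists of ints in 0..255, and a fixed threshold of 40.
from typing import List

Image = List[List[int]]
Mask = List[List[bool]]


def difference_image(flashed: Image, unflashed: Image) -> Image:
    """Absolute per-pixel difference between the two views (column 3)."""
    return [
        [abs(f - u) for f, u in zip(f_row, u_row)]
        for f_row, u_row in zip(flashed, unflashed)
    ]


def threshold_mask(diff: Image, threshold: int = 40) -> Mask:
    """Mark pixels whose difference exceeds the threshold as foreground (column 4)."""
    return [[d > threshold for d in row] for row in diff]


def apply_mask(image: Image, mask: Mask, fill: int = 0) -> Image:
    """Keep foreground pixels and replace background pixels with fill (column 5)."""
    return [
        [p if m else fill for p, m in zip(img_row, mask_row)]
        for img_row, mask_row in zip(image, mask)
    ]


def segment(flashed: Image, unflashed: Image, threshold: int = 40) -> Image:
    """Difference, threshold, then mask the non-flashed view."""
    mask = threshold_mask(difference_image(flashed, unflashed), threshold)
    return apply_mask(unflashed, mask)


if __name__ == "__main__":
    # Toy 2x3 frames: the middle column (the "held object") brightens under the flash.
    unflashed = [[10, 10, 10], [10, 50, 10]]
    flashed = [[12, 120, 11], [10, 150, 10]]
    print(segment(flashed, unflashed))  # prints [[0, 10, 0], [0, 50, 0]]
```

Light from the LEDs falls off quickly with distance, so the nearby hand and object brighten much more than the distant background, which is why a single global threshold separates them in this toy example. Any motion between the two frames would also produce large differences, which is one reason it helps for the wearer to keep his head still.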


These behaviors work together with a cooperative human to acquire high-quality segmentations of everyday manipulable objects used by the person wearing the system. When Duo detects that the wearer has reached for an object, it asks him through the headphones to look at it. While the wearer looks at the object, Duo flashes the LED array to segment the hand and object, which are in the foreground, from the background. During this time Duo also monitors the wearer's head movements; if the head moves significantly, Duo asks the wearer to keep his head still. Future work on a system of this kind could apply methods we have developed to segment images and to detect, track, and recognize the resulting segments, and could also segment and recognize actions of the wearer other than reaching.
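The request behaviors described in this paragraph can be sketched as a small state machine. This is an assumption-laden illustration, not Duo's code: the inputs reach_detected and object_in_view, the (yaw, pitch, roll) head readings in degrees, the callbacks say and flash_leds, the spoken phrases, and the 5-degree movement threshold are all hypothetical stand-ins for Duo's perception, speech, and LED hardware. Flashing only when the head is steady is a design choice made here so that the flashed and non-flashed frames line up.

```python
# Minimal sketch of Duo's request behaviors as a small state machine.
# Hypothetical stand-ins (not Duo's real interfaces): reach_detected and
# object_in_view summarize perception, head is a (yaw, pitch, roll) reading
# in degrees from the orientation sensor, say() speaks through the
# headphones, flash_leds() triggers the LED array, and the 5-degree
# movement threshold is illustrative.
from dataclasses import dataclass
from enum import Enum, auto
from typing import Callable, Optional, Tuple

Orientation = Tuple[float, float, float]  # (yaw, pitch, roll) in degrees


class State(Enum):
    IDLE = auto()           # waiting for the wearer to reach for something
    AWAITING_LOOK = auto()  # asked the wearer to look; waiting for the object to appear
    LOOKING = auto()        # object in view; try to capture a flashed image


def wrapped_delta(a: float, b: float) -> float:
    """Smallest signed angular difference b - a, in degrees within [-180, 180)."""
    return (b - a + 180.0) % 360.0 - 180.0


def head_moved(prev: Orientation, curr: Orientation, threshold_deg: float = 5.0) -> bool:
    """True if any head angle changed by more than threshold_deg between samples."""
    return any(abs(wrapped_delta(p, c)) > threshold_deg for p, c in zip(prev, curr))


@dataclass
class DuoController:
    say: Callable[[str], None]
    flash_leds: Callable[[], None]
    threshold_deg: float = 5.0
    state: State = State.IDLE
    last_head: Optional[Orientation] = None

    def step(self, reach_detected: bool, object_in_view: bool, head: Orientation) -> None:
        """Advance the controller by one perception cycle."""
        if self.state is State.IDLE:
            if reach_detected:
                self.say("Please look at the object you are holding.")
                self.state = State.AWAITING_LOOK
        elif self.state is State.AWAITING_LOOK:
            if object_in_view:
                self.state = State.LOOKING
        elif self.state is State.LOOKING:
            moved = self.last_head is not None and head_moved(
                self.last_head, head, self.threshold_deg
            )
            if not object_in_view:
                self.state = State.IDLE  # wearer looked away; abandon this object
            elif moved:
                self.say("Please keep your head still.")
            else:
                self.flash_leds()  # head is steady enough to difference two frames
                self.state = State.IDLE
        self.last_head = head  # remember this reading for the next movement check


if __name__ == "__main__":
    duo = DuoController(say=print, flash_leds=lambda: print("<flash>"))
    duo.step(True, False, (0.0, 0.0, 0.0))     # reach: asks the wearer to look
    duo.step(False, True, (0.0, -20.0, 0.0))   # object comes into view
    duo.step(False, True, (12.0, -20.0, 0.0))  # 12-degree turn: asks for stillness
    duo.step(False, True, (12.5, -20.0, 0.0))  # steady head: flashes the LEDs
```

Returning to IDLE after one flash means this sketch collects one segmentation per reach. A fuller controller would likely also rate-limit its spoken requests so that a restless wearer is not asked to keep still on every cycle.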

This research makes progress towards a viable system for the acquisition of commonsense related to everyday human activities. Future creatures could learn to control a set of common behaviors performed by a cooperative human and learn to relate common action patterns to the visual appearance of objects and to the observed changes in the world.