For robots to interact effectively with humans, they require the ability to learn representations of their environments that are compatible with the conceptual models used by people. Current approaches to constructing such spatial-semantic representations rely solely on traditional sensors to acquire knowledge of the environment, which restricts robots to learning limited knowledge of their local surroundings. In contrast, natural language descriptions allow people to share rich information about their environments with their robot partners in a flexible, efficient manner, enabling robots to "observe" spatial and semantic properties that are beyond the range and capabilities of traditional sensors.
This thesis addresses the problem of fusing information contained in natural language descriptions with the robot's onboard sensor measurements to construct spatial-semantic representations useful for human-robot interaction. The novelty of the thesis lies in its treatment of natural language as another sensor observation that can inform the robot about its environment. Toward this end, we introduce the semantic graph, a spatial-semantic representation that provides a common framework in which we integrate information that the user communicates (e.g., labels and spatial relations) with observations from the robot's sensors. We outline a semantic mapping algorithm that efficiently maintains a factored distribution over semantic graphs based upon the stream of natural language and low-level sensor information. We evaluate the algorithm's ability to learn human-centric maps of several different environments, and demonstrate its ability to incorporate information from language descriptions to improve the metric, topological, and semantic accuracy of the learned environment model. We then outline an information-theoretic approach that allows a robot to improve its representation of the environment by asking targeted questions of the user. Finally, we show how this semantic mapping algorithm enables robots to efficiently carry out natural language instructions in previously unknown environments by taking advantage of latent information about the environment conveyed in the command. We evaluate this approach through simulation and physical experiments, and demonstrate its ability to follow navigation commands with performance comparable to that achieved in a fully known environment.
Thesis Supervisor: Prof. Nicholas Roy