Doctoral Thesis: SLAM-aware, Self-Supervised Perception in Mobile Robots


Event Speaker: 

Sudeep Pillai

Event Location: 

32-G449 KIVA

Event Date/Time: 

Tuesday, August 29, 2017 - 11:00am


Simultaneous Localization and Mapping (SLAM) is a fundamental
capability in mobile robots, and has been typically considered in the
context of aiding mapping and navigation tasks. In this thesis, we
advocate for the use of SLAM as a supervisory signal to further the
perceptual capabilities of robots. Through the concept of
SLAM-supported object recognition, we enable robots equipped with a
single camera to leverage their SLAM-awareness (via monocular
visual-SLAM) to better inform object recognition within their
immediate environment. Additionally, by maintaining a
spatially-cognizant view of the world, we find our SLAM-aware approach
to be particularly amenable to few-shot object learning. We show that
a SLAM-aware, few-shot object learning strategy is especially
advantageous for mobile robots, as it can learn object detectors from
a reduced set of training examples.

Implicit in realizing modern visual-SLAM systems is the choice of map
representation. The map representation must be usable by multiple
components in the robot's decision-making stack, while being
continually optimized as more measurements become available. Motivated
by the need for a unified map
representation in vision-based mapping, navigation and planning, we
develop an iterative and high-performance mesh-reconstruction
algorithm for stereo imagery. We envision that in the future, these
tunable mesh representations can potentially enable robots to quickly
reconstruct their immediate surroundings while being able to directly
plan in them and maneuver at high speeds.

While most visual-SLAM front-ends explicitly encode
application-specific constraints for accurate and robust operation, we
advocate for an automated solution to developing these systems. By
bootstrapping the robot's ability to perform GPS-aided SLAM, we
develop a self-supervised visual-SLAM front-end capable of visual
ego-motion estimation and vision-based loop-closure recognition in mobile
robots. We propose a novel generative-model solution that is able to
predict ego-motion estimates from optical flow, while also allowing
for the prediction of the induced scene flow conditioned on the
ego-motion. Following a similar bootstrapped learning strategy, we
explore the ability to self-supervise place recognition in mobile
robots and cast it as a metric learning problem, with a GPS-aided SLAM
solution providing the relevant supervision. Furthermore, we show that
the newly learned embedding can be particularly powerful in
discriminating visual scene instances from each other for the purpose
of loop-closure detection. We envision that such self-supervised
solutions to vision-based task learning will have far-reaching
implications in several domains, especially facilitating life-long
learning in autonomous systems.
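To make the metric-learning formulation above concrete, the following is a minimal sketch of a triplet margin loss, the standard objective for this kind of problem. The function names and the supervision scheme in the comments are illustrative assumptions, not the thesis's actual implementation: here, a GPS-aided SLAM solution would label frame pairs observed from nearby poses as positives and frames from distant poses as negatives.

```python
import math

def l2(u, v):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Triplet margin loss on scene embeddings (illustrative sketch).

    The anchor and positive would be embeddings of images whose
    GPS-aided SLAM poses lie within some radius (same place); the
    negative would come from a distant pose. Minimizing this loss pulls
    co-located frames together and pushes distant frames apart by at
    least `margin`, so that embedding distance discriminates scene
    instances for loop-closure detection.
    """
    return max(0.0, l2(anchor, positive) - l2(anchor, negative) + margin)

# Toy 2-D embeddings: the positive is near the anchor, the negative is not,
# so this triplet already satisfies the margin and incurs zero loss.
satisfied = triplet_loss([1.0, 0.0], [0.9, 0.1], [-1.0, 0.0])
# Swapping positive and negative violates the margin and yields a penalty.
violated = triplet_loss([1.0, 0.0], [-1.0, 0.0], [0.9, 0.1])
```

In practice the embeddings would come from a learned network and the loss would be minimized over many SLAM-supervised triplets; the sketch only shows the objective itself.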

Thesis Supervisor: John J. Leonard

Thesis Committee: Antonio Torralba, Leslie Kaelbling, and Nicholas Roy