Doctoral Thesis: Geometric Learning for Manipulating Scenes and Objects
By: Anthony Simeonov
Thesis Supervisors: Pulkit Agrawal and Alberto Rodriguez
Details
- Date: Friday, August 9
- Time: 10:00 am - 11:30 am
- Category: Thesis Defense
- Location: 32-G882
Abstract:
Enabling robots to perform practical tasks in the real world requires that they be equipped with a general sense of geometric intelligence. This thesis aims to (i) understand components of geometric intelligence that are missing from current systems and (ii) propose techniques to close some of these gaps. The developed insights and techniques enable new capabilities in robotic manipulation, focusing on rigid object rearrangement tasks in real, unmodeled scenes.
First, I will discuss how the learned features of an equivariant neural field, trained offline to perform 3D reconstruction from point clouds, can be repurposed as a representation for data-efficient manipulation of unseen objects in out-of-distribution poses. This is achieved by casting skill imitation as aligning coordinate frames detected near task-relevant object parts. From a few task demonstrations, the neural field representation encodes which parts are relevant and supports localizing frames near the corresponding parts on new shapes. I will also present applications of our neural descriptor fields for capturing pairwise relations between object parts and chaining such relations to perform multi-step tasks.
Next, I will show how rearrangement in multi-object scenes introduces additional challenges, such as generalizing to diverse scene layouts and covering the multi-modal space of rearrangement solutions. I will discuss how predicting combined object-scene 3D point clouds by denoising relative object poses with diffusion models naturally handles these challenges.
Finally, I will share recent results on learning closed-loop visuomotor policies that execute rearrangement tasks more reliably and robustly by combining simulation-based reinforcement learning (sim-to-real) with 3D reconstruction (real-to-sim).
Zoom link: https://mit.zoom.us/j/94948381154