Doctoral Thesis: Learning Low-level Priors from Images for Inference and Synthesis

Friday, April 5

10:00 am - 11:30 am

32-G449 (Patil/Kiva)

By: Prafull Sharma

Supervisors: Bill Freeman, Fredo Durand

Details

Category: Thesis Defense

Additional Location Details:

Zoom link: https://mit.zoom.us/j/91363237878

Abstract:

With recent advances in computer vision, scene understanding is critical for both downstream applications and photorealistic synthesis. Tasks such as image classification, semantic segmentation, and text-to-image generation parse the scene in terms of high-level properties of objects and scenes. Beyond understanding and creating visual media along these dimensions, it is important to understand low-level information such as geometry, material, lighting configuration, and camera parameters. Such understanding would help with tasks such as material acquisition, fine-grained synthesis, and robotics.

In this thesis, we discuss learning priors over low-level properties to facilitate inference of geometry, static-dynamic disentanglement, and material properties. We present a self-supervised method to construct a persistent representation for inferring geometry and appearance from a single image at test time. This representation can be leveraged to infer static-dynamic disentanglement and can be used for 3D-aware scene editing. We employ representations from a pre-trained visual encoder for selecting similar materials in images. Additionally, we demonstrate fine-grained control over material properties for image editing using pre-trained text-to-image models. This fine-grained control is achieved by maintaining the photorealistic image generation ability of text-to-image models while learning control from synthetically rendered images.

Bio:

Prafull Sharma is a PhD student advised by Prof. Bill Freeman and Prof. Fredo Durand in the Computer Vision and Graphics group at MIT CSAIL. His research focuses on representation learning grounded in physical properties using synthetic data. He is interested in leveraging the priors of pre-trained models to obtain disentangled representations grounded in the physical properties of objects. Prior to this, he worked on non-line-of-sight imaging and computational photography.

Thesis Committee:

Bill Freeman (MIT CSAIL), Fredo Durand (MIT CSAIL), Vincent Sitzmann (MIT CSAIL), Todd Zickler (Harvard)