Thesis Defense: A 2D + 3D Rich Data Approach to Scene Understanding

Event Speaker: 

Jianxiong Xiao

Event Location: 

32-D463 (Star Room)

Event Date/Time: 

Tuesday, July 9, 2013 - 11:00am

Abstract:

On your one-minute walk from the coffee machine to your desk each
morning, you pass by dozens of scenes -- a kitchen, an elevator, your
office -- and you effortlessly recognize them and perceive their 3D
structure. But this one-minute scene-understanding problem has been an
open challenge in computer vision for decades. Recently, researchers
have come to realize that big data is critical for building
scene-understanding systems that can recognize the semantics and
reconstruct the 3D structure. In this talk, I will share my experience
in leveraging big data for scene understanding, shifting the paradigm
from 2D view-based categorization to 3D place-centric representations.

To push the traditional 2D representation to the limit, we built the
Scene Understanding (SUN) Database, a large collection of images that
exhaustively spans all scene categories. However, the lack of a "rich"
representation still significantly limits the traditional recognition
pipeline. An image is a 2D array, but the world is 3D and our eyes see
it from a particular viewpoint, and traditional pipelines do not model
this. The paradigm shift toward rich representations also opens up new challenges
that require a new kind of big data -- data with extra descriptions,
namely rich data. Specifically, we focus on a highly valuable kind of
rich data -- multiple viewpoints in 3D -- and we build the SUN3D
database to obtain an integrated "place-centric" representation of
scenes. This novel representation with rich data opens up exciting new
opportunities for integrating scene recognition over space and for
obtaining a scene-level reconstruction of large environments. It also
has many applications such as organizing big visual data to provide
photo-realistic indoor 3D maps.

Bio: Jianxiong Xiao is a Ph.D. candidate in the Computer Science and
Artificial Intelligence Laboratory (CSAIL) at Massachusetts Institute
of Technology (MIT). Before that, he received a B.Eng. and an M.Phil.
from the Hong Kong University of Science and Technology. Starting in
September 2013, he will be an assistant professor in the Department of
Computer Science at Princeton University. His research interests are
in computer vision, with a focus on scene understanding. His work has
received the Best Student Paper Award at the European Conference on
Computer Vision (ECCV) in 2012 and the Google Research Best Papers Award
for 2012, and has appeared in the popular press. Jianxiong was awarded the
Google U.S./Canada Ph.D. Fellowship in Computer Vision in 2012 and MIT
CSW Best Research Award in 2011. More information can be found on his
website: http://mit.edu/jxiao.
 
Thesis Supervisor(s): Antonio Torralba