Doctoral Thesis: Understanding Arrow of Time and Trends from Visual Temporal Structures


Event Speaker: Donglai Wei
Event Location: 32-G449 KIVA
Event Date/Time: Friday, July 14, 2017 - 11:00am


Living in a constantly changing world, we cannot help but notice the temporal regularities of the visual changes around us. These changes can be irreversible and governed by physical laws, such as a glass bottle breaking into pieces, or influenced by design trends, such as web pages being updated with large background images. In this dissertation, we study the underlying temporal structures behind these two forms of visual change.

First, we address the question: "What visual cues are indicative of the arrow of time, i.e. the one-way direction of changes?" We train convolutional neural networks (CNNs) to classify whether input videos are playing in the forward or backward direction. A model that performs this task well can be used for video forensics and abnormal event detection, and its learned features are useful for other tasks, such as action recognition. However, CNNs can "cheat" by learning artificial signals introduced during video production instead of the real signal. We design control experiments to systematically identify these superfluous signals. After controlling for these confounding factors, we analyze the visual cues learned on the large-scale Flickr video dataset, revealing both semantic and non-semantic cues for the arrow of time, as well as the photographer bias during video capture.
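
As a rough illustration of how training data for the forward/backward classification task could be prepared, the sketch below takes each clip, keeps it in recorded order with one label, and adds a time-reversed copy with the other label. It is only a sketch under stated assumptions, not the procedure used in the thesis: the function names, the label convention (0 for forward, 1 for backward), and the toy string "frames" are all hypothetical.

```python
import random

# Hypothetical label convention, not taken from the thesis.
FORWARD, BACKWARD = 0, 1


def make_direction_samples(clip):
    """Return one clip in both playback directions, each with its label.

    clip: a sequence of frames in recorded order (frames can be any object).
    """
    frames = list(clip)
    # Reversing the frame order flips only the time axis.
    return [(frames, FORWARD), (frames[::-1], BACKWARD)]


def build_training_set(clips, seed=0):
    """Build a shuffled, class-balanced list of (frames, label) samples."""
    samples = []
    for clip in clips:
        samples.extend(make_direction_samples(clip))
    # Shuffle with a fixed seed so the toy example is reproducible.
    random.Random(seed).shuffle(samples)
    return samples


# Toy stand-in for real video data: 3 clips of 4 "frames" each.
toy_clips = [[f"clip{c}_frame{t}" for t in range(4)] for c in range(3)]
training_set = build_training_set(toy_clips)
```

Because every clip contributes exactly one forward and one backward sample, the two classes are balanced and a classifier that guesses at random is correct half of the time.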

Next, we answer the question: "What makes 2016 web pages look like they were designed in 2016?" We first collect a large-scale dataset containing screenshots of the 11,000 most popular web domains over 21 years (1996-2016). Trained on this dataset, our models investigate design trends through visualization, colorization and evaluation. For visualization, we train variational autoencoder models to learn an interpretable representation that supports intuitive design exploration on a 2D plane. For colorization, we use conditional generative adversarial networks to learn to re-colorize input web pages with different color palettes based on their content and year labels. Finally, for evaluation, we train a CNN to predict the years in which input web pages were created and to identify visual elements that are ahead of or lagging behind the trend.

Thesis Supervisor: Prof. William Freeman