PhD student Lucy Chai was recently awarded an Adobe Research Fellowship in recognition of her groundbreaking work. Image credit: Monica Agrawal
Jane Halpern | Department of Electrical Engineering and Computer Science
Lucy Chai is a third-year PhD student working with Phillip Isola’s research group on visual computing. Recently, Chai was awarded the prestigious Adobe Research Fellowship for her work in image synthesis, specifically the use of AI models to generate images. When not at work in the lab, Chai is an active member of the TechMasters swim team. We met Chai over Zoom to learn more about her cutting-edge research.
Thanks so much for sitting down to chat with us! What big question does your research aim to solve?
My research direction has two branches: first, how can we control the content we create? Recently there’s been an explosion in creating artificial images from scratch: your input is a random number from a distribution, and your output is an image. These AI-created images have gotten really good; the image is very realistic. But what if you want to specify a particular attribute—for instance, a tree in the scene, or a cloudy sky? With random numbers, there’s no way to get that.
So, two of my projects have been exploring how to demonstrate different transformations in generated content, and how to create combinations of what we generate. For example, combining long hair with this color shirt and that kind of background, or selecting attributes of different images to combine against the same background.
What kinds of applications might that research have?
What I really like about these models is that these controls are very coarse and you don’t have to be precise. If you can imagine taking different parts of an image and combining them like a collage, there will be parts everywhere, and specific bits may not align. Your model is trained on real images, and the model will unify everything together into a consistent image. When you’re doing these manipulations, you don’t have to be precise. But you are also subject to the model’s biases. Say we change the color of a generated scene; you have an image of a volcano, but you want to brighten or darken the scene. But when you darken the scene, suddenly your model shows the volcano exploding. That’s because of the sets of images the model is learning from. If the background is dark, the volcano is exploding. By learning those biases, you can get these coarse semi-automatic reasoning controls, where all you are saying is “make this scene darker” but the AI is also learning that when it gets dark, the volcano explodes.
These examples of generated images reflect the model's learning: for example, that waves may shift in direction, but that horizons remain relatively stable. Image courtesy of Lucy Chai.
How does this work relate to AI-generated faces, for instance the images on This Person Does Not Exist?
This is the other branch of my research, and it’s definitely a concern. Current AI models have gotten really good at synthesizing fake faces, as faces are naturally structured and can be aligned to reduce domain variability. I have a project that looks at the artifacts of images generated from these models. When we’re trying to determine if an image is real or if it was generated by one of these models, what parts of the image are the most obvious “giveaways”? We break the image down into patches and take a patch-based classification approach. If the hair is the most obvious part, the patch centered on the hair will be highly predictive.
The other application of this approach is to try to see what generalizes across different models. There are always new models coming out. You’re not going to be able to capture everything in one go. We investigate which parts of an image a classifier will pick up on to help it distinguish between real and fake, even if the image was synthesized by a different generative model.
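As a rough illustration of the patch-based idea Chai describes, the following Python sketch splits an image into patches, scores each patch separately, and reports which patch is the strongest "giveaway." It is a minimal sketch under stated assumptions, not Chai's actual method: the function names are hypothetical, the image is a tiny grayscale grid, and `toy_fake_score` is a stand-in that uses pixel variance where a real system would use a trained classifier.

```python
from typing import Callable, List, Tuple

Image = List[List[float]]          # grayscale image as rows of pixel values
Position = Tuple[int, int]         # (top row, left column) of a patch


def split_into_patches(image: Image, patch_size: int) -> List[Tuple[Position, Image]]:
    """Cut the image into non-overlapping square patches."""
    height, width = len(image), len(image[0])
    patches = []
    for top in range(0, height - patch_size + 1, patch_size):
        for left in range(0, width - patch_size + 1, patch_size):
            patch = [row[left:left + patch_size]
                     for row in image[top:top + patch_size]]
            patches.append(((top, left), patch))
    return patches


def toy_fake_score(patch: Image) -> float:
    """Hypothetical stand-in for a trained patch classifier.

    Returns the pixel variance of the patch; a real detector would
    return a learned probability that the patch is generated.
    """
    pixels = [p for row in patch for p in row]
    mean = sum(pixels) / len(pixels)
    return sum((p - mean) ** 2 for p in pixels) / len(pixels)


def most_predictive_patch(image: Image, patch_size: int,
                          score_fn: Callable[[Image], float]) -> Tuple[Position, float]:
    """Return the position and score of the highest-scoring patch."""
    scored = [(pos, score_fn(patch))
              for pos, patch in split_into_patches(image, patch_size)]
    return max(scored, key=lambda item: item[1])


def image_fake_score(image: Image, patch_size: int,
                     score_fn: Callable[[Image], float]) -> float:
    """Average the per-patch scores into one image-level score."""
    scores = [score_fn(patch)
              for _, patch in split_into_patches(image, patch_size)]
    return sum(scores) / len(scores)


# A 4x4 toy image: flat everywhere except the bottom-right 2x2 patch.
image = [
    [0.5, 0.5, 0.5, 0.5],
    [0.5, 0.5, 0.5, 0.5],
    [0.5, 0.5, 0.0, 1.0],
    [0.5, 0.5, 1.0, 0.0],
]

position, score = most_predictive_patch(image, 2, toy_fake_score)
print(position, score)                              # (2, 2) 0.25
print(image_fake_score(image, 2, toy_fake_score))   # 0.0625
```

Scoring patches independently is what makes the result interpretable: the position of the top-scoring patch points to the region of the image that most influenced the decision, such as the hair in Chai's example.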
Here, Chai has extracted selected parts of various images and challenged her model to combine those parts in a realistic composite. Image courtesy of Lucy Chai.
After spending hundreds, if not thousands, of hours looking at generated images, how have your own fake-spotting skills developed? Can you be fooled?
I don’t want to guarantee anything, but if you look at enough of these generated images, you know where to look for common artifacts. Simple textures are the hardest. What is easier to distinguish is complex patterns. The real world has so much structured complexity, and the models are always getting better.
At some point, will AI models outstrip our ability to tell real from fake images?
I think they can. AI tech is always amazing, though I don’t know if it’ll get to the point where it’s perfect.
What’s next for you? Do you want to stick with theory, pursue applications, something else?
In research, I like the direction of approximate editing. As a human, it’s hard to be precise. It’s much easier to say, “I kind of want this,” and have the model work it out. There’s a tradeoff, though. If you give the model that flexibility, it might not give you precisely what you want. That’s the downside, so you’re always balancing the precision of your input against the model’s imagination.
After earning my PhD, though, I think I want to go into industry. These models are definitely not a finished product, but are a demonstration of what a model learns, and how we could leverage that to do certain tasks.