Torralba's team is making image recognition automatic

October 13, 2009

A new object recognition algorithm treats unrelated images (top and middle right) as if they were consecutive frames of video. Since it assumes that the objects in both images are the same, it tries to deform the objects in the first image until they map onto the objects in the second (above). If the objects in one of the images have already been outlined and labeled (bottom right), the algorithm can simply transfer the labels to the other image. Courtesy of Ce Liu

Antonio Torralba, the Esther and Harold E. Edgerton associate professor of electrical engineering and computer science, and EECS graduate students Ce Liu (PhD '09) and Jenny Yuen have developed an object recognition system that doesn't require any training. Even so, it identifies objects with 50 percent greater accuracy than the best prior algorithm used for such tasks.

As noted by the MIT News Office (October 13, 2009), most object recognition algorithms need to be 'trained' using digital images so that they can eventually recognize a generic object (such as a car) and its typical features. But that training covers only one class of objects; the whole process must be repeated for each additional class.

The new system developed by Torralba and his students instead uses a modified version of a so-called motion estimation algorithm, a type of algorithm common in video processing. A motion estimation algorithm determines which objects have moved from one frame to the next. To do that, it only needs to recognize how corners, edges and similar features shift between frames, and how an object's appearance changes under different perspectives.
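
To give a rough sense of what motion estimation involves, the sketch below uses plain block matching: it takes a small patch from one frame and searches nearby positions in the next frame for the best match. This is a textbook technique picked for brevity, not the modified algorithm the team used. The function names, the patch size and the search radius are all assumptions made for this example.

```python
def block_sad(a, b, ay, ax, by, bx, size):
    """Sum of absolute differences between two size x size blocks."""
    return sum(
        abs(a[ay + i][ax + j] - b[by + i][bx + j])
        for i in range(size)
        for j in range(size)
    )


def estimate_motion(prev, curr, y, x, size=2, radius=2):
    """Return the (dy, dx) shift that best matches the block at (y, x)
    in `prev` to a block in `curr`, searching within `radius` pixels."""
    height, width = len(curr), len(curr[0])
    best, best_cost = (0, 0), None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            ny, nx = y + dy, x + dx
            # Skip candidate blocks that fall outside the frame.
            if ny < 0 or nx < 0 or ny + size > height or nx + size > width:
                continue
            cost = block_sad(prev, curr, y, x, ny, nx, size)
            if best_cost is None or cost < best_cost:
                best, best_cost = (dy, dx), cost
    return best


# Toy frames: a bright 2x2 square moves one row down and two columns right.
prev_frame = [
    [0, 0, 0, 0, 0],
    [0, 9, 9, 0, 0],
    [0, 9, 9, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
curr_frame = [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 9, 9],
    [0, 0, 0, 9, 9],
    [0, 0, 0, 0, 0],
]

print(estimate_motion(prev_frame, curr_frame, 1, 1))  # (1, 2)
```

Repeating this search for every patch yields a field of offsets describing how the contents of one frame map onto the next.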

The new system essentially treats unrelated images as if they were consecutive frames in a video sequence. In checking for movement between one image and the next, the motion estimation algorithm picks out objects of the same type. Of course, the greater the resemblance between the labeled and unlabeled images, the better the algorithm works.
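
The label-transfer idea can be pictured with a toy example. The sketch assumes that some alignment step has already produced, for each pixel of the unlabeled image, an offset pointing to the matching pixel in the labeled image. The function name, the grid-of-strings representation and the "unlabeled" placeholder are illustrative assumptions, not details of the team's system.

```python
def transfer_labels(labels, flow, unknown="unlabeled"):
    """Copy labels from an annotated image onto a new image.

    labels: 2D list of label strings for the annotated image.
    flow:   2D list of (dy, dx) offsets, one per pixel of the new image,
            each pointing at the matching pixel in the annotated image.
    """
    height, width = len(labels), len(labels[0])
    result = []
    for y, flow_row in enumerate(flow):
        out_row = []
        for x, (dy, dx) in enumerate(flow_row):
            sy, sx = y + dy, x + dx
            if 0 <= sy < height and 0 <= sx < width:
                out_row.append(labels[sy][sx])
            else:
                # The match falls outside the annotated image.
                out_row.append(unknown)
        result.append(out_row)
    return result


# Annotated image: sky on top, a car and some road below.
annotated = [
    ["sky", "sky", "sky"],
    ["car", "car", "road"],
]
# Hypothetical alignment: every pixel of the new image matches the
# pixel one column to its right in the annotated image.
offsets = [
    [(0, 1), (0, 1), (0, 1)],
    [(0, 1), (0, 1), (0, 1)],
]

for row in transfer_labels(annotated, offsets):
    print(row)
# ['sky', 'sky', 'unlabeled']
# ['car', 'road', 'unlabeled']
```

The quality of the transferred labels depends entirely on the offsets, which is why a closely matching labeled image matters.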

What is beautiful about the work of Torralba and his team is that it builds on earlier efforts to compile a huge database of labeled images, gathered through a web-based system called LabelMe. This system has allowed online volunteers to tag objects in digital images. This compiled data, plus the website called 80 Million Tiny Images, which sorts images according to subject matter, now allows the new algorithm to find a labeled image similar to the one it is trying to interpret. The longer the LabelMe and 80 Million Tiny Images databases are allowed to grow, the more readily the new object recognition algorithm will make correct and ever more accurate associations.

In fact, the existence of this large database, according to University of Central Florida computer vision researcher Marshall Tappen, will allow far more innovation than just image recognition. This is apparently borne out by the fact, Tappen notes, that "several papers presented at Siggraph, the major conference in the field of computer graphics, are all using LabelMe."

See: "Nonparametric Scene Parsing: Label Transfer via Dense Scene Alignment," Ce Liu, Jenny Yuen, and Antonio Torralba.

80 Million Tiny Images website

LabelMe: The open annotation tool