Scientists are increasingly developing programs and technology that do things the way people do: robots that walk like humans, computers that think like humans and even algorithms that write stories like humans. But what about machines that see like humans?
That’s a project Laurent Itti of the USC Viterbi School of Engineering is currently working on. The associate professor of computer science is using a National Science Foundation (NSF) Expeditions Grant to create cognitive vision systems in collaboration with The Pennsylvania State University; the University of California, Los Angeles; the University of California, San Diego; the Massachusetts Institute of Technology; York College of Pennsylvania; the University of Pittsburgh; and Stanford University.
The Expeditions Grant — one of only two given this year — will provide $10 million in funding over five years to the multi-investigator research team, representing the largest single investment in computer science research made by the NSF.
“We’re developing new algorithms for visual processing that are inspired by the way the human brain works,” Itti said.
And how exactly does the brain process visual data? Itti’s group is focusing on two key areas of visual processing that can be applied to machines: top-down attention and compositionality.
Top-down attention refers to the goal-driven reasoning you automatically engage in when you’re looking for something. For example, if you want to find a stapler in an office, you start scanning the surfaces where you would normally find a stapler: a desk, a table, a drawer. You most likely would not look at the ceiling or search behind a bookcase. Your brain combines the goal, prior knowledge and biased visual scanning to find the stapler. Itti wants computers to engage in the same reasoning and search process that humans do.
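In software terms, one simple way to approximate this kind of goal-driven search is to rank candidate locations by how likely they are to contain the target and scan them in that order, skipping places that prior knowledge rules out. The Python sketch below is purely illustrative and is not the algorithm Itti’s group uses: the location priors, the `search_order` and `find_target` functions and the toy `office` scene are all hypothetical, and a simple membership check stands in for a real object detector.

```python
from __future__ import annotations

# HYPOTHETICAL prior knowledge: how likely a stapler is to sit on each kind
# of surface. The numbers are made up for illustration.
LOCATION_PRIORS = {
    "stapler": {
        "desk": 0.6,
        "drawer": 0.25,
        "table": 0.1,
        "behind bookcase": 0.05,
        "ceiling": 0.0,
    },
}


def search_order(goal: str, locations: list[str]) -> list[str]:
    """Rank candidate locations by the prior probability of finding `goal` there."""
    priors = LOCATION_PRIORS.get(goal, {})
    ranked = sorted(locations, key=lambda loc: priors.get(loc, 0.0), reverse=True)
    # Places with a zero prior (or no prior at all) are never scanned,
    # like skipping the ceiling when hunting for a stapler.
    return [loc for loc in ranked if priors.get(loc, 0.0) > 0.0]


def find_target(goal: str, scene: dict[str, list[str]]) -> tuple[str | None, list[str]]:
    """Scan locations in prior order; return where `goal` was found and the scan path."""
    scanned = []
    for loc in search_order(goal, list(scene)):
        scanned.append(loc)
        # Stand-in for running a real object detector on this region of the image.
        if goal in scene[loc]:
            return loc, scanned
    return None, scanned


# Toy office: what a detector would report at each location (hypothetical).
office = {
    "ceiling": [],
    "table": ["mug"],
    "drawer": ["stapler", "tape"],
    "desk": ["laptop", "pen"],
    "behind bookcase": [],
}
print(find_target("stapler", office))  # ('drawer', ['desk', 'drawer'])
```

The search checks the desk first and finds the stapler in the drawer on its second look, never touching the ceiling; a purely bottom-up scan would have no reason to prefer that order.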
Compositionality refers to our hierarchical way of recognizing objects, built up from familiar parts. You know what a wheel is, for instance, and you can recognize one whether it’s on a bike, a car or a skateboard. But this is a difficult task for a computer. Right now, recognition software is mostly task-specific. Effective face detectors exist and can isolate faces in a photo, for example, but a face detector can make no sense of a picture of a fire engine.
Itti’s contribution to the project will be to outline a dictionary of components and to write algorithms that define how those components can combine to form different objects. That way, a new type of object can be added to the recognition database by describing it in terms of this fundamental set of components.
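A toy version of such a component dictionary might define each object as a set of shared parts, so that a new object type is registered by combining existing parts rather than building a new detector from scratch. This Python sketch is a hypothetical illustration, not the project’s actual representation: `PARTS`, `OBJECTS`, `add_object` and `recognize` are invented names, the parts are assumed to have been detected already, and only a single, flat level of composition is shown, whereas the approach described above is hierarchical.

```python
from __future__ import annotations

# HYPOTHETICAL shared dictionary of reusable visual components.
PARTS = {"wheel", "frame", "handlebar", "seat", "deck", "cab", "ladder", "hose"}

# Each object type is defined only as a combination of dictionary parts.
OBJECTS: dict[str, set[str]] = {
    "bicycle": {"wheel", "frame", "handlebar", "seat"},
    "skateboard": {"wheel", "deck"},
}


def add_object(name: str, parts: set[str]) -> None:
    """Register a new object type built from parts already in the dictionary."""
    unknown = parts - PARTS
    if unknown:
        raise ValueError(f"parts not in dictionary: {sorted(unknown)}")
    OBJECTS[name] = set(parts)


def recognize(detected_parts: set[str]) -> list[str]:
    """Return every known object whose required parts were all detected."""
    return sorted(name for name, parts in OBJECTS.items() if parts <= detected_parts)


# A new object type needs no new part detectors, just a new combination.
add_object("fire engine", {"wheel", "cab", "ladder", "hose"})
print(recognize({"wheel", "deck"}))                   # ['skateboard']
print(recognize({"wheel", "cab", "ladder", "hose"}))  # ['fire engine']
```

The same "wheel" entry serves the bicycle, the skateboard and the fire engine, which is the reuse that compositionality is meant to buy.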
Other facets of the project — which don’t involve Itti’s group — include hardware design, the human-machine interface, usability and privacy issues.
“This project will result in smart camera systems that approach the cognitive abilities of the human cortex,” explained principal investigator Vijay Narayanan from Penn State. “Such cameras can understand visual content and result in multifaceted impact on society, including visual aids for visually impaired persons, driver assistance for reducing automotive accidents and augmented reality for enhanced shopping, travel and safety.”
For people who are blind, a camera mounted on a pair of glasses could help the user find a desired item by telling him or her where it is in the surrounding space. At a grocery store, for example, if the user is looking for a box of Cheerios, the voice-activated system could scan the shelves and communicate the location of the box to the user.
How would the system tell the user where the box is? Several forms of tactile feedback could be used. Vibrations from a headset on either side of the user’s head, or from a special vest, could guide the user toward the target, much as we might say, “warmer, warmer; colder, colder.”
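The directional part of that feedback reduces to simple geometry: compute the signed angle between where the user is facing and where the target is, pick the left or right motor accordingly, and let the vibration grow stronger as the distance shrinks. The Python sketch below assumes, hypothetically, that the system already knows the user’s and the target’s positions on a flat floor plan; the `haptic_cue` function, the 10-metre range and the 15-degree "straight ahead" tolerance are illustrative choices, not details of the project.

```python
import math


def haptic_cue(user, heading_deg, target, max_range=10.0, ahead_tolerance_deg=15.0):
    """HYPOTHETICAL mapping from target position to a (side, strength) cue.

    user, target: (x, y) positions in metres; heading_deg: direction the user
    faces, in degrees counterclockwise from the +x axis.
    side is 'left', 'right' or 'ahead'; strength in [0, 1] grows as the user
    gets closer ("warmer") and fades as they move away ("colder").
    """
    dx, dy = target[0] - user[0], target[1] - user[1]
    distance = math.hypot(dx, dy)
    bearing = math.degrees(math.atan2(dy, dx))
    # Signed angle from the heading to the target, wrapped into [-180, 180).
    offset = (bearing - heading_deg + 180.0) % 360.0 - 180.0
    if abs(offset) <= ahead_tolerance_deg:
        side = "ahead"  # e.g. pulse both motors
    elif offset > 0:
        side = "left"   # target lies counterclockwise of the heading
    else:
        side = "right"
    strength = max(0.0, 1.0 - distance / max_range)
    return side, strength


# User at the origin facing "north" (+y).
print(haptic_cue((0, 0), 90, (0, 5)))   # ('ahead', 0.5)
print(haptic_cue((0, 0), 90, (-3, 0)))  # side is 'left': due west is to the user's left
```

Repeating this calculation as the user walks turns a single position estimate into the continuous "warmer, colder" guidance described above.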
To assist drivers, cars outfitted with cameras could process information about their surroundings and catch hazards the driver didn’t notice, such as a small child running into the road. Visual input might also include the state of the driver’s attention: if the driver is looking left at a street sign, the car could pay special attention to possible hazards on the right.
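One hedged way to express that idea is to weight hazard detections by whether they fall in a region the driver is currently watching, so that a faint detection in an unwatched region is more likely to trigger an alert. In the hypothetical Python sketch below, the three image regions, the 2x boost and the 0.5 alert threshold are invented for illustration, and the detections are assumed to come from some upstream hazard detector; it is not a description of how the project’s driver-assistance system works.

```python
REGIONS = ("left", "center", "right")


def attention_weights(driver_gaze, boost=2.0):
    """HYPOTHETICAL weighting: regions the driver is not looking at get `boost`."""
    if driver_gaze not in REGIONS:
        raise ValueError(f"unknown gaze region: {driver_gaze!r}")
    return {region: (1.0 if region == driver_gaze else boost) for region in REGIONS}


def hazards_to_flag(detections, driver_gaze, threshold=0.5, boost=2.0):
    """Return (region, label) for detections worth alerting the driver about.

    detections: iterable of (region, label, score) from some upstream detector,
    where score in [0, 1] is the detector's confidence.
    """
    weights = attention_weights(driver_gaze, boost)
    return [
        (region, label)
        for region, label, score in detections
        if score * weights[region] >= threshold
    ]


# Driver glances left at a street sign: the same faint detection is flagged
# on the right but not on the left, where the driver is already looking.
detections = [("left", "cyclist", 0.3), ("right", "child", 0.3)]
print(hazards_to_flag(detections, driver_gaze="left"))  # [('right', 'child')]
```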
Lastly, this research may contribute to the emerging field of augmented reality, through products such as Google Glass. Like the application for blind users, augmented reality uses computer vision systems to give users real-time information about their surroundings, but it is geared toward sighted people. In the same supermarket cereal aisle, a user could compare an item’s price with prices at nearby stores and get health-related information about the product, among other things.