Recurrent architecture enhances object recognition in brain and AI

Your ability to recognize objects is remarkable. If you see a cup under unusual lighting or from unexpected directions, there’s a good chance that your brain will still compute that it is a cup. Such precise object recognition is one holy grail for AI developers, such as those improving self-driving car navigation. While modeling primate object recognition in the visual cortex has revolutionized artificial visual recognition systems, current deep learning systems are simplified, and fail to recognize some objects that are child’s play for primates such as humans. In findings published in Nature Neuroscience, McGovern Investigator James DiCarlo and colleagues have found evidence that feedback improves recognition of hard-to-recognize objects in the primate brain, and that adding feedback circuitry also improves the performance of artificial neural network systems used for vision applications.

Deep convolutional neural networks (DCNNs) are currently the most successful models for accurately recognizing objects on a fast timescale (<100 ms) and have a general architecture inspired by the primate ventral visual stream, cortical regions that progressively build an accessible and refined representation of viewed objects. Most DCNNs, however, are simple in comparison to the primate ventral stream.

“For a long period of time, we were far from a model-based understanding. Thus our field got started on this quest by modeling visual recognition as a feedforward process,” explains senior author DiCarlo, who is also the head of MIT’s Department of Brain and Cognitive Sciences and Research Co-Leader in the Center for Brains, Minds, and Machines (CBMM). “However, we know there are recurrent anatomical connections in brain regions linked to object recognition.”

Think of feedforward DCNNs and the portion of the visual system that first attempts to capture objects as a subway line that runs forward through a series of stations. The extra, recurrent brain networks are instead like the streets above, interconnected and not unidirectional. Because it only takes about 200 ms for the brain to recognize an object quite accurately, it was unclear if these recurrent interconnections in the brain had any role at all in core object recognition. For example, perhaps those recurrent connections are only in place to keep the visual system in tune over long periods of time. By analogy, the return gutters of the streets help slowly clear them of water and trash, but are not strictly needed to quickly move people from one end of town to the other. DiCarlo, along with lead author and CBMM postdoc Kohitij Kar, set out to test whether a subtle role of recurrent operations in rapid visual object recognition was being overlooked.

Challenging recognition

The authors first needed to identify objects that are trivially decoded by the primate brain, but are challenging for artificial systems. Rather than trying to guess why deep learning was having problems recognizing an object (is it due to clutter in the image? a misleading shadow?), the authors took an unbiased approach that turned out to be critical.

Kar explained further that “we realized that AI models actually don’t have problems with every image where an object is occluded or in clutter. Humans trying to guess why AI models were challenged turned out to be holding us back.”

Instead, the authors presented the deep learning system, as well as monkeys and humans, with images, homing in on “challenge images” where the primates could easily recognize the objects in those images, but a feedforward DCNN ran into problems. When they, and others, added appropriate recurrent processing to these DCNNs, object recognition in challenge images suddenly became a breeze.
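To make the contrast between feedforward and recurrent processing concrete, here is a minimal toy sketch. It is not the network used in the study: the function names, the weights, and the choice of a single hidden layer with a ReLU nonlinearity are all assumptions made for illustration. A feedforward layer computes its response in one pass, while the recurrent version feeds its own output back in for a few extra steps, so the response keeps changing over time.

```python
def relu(v):
    """Rectified linear unit applied element-wise."""
    return [max(0.0, a) for a in v]


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * a for w, a in zip(row, v)) for row in m]


def feedforward(x, w_in):
    """Single pass: input -> hidden response, no feedback."""
    return relu(matvec(w_in, x))


def recurrent(x, w_in, w_rec, steps):
    """Same input drive, but the hidden state is fed back into the
    layer for `steps` additional passes (hypothetical toy model)."""
    drive = matvec(w_in, x)
    h = relu(drive)
    for _ in range(steps):
        feedback = matvec(w_rec, h)
        h = relu([d + f for d, f in zip(drive, feedback)])
    return h


# Illustrative weights only; nothing here comes from the paper.
w_in = [[1.0, 0.0], [0.0, 1.0]]
w_rec = [[0.0, 0.5], [0.5, 0.0]]
x = [1.0, 0.0]

ff = feedforward(x, w_in)             # response after one pass
rec1 = recurrent(x, w_in, w_rec, 1)   # response after one feedback step
```

With zero feedback steps the recurrent function reduces to the feedforward one, which mirrors the idea that recurrence adds later processing on top of an initial feedforward sweep.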

Processing times

Kar used neural recording methods with very high spatial and temporal precision to test whether these images were really so trivial for primates. Remarkably, they found that though challenge images had initially appeared to be child’s play to the human brain, they actually involve extra neural processing time (about 30 additional milliseconds), suggesting that recurrent loops operate in our brain too.

“What the computer vision community has recently achieved by stacking more and more layers onto artificial neural networks, evolution has achieved through a brain architecture with recurrent connections.” — Kohitij Kar

Diane Beck, Professor of Psychology and Co-chair of the Intelligent Systems Theme at the Beckman Institute and not an author on the study, explained further. “Since entirely feed forward deep convolutional nets are now remarkably good at predicting primate brain activity, it raised questions about the role of feedback connections in the primate brain. This study shows that, yes, feedback connections are very likely playing a role in object recognition after all.”

What does this mean for a self-driving car? It shows that deep learning architectures involved in object recognition need recurrent components if they are to match the primate brain, and also indicates how to operationalize this procedure for the next generation of intelligent machines.

“Recurrent models offer predictions of neural activity and behavior over time,” says Kar. “We may now be able to model more involved tasks. Perhaps one day, the systems will not only recognize an object, such as a person, but also perform cognitive tasks that the human brain so easily manages, such as understanding the emotions of other people.”

This work was supported by the Office of Naval Research grant MURI-114407 (J.J.D.) and by the Center for Brains, Minds, and Machines (CBMM), funded by NSF STC award CCF-1231216 (K.K.).

Elephant or chair? How the brain IDs objects

As visual information flows into the brain through the retina, the visual cortex transforms the sensory input into coherent perceptions. Neuroscientists have long hypothesized that a part of the visual cortex called the inferotemporal (IT) cortex is necessary for the key task of recognizing individual objects, but the evidence has been inconclusive.

In a new study, MIT neuroscientists have found clear evidence that the IT cortex is indeed required for object recognition; they also found that subsets of this region are responsible for distinguishing different objects.

In addition, the researchers have developed computational models that describe how these neurons transform visual input into a mental representation of an object. They hope such models will eventually help guide the development of brain-machine interfaces (BMIs) that could be used for applications such as generating images in the mind of a blind person.

“We don’t know if that will be possible yet, but this is a step on the pathway toward those kinds of applications that we’re thinking about,” says James DiCarlo, the head of MIT’s Department of Brain and Cognitive Sciences, a member of the McGovern Institute for Brain Research, and the senior author of the new study.

Rishi Rajalingham, a postdoc at the McGovern Institute, is the lead author of the paper, which appears in the March 13 issue of Neuron.

Distinguishing objects

In addition to its hypothesized role in object recognition, the IT cortex also contains “patches” of neurons that respond preferentially to faces. Beginning in the 1960s, neuroscientists discovered that damage to the IT cortex could produce impairments in recognizing non-face objects, but it has been difficult to determine precisely how important the IT cortex is for this task.

The MIT team set out to find more definitive evidence for the IT cortex’s role in object recognition, by selectively shutting off neural activity in very small areas of the cortex and then measuring how the disruption affected an object discrimination task. In animals that had been trained to distinguish between objects such as elephants, bears, and chairs, they used a drug called muscimol to temporarily turn off subregions about 2 millimeters in diameter. Each of these subregions represents about 5 percent of the entire IT cortex.

These experiments, which represent the first time that researchers have been able to silence such small regions of IT cortex while measuring behavior over many object discriminations, revealed that the IT cortex is not only necessary for distinguishing between objects, but it is also divided into areas that handle different elements of object recognition.

The researchers found that silencing each of these tiny patches produced distinctive impairments in the animals’ ability to distinguish between certain objects. For example, one subregion might be involved in distinguishing chairs from cars, but not chairs from dogs. Each region was involved in 25 to 30 percent of the tasks that the researchers tested, and regions that were closer to each other tended to have more overlap between their functions, while regions far away from each other had little overlap.

“We might have thought of it as a sea of neurons that are completely mixed together, except for these islands of ‘face patches.’ But what we’re finding, which many other studies had pointed to, is that there is large-scale organization over the entire region,” Rajalingham says.

The features that each of these regions is responding to are difficult to classify, the researchers say. The regions are not specific to objects such as dogs, nor to easy-to-describe visual features such as curved lines.

“It would be incorrect to say that because we observed a deficit in distinguishing cars when a certain neuron was inhibited, this is a ‘car neuron,’” Rajalingham says. “Instead, the cell is responding to a feature that we can’t explain that is useful for car discriminations. There has been work in this lab and others that suggests that the neurons are responding to complicated nonlinear features of the input image. You can’t say it’s a curve, or a straight line, or a face, but it’s a visual feature that is especially helpful in supporting that particular task.”

Bevil Conway, a principal investigator at the National Eye Institute, says the new study makes significant progress toward answering the critical question of how neural activity in the IT cortex produces behavior.

“The paper makes a major step in advancing our understanding of this connection, by showing that blocking activity in different small local regions of IT has a different selective deficit on visual discrimination. This work advances our knowledge not only of the causal link between neural activity and behavior but also of the functional organization of IT: How this bit of brain is laid out,” says Conway, who was not involved in the research.

Brain-machine interface

The experimental results were consistent with computational models that DiCarlo, Rajalingham, and others in their lab have created to try to explain how IT cortex neuron activity produces specific behaviors.

“That is interesting not only because it says the models are good, but because it implies that we could intervene with these neurons and turn them on and off,” DiCarlo says. “With better tools, we could have very large perceptual effects and do real BMI in this space.”

The researchers plan to continue refining their models, incorporating new experimental data from even smaller populations of neurons, in hopes of developing ways to generate visual perception in a person’s brain by activating a specific sequence of neuronal activity. Technology to deliver this kind of input to a person’s brain could lead to new strategies to help blind people see certain objects.

“This is a step in that direction,” DiCarlo says. “It’s still a dream, but that dream someday will be supported by the models that are built up by this kind of work.”

The research was funded by the National Eye Institute, the Office of Naval Research, and the Simons Foundation.

James DiCarlo

Rapid Recognition

DiCarlo’s research goal is to reverse engineer the brain mechanisms that underlie human visual intelligence. He and his collaborators have revealed how population image transformations carried out by a deep stack of interconnected neocortical brain areas — called the primate ventral visual stream — are effortlessly able to extract object identity from visual images. His team uses a combination of large-scale neurophysiology, brain imaging, direct neural perturbation methods, and machine learning methods to build and test neurally-mechanistic computational models of the ventral visual stream and its support of cognition and behavior. Such an engineering-based understanding is likely to lead to new artificial vision and artificial intelligence approaches, new brain-machine interfaces to restore or augment lost senses, and a new foundation to ameliorate disorders of the mind.

Recognizing the partially seen

When we open our eyes in the morning and take in that first scene of the day, we don’t give much thought to the fact that our brain is processing the objects within our field of view with great efficiency and that it is compensating for a lack of information about our surroundings — all in order to allow us to go about our daily functions. The glass of water you left on the nightstand when preparing for bed is now partially blocked from your line of sight by your alarm clock, yet you know that it is a glass.

This seemingly simple human ability to recognize partially occluded objects (occlusion being defined here as one object in a 3-D space blocking another object from view) has been a complicated problem for the computer vision community. Martin Schrimpf, a graduate student in the DiCarlo lab in the Department of Brain and Cognitive Sciences at MIT, explains that machines have become increasingly adept at recognizing whole items quickly and confidently, but when something covers part of an item, it becomes increasingly difficult for the models to recognize it accurately.

“For models from computer vision to function in everyday life, they need to be able to digest occluded objects just as well as whole ones — after all, when you look around, most objects are partially hidden behind another object,” says Schrimpf, co-author of a paper on the subject that was recently published in the Proceedings of the National Academy of Sciences (PNAS).

In the new study, he says, “we dug into the underlying computations in the brain and then used our findings to build computational models. By recapitulating visual processing in the human brain, we are thus hoping to also improve models in computer vision.”

How are we as humans able to repeatedly do this everyday task without putting much thought and energy into this action, identifying whole scenes quickly and accurately after ingesting just pieces? Researchers in the study started with the human visual cortex as a model for how to improve the performance of machines in this setting, says Gabriel Kreiman, an affiliate of the MIT Center for Brains, Minds, and Machines. Kreiman is a professor of ophthalmology at Boston Children’s Hospital and Harvard Medical School and was lead principal investigator for the study.

In their paper, “Recurrent computations for visual pattern completion,” the team showed how they developed a computational model, inspired by physiological and anatomical constraints, that was able to capture the behavioral and neurophysiological observations during pattern completion. In the end, the model provided useful insights towards understanding how to make inferences from minimal information.
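As a rough illustration of how recurrent computation can fill in missing information, the sketch below uses a classic Hopfield-style attractor network. This is a textbook technique swapped in for illustration, not the model described in the paper; the patterns, the function names, and the convention that 0 marks an occluded element are all assumptions.

```python
def train_hopfield(patterns):
    """Build Hebbian weights from a list of +1/-1 patterns."""
    n = len(patterns[0])
    w = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:  # no self-connections
                    w[i][j] += p[i] * p[j] / n
    return w


def complete(w, cue, steps=5):
    """Repeatedly update every unit from the others (synchronous
    updates). Entries equal to 0 in `cue` stand for occluded parts."""
    s = list(cue)
    for _ in range(steps):
        new = []
        for i, row in enumerate(w):
            total = sum(wij * sj for wij, sj in zip(row, s))
            if total > 0:
                new.append(1)
            elif total < 0:
                new.append(-1)
            else:
                new.append(s[i])  # no evidence either way: keep state
        s = new
    return s


# Two hypothetical stored "objects" (chosen to be orthogonal).
obj_a = [1, -1, 1, -1, 1, -1, 1, -1]
obj_b = [1, 1, 1, 1, -1, -1, -1, -1]
weights = train_hopfield([obj_a, obj_b])

# Half of obj_a is hidden from view.
occluded = [1, -1, 1, -1, 0, 0, 0, 0]
restored = complete(weights, occluded)
```

In this toy case the visible half of the pattern is enough for the recurrent updates to settle on the full stored pattern, which is the basic intuition behind pattern completion from partial input.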

Work for this study was conducted at the Center for Brains, Minds and Machines within the McGovern Institute for Brain Research at MIT.

Testing the limits of artificial visual recognition systems

While it can sometimes seem hard to see the forest for the trees, pat yourself on the back: as a human you are actually pretty good at object recognition. A major goal for artificial visual recognition systems is to be able to distinguish objects in the way that humans do. If you see a tree or a bush from almost any angle, in any degree of shading (or even rendered in pastels and pixels in a Monet), you would recognize it as a tree or a bush. However, such recognition has traditionally been a challenge for artificial visual recognition systems. Researchers at MIT’s McGovern Institute for Brain Research and Department of Brain and Cognitive Sciences (BCS) have now directly examined and shown that artificial object recognition is quickly becoming more primate-like, but still lags behind when scrutinized at higher resolution.

In recent years, dramatic advances in “deep learning” have produced artificial neural network models that appear remarkably similar to aspects of primate brains. James DiCarlo, Peter de Florez Professor and Department Head of BCS, set out to determine and carefully quantify how well the current leading artificial visual recognition systems match humans and other higher primates when it comes to image categorization, so he and his team put these latest models through their paces.

Rishi Rajalingham, a graduate student in DiCarlo’s lab, conducted the study as part of his thesis work at the McGovern Institute. As Rajalingham puts it, “one might imagine that artificial vision systems should behave like humans in order to seamlessly be integrated into human society, so this tests to what extent that is true.”

The team focused on testing so-called “deep, convolutional neural networks” (DCNNs), and specifically those that had been trained on ImageNet, a collection of large-scale category-labeled image sets that have recently been used as a library to train neural networks. These models, called DCNNIC models, have thus essentially been trained in an intense image recognition bootcamp. The models were then pitted against monkeys and humans and asked to differentiate objects in synthetically constructed images. These synthetic images put the object being categorized in unusual backgrounds and orientations. The resulting images (such as the floating camel shown below) evened the playing field for the machine models (humans would ordinarily have a leg up on image categorization based on assessing context, so this was specifically removed as a confounder to allow a pure comparison of specific object categorization).

DiCarlo and his team found that humans, monkeys and DCNNIC models all appeared to perform similarly when examined at a relatively coarse level. Essentially, each group was shown 100 images of each of 24 different objects. When you averaged how they did across 100 photos of a given object, they could distinguish, for example, camels pretty well overall. The researchers then zoomed in and examined the behavioral data at a much finer resolution (i.e., for each single photo of a camel), thus deriving more detailed “behavioral fingerprints” of primates and machines. These detailed analyses of how they did for each individual image revealed strong differences: monkeys still behaved very consistently like their human primate cousins, but the artificial neural networks could no longer keep up.
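The difference between the coarse, object-level view and the finer, image-level view can be sketched with a small toy example. The accuracy numbers below are invented for illustration, and a plain Pearson correlation is used as a stand-in for image-level consistency; the study’s actual metrics are not reproduced here.

```python
from statistics import mean


def pearson(a, b):
    """Plain Pearson correlation between two equal-length lists."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


# Hypothetical per-image accuracies for five images of one object.
human = [0.9, 0.8, 0.4, 0.9, 0.5]
monkey = [0.85, 0.8, 0.45, 0.9, 0.5]
model = [0.5, 0.9, 0.9, 0.4, 0.8]

# Coarse view: average over all images of the object.
object_level = {
    "human": mean(human),
    "monkey": mean(monkey),
    "model": mean(model),
}

# Fine view: do the systems find the same images easy or hard?
human_vs_monkey = pearson(human, monkey)
human_vs_model = pearson(human, model)
```

In this made-up data all three systems have the same average accuracy for the object, yet only the monkey’s image-by-image pattern tracks the human’s. That is the kind of divergence an object-level average can hide.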

“I thought it was quite surprising that monkeys and humans are remarkably similar in their recognition behaviors, especially given that these objects (e.g. trucks, tanks, camels, etc.) don’t ‘mean’ anything to monkeys,” says Rajalingham. “It’s indicative of how closely related these two species are, at least in terms of these visual abilities.”

DiCarlo’s team gave the neural networks remedial homework to see if they could catch up upon extra-curricular training by now training the models on images that more closely resembled the synthetic images used in their study. Even with this extra training (which the humans and monkeys did not receive), they could not match a primate’s ability to discern what was in each individual image.

DiCarlo conveys that this is a glass half-empty and half-full story. “The half-full part is that today’s deep artificial neural networks that have been developed based on just some aspects of brain function are far better and far more human-like in their object recognition behavior than artificial systems just a few years ago,” explains DiCarlo. “However, careful and systematic behavioral testing reveals that even for visual object recognition, the brain’s neural network still has some tricks up its sleeve that these artificial neural networks do not yet have.”

DiCarlo’s study begins to define more precisely when it is that the leading artificial neural networks start to “trip up”, and highlights a fundamental aspect of their architecture that struggles with categorization of single images. This flaw seems to be unaddressable through further brute force training. The work also provides an unprecedented and rich dataset of human (1476 anonymous humans to be exact) and primate behavior that will help act as a quantitative benchmark for improvement of artificial neural networks.


Image: Example of synthetic image used in the study. For category ‘camel’, 100 distinct, synthetic camel images were shown to DCNNIC models, humans and rhesus monkeys. 24 different categories were tested altogether.

Engineering intelligence

Go is an ancient board game that demands not only strategy and logic, but intuition, creativity, and subtlety—in other words, it’s a game of quintessentially human abilities. Or so it seemed, until Google’s DeepMind AI program, AlphaGo, roundly defeated the world’s top Go champion.

But ask it to read social cues or interpret what another person is thinking and it wouldn’t know where to start. It wouldn’t even understand that it didn’t know where to start. Outside of its game-playing milieu, AlphaGo is as smart as a rock.

“The problem of intelligence is the greatest problem in science,” says Tomaso Poggio, Eugene McDermott Professor of Brain and Cognitive Sciences at the McGovern Institute. One reason why? We still don’t really understand intelligence in ourselves.

Right now, most advanced AI developments are led by industry giants like Facebook, Google, Tesla and Apple, with an emphasis on engineering and computation, and very little work in humans. That has yielded enormous breakthroughs including Siri and Alexa, ever-better autonomous cars and AlphaGo.

But as Poggio points out, the algorithms behind most of these incredible technologies come right out of past neuroscience research–deep learning networks and reinforcement learning. “So it’s a good bet,” Poggio says, “that one of the next breakthroughs will also come from neuroscience.”

Five years ago, Poggio and a host of researchers at MIT and beyond took that bet when they applied for and won a $25 million Science and Technology Center award from the National Science Foundation to form the Center for Brains, Minds and Machines. The goal of the center was to take those computational approaches and blend them with basic, curiosity-driven research in neuroscience and cognition. They would knock down the divisions that traditionally separated these fields and not only unlock the secrets of human intelligence and develop smarter AIs, but found an entire new field—the science and engineering of intelligence.

A collaborative foundation

CBMM is a sprawling research initiative headquartered at the McGovern Institute, encompassing faculty at Harvard, Johns Hopkins, Rockefeller and Stanford; over a dozen industry collaborators including Siemens, Google, Toyota, Microsoft, Schlumberger and IBM; and partner institutions such as Howard University, Wellesley College and the University of Puerto Rico. The effort has already churned out 397 publications and has just been renewed for five more years and another $25 million.

For the first few years, collaboration in such a complex center posed a challenge. Research efforts were still divided into traditional silos—one research thrust for cognitive science, another for computation, and so on. But as the center grew, colleagues found themselves talking more and a new common language emerged. Immersed in each other’s research, the divisions began to fade.

“It became more than just a center in name,” says Matthew Wilson, associate director of CBMM and the Sherman Fairchild Professor of Neuroscience at MIT’s Department of Brain and Cognitive Sciences (BCS). “It really was trying to drive a new way of thinking about research and motivating intellectual curiosity that was motivated by this shared vision that all the participants had.”

New questioning

Today, the center is structured around four interconnected modules grounded in the problem of visual intelligence, with vision chosen because it is the most understood and easily traced of our senses. The first module, co-directed by Poggio himself, unravels the visual operations that begin within the first few milliseconds of visual recognition as the information travels through the eye and to the visual cortex. Gabriel Kreiman, who studies visual comprehension at Harvard Medical School and Children’s Hospital, leads the second module, which takes on the subsequent events as the brain directs the eye where to go next, what it is seeing and what to pay attention to, and then integrates this information into a holistic picture of the world that we experience. His research questions have grown as a result of CBMM’s cross-disciplinary influence.

Leyla Isik, a postdoc in Kreiman’s lab, is now tackling one of his new research initiatives: social intelligence. “So much of what we do and see as humans are social interactions between people. But even the best machines have trouble with it,” she explains.

To reveal the underlying computations of social intelligence, Isik is using data gathered from epilepsy patients as they watch full-length movies. (Certain epilepsy patients spend several weeks before surgery with monitoring electrodes in their brains, providing a rare opportunity for scientists to see inside the brain of a living, thinking human.) Isik hopes to be able to pick out reliable patterns in their neural activity that indicate when the patient is processing certain social cues such as faces. “It’s a pretty big challenge, so to start out we’ve tried to simplify the problem a little bit and just look at basic social visual phenomenon,” she explains.

In true CBMM spirit, Isik is co-advised by another McGovern investigator, Nancy Kanwisher, who helps lead CBMM’s third module with BCS Professor of Computational Cognitive Science, Josh Tenenbaum. That module picks up where the second leaves off, asking still deeper questions about how the brain understands complex scenes, and how infants and children develop the ability to piece together the physics and psychology of new events. In Kanwisher’s lab, instead of a stimulus-heavy movie, Isik shows simple stick figures to subjects in an MRI scanner. She’s looking for specific regions of the brain that engage only when the subjects view the “social interactions” between the figures. “I like the approach of tackling this problem both from very controlled experiments as well as something that’s much more naturalistic in terms of what people and machines would see,” Isik explains.

Built-in teamwork

Such complementary approaches are the norm at CBMM. Postdocs and graduate students are required to have at least two advisors in two different labs. The NSF money is even assigned directly to postdoc and graduate student projects. This ensures that collaborations are baked into the center, Wilson explains. “If the idea is to create a new field in the science of intelligence, you can’t continue to support work the way it was done in the old fields—you have to create a new model.”

In other labs, students and postdocs blend imaging with cognitive science to understand how the brain represents physics—like the mass of an object it sees. Or they’re combining human, primate, mouse and computational experiments to better understand how the living brain represents new objects it encounters, and then building algorithms to test the resulting theories.

Boris Katz’s lab is in the fourth and final module, which focuses on figuring out how the brain’s visual intelligence ties into higher-level thinking, like goal planning, language, and abstract concepts. One project, led by MIT research scientist Andrei Barbu and Yen-Ling Kuo, in collaboration with Harvard cognitive scientist Liz Spelke, is attempting to uncover how humans and machines devise plans to navigate around complex and dangerous environments.

“CBMM gives us the opportunity to close the loop between machine learning, cognitive science, and neuroscience,” says Barbu. “The cognitive science informs better machine learning, which helps us understand how humans behave and that in turn points the way toward understanding the structure of the brain. All of this feeds back into creating more capable machines.”

A new field

Every summer, CBMM heads down to Woods Hole, Massachusetts, to deliver an intensive crash course on the science of intelligence to graduate students from across the country. It’s one of many education initiatives designed to spread CBMM’s approach and key to the goal of establishing a new field. The students who come to learn from these courses often find it as transformative as the CBMM faculty did when the center began.

Candace Ross was an undergraduate at Howard University when she got her first taste of CBMM at a summer course with Kreiman trying to model human memory in machine learning algorithms. “It was the best summer of my life,” she says. “There were so many concepts I didn’t know about and didn’t understand. We’d get back to the dorm at night and just sit around talking about science.”

Ross loved it so much that she spent a second summer at CBMM, and is now a third-year graduate student working with Katz and Barbu, teaching computers how to use vision and language to learn more like children. She’s since gone back to the summer programs, now as a teaching assistant. “CBMM is a research center,” says Ellen Hildreth, a computer scientist at Wellesley College who coordinates CBMM’s education programs. “But it also fosters a strong commitment to education, and that effort is helping to create a community of researchers around this new field.”

Quest for intelligence

CBMM has far to go in its mission to understand the mind, but there is good reason to believe that what CBMM started will continue well beyond the NSF-funded ten years.

This February, MIT announced a new institute-wide initiative called the MIT Intelligence Quest, or MIT IQ. It’s a massive interdisciplinary push to study human intelligence and create new tools based on that knowledge. It is also, says McGovern Institute Director Robert Desimone, a sign of the institute’s faith in what CBMM itself has so far accomplished. “The fact that MIT has made this big commitment in this area is an endorsement of the kind of view we’ve been promoting through CBMM,” he says.

MIT IQ consists of two linked entities: “The Core” and “The Bridge.” CBMM is part of the Core, which will advance the science and engineering of both human and machine intelligence. “This combination is unique to MIT,” explains Poggio, “and is designed to win not only Turing but also Nobel prizes.”

And more than that, points out BCS Department Head Jim DiCarlo, it’s also a return to CBMM’s very first mission. Before CBMM began, Poggio and a few other MIT scientists had tested the waters with a small, Institute-funded collaboration called the Intelligence Initiative (I^2), that welcomed all types of intelligence research–even business and organizational intelligence. MIT IQ re-opens that broader door. “In practice, we want to build a bigger tent now around the science of intelligence,” DiCarlo says.

For his part, Poggio finds the name particularly apt. “Because it is going to be a long-term quest,” he says. “Remember, if I’m right, this is the greatest problem in science. Understanding the mind is understanding the very tool we use to try to solve every other problem.”

Institute launches the MIT Intelligence Quest

MIT today announced the launch of the MIT Intelligence Quest, an initiative to discover the foundations of human intelligence and drive the development of technological tools that can positively influence virtually every aspect of society.

The announcement was first made in a letter MIT President L. Rafael Reif sent to the Institute community.

At a time of rapid advances in intelligence research across many disciplines, the Intelligence Quest will encourage researchers to investigate the societal implications of their work as they pursue hard problems lying beyond the current horizon of what is known.

Some of these advances may be foundational in nature, involving new insight into human intelligence, and new methods to allow machines to learn effectively. Others may be practical tools for use in a wide array of research endeavors, such as disease diagnosis, drug discovery, materials and manufacturing design, automated systems, synthetic biology, and finance.

“Today we set out to answer two big questions,” says President Reif. “How does human intelligence work, in engineering terms? And how can we use that deep grasp of human intelligence to build wiser and more useful machines, to the benefit of society?”

MIT Intelligence Quest: The Core and The Bridge

MIT is poised to lead this work through two linked entities within MIT Intelligence Quest. One of them, “The Core,” will advance the science and engineering of both human and machine intelligence. A key output of this work will be machine-learning algorithms. At the same time, MIT Intelligence Quest seeks to advance our understanding of human intelligence by using insights from computer science.

The second entity, “The Bridge,” will be dedicated to the application of MIT discoveries in natural and artificial intelligence to all disciplines, and it will host state-of-the-art tools from industry and research labs worldwide.

The Bridge will provide a variety of assets to the MIT community, including intelligence technologies, platforms, and infrastructure; education for students, faculty, and staff about AI tools; rich and unique data sets; technical support; and specialized hardware.

Along with developing and advancing the technologies of intelligence, MIT Intelligence Quest researchers will also investigate the societal and ethical implications of advanced analytical and predictive tools. There are already active projects and groups at the Institute investigating autonomous systems, media and information quality, labor markets and the work of the future, innovation and the digital economy, and the role of AI in the legal system.

In all its activities, MIT Intelligence Quest is intended to take advantage of — and strengthen — the Institute’s culture of collaboration. MIT Intelligence Quest will connect and amplify existing excellence across labs and centers already engaged in intelligence research. It will also establish shared, central spaces conducive to group work, and its resources will directly support research.

“Our quest is meant to power world-changing possibilities,” says Anantha Chandrakasan, dean of the MIT School of Engineering and Vannevar Bush Professor of Electrical Engineering and Computer Science. Chandrakasan, in collaboration with Provost Martin Schmidt and all four of MIT’s other school deans, has led the development and establishment of MIT Intelligence Quest.

“We imagine preventing deaths from cancer by using deep learning for early detection and personalized treatment,” Chandrakasan continues. “We imagine artificial intelligence in sync with, complementing, and assisting our own intelligence. And we imagine every scientist and engineer having access to human-intelligence-inspired algorithms that open new avenues of discovery in their fields. Researchers across our campus want to push the boundaries of what’s possible.”

Engaging energetically with partners

In order to power MIT Intelligence Quest and achieve results that are consistent with its ambitions, the Institute will raise financial support through corporate sponsorship and philanthropic giving.

MIT Intelligence Quest will build on the model that was established with the MIT–IBM Watson AI Lab, which was announced in September 2017. MIT researchers will collaborate with each other and with industry on challenges that range in scale from the very broad to the very specific.

“In the short time since we began our collaboration with IBM, the lab has garnered tremendous interest inside and outside MIT, and it will be a vital part of MIT Intelligence Quest,” says President Reif.

John E. Kelly III, IBM senior vice president for cognitive solutions and research, says, “To take on the world’s greatest challenges and seize its biggest opportunities, we need to rapidly advance both AI technology and our understanding of human intelligence. Building on decades of collaboration — including our extensive joint MIT–IBM Watson AI Lab — IBM and MIT will together shape a new agenda for intelligence research and its applications. We are proud to be a cornerstone of this expanded initiative.”

MIT will seek to establish additional entities within MIT Intelligence Quest, in partnership with corporate and philanthropic organizations.


MIT has been on the frontier of intelligence research since the 1950s, when pioneers Marvin Minsky and John McCarthy helped establish the field of artificial intelligence.

MIT now has over 200 principal investigators whose research bears directly on intelligence. Researchers at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) and the MIT Department of Brain and Cognitive Sciences (BCS) — along with the McGovern Institute for Brain Research and the Picower Institute for Learning and Memory — collaborate on a range of projects. MIT is also home to the National Science Foundation–funded Center for Brains, Minds, and Machines (CBMM) — the only national center of its kind.

Four years ago, MIT launched the Institute for Data, Systems, and Society (IDSS) with a mission of promoting data science, particularly in the context of social systems. It is anticipated that faculty and students from IDSS will play a critical role in this initiative.

Faculty from across the Institute will participate in the initiative, including researchers in the Media Lab, the Operations Research Center, the Sloan School of Management, the School of Architecture and Planning, and the School of Humanities, Arts, and Social Sciences.

“Our quest will amount to a journey taken together by all five schools at MIT,” says Provost Schmidt. “Success will rest on a shared sense of purpose and a mix of contributions from a wide variety of disciplines. I’m excited by the new thinking we can help unlock.”

At the heart of MIT Intelligence Quest will be collaboration among researchers in human and artificial intelligence.

“To revolutionize the field of artificial intelligence, we should continue to look to the roots of intelligence: the brain,” says James DiCarlo, department head and Peter de Florez Professor of Neuroscience in the Department of Brain and Cognitive Sciences. “By working with engineers and artificial intelligence researchers, human intelligence researchers can build models of the brain systems that produce intelligent behavior. The time is now, as model building at the scale of those brain systems is now possible. Discovering how the brain works in the language of engineers will not only lead to transformative AI — it will also illuminate entirely new ways to repair, educate, and augment our own minds.”

Daniela Rus, the Andrew (1956) and Erna Viterbi Professor of Electrical Engineering and Computer Science at MIT, and director of CSAIL, agrees. MIT researchers, she says, “have contributed pioneering and visionary solutions for intelligence since the beginning of the field, and are excited to make big leaps to understand human intelligence and to engineer significantly more capable intelligent machines. Understanding intelligence will give us the knowledge to understand ourselves and to create machines that will support us with cognitive and physical work.”

David Siegel, who earned a PhD in computer science at MIT in 1991 pursuing research at MIT’s Artificial Intelligence Laboratory, and who is a member of the MIT Corporation and an advisor to the MIT Center for Brains, Minds, and Machines, has been integral to the vision and formation of MIT Intelligence Quest and will continue to help shape the effort. “Understanding human intelligence is one of the greatest scientific challenges,” he says, “one that helps us understand who we are while meaningfully advancing the field of artificial intelligence.” Siegel is co-chairman and a founder of Two Sigma Investments, LP.

The fruits of research

MIT Intelligence Quest will thus provide a platform for long-term research, encouraging the foundational advances of the future. At the same time, MIT professors and researchers may develop technologies with near-term value, leading to new kinds of collaborations with existing companies — and to new companies.

Some such entrepreneurial efforts could be supported by The Engine, an Institute initiative launched in October 2016 to support startup companies pursuing particularly ambitious goals.

Other innovations stemming from MIT Intelligence Quest could be absorbed into the innovation ecosystem surrounding the Institute — in Kendall Square, Cambridge, and the Boston metropolitan area. MIT is located in close proximity to a world-leading nexus of biotechnology and medical-device research and development, as well as a cluster of leading-edge technology firms that study and deploy machine intelligence.

MIT also has roots in centers of innovation elsewhere in the United States and around the world, through faculty research projects, institutional and industry collaborations, and the activities and leadership of its alumni. MIT Intelligence Quest will seek to connect to innovative companies and individuals who share MIT’s passion for work in intelligence.

Eric Schmidt, former executive chairman of Alphabet, has helped MIT form the vision for MIT Intelligence Quest. “Imagine the good that can be done by putting novel machine-learning tools in the hands of those who can make great use of them,” he says. “MIT Intelligence Quest can become a fount of exciting new capabilities.”

“I am thrilled by today’s news,” says President Reif. “Drawing on MIT’s deep strengths and signature values, culture, and history, MIT Intelligence Quest promises to make important contributions to understanding the nature of intelligence, and to harnessing it to make a better world.”

“MIT is placing a bet,” he says, “on the central importance of intelligence research to meeting the needs of humanity.”

How the brain recognizes objects

When the eyes are open, visual information flows from the retina through the optic nerve and into the brain, which assembles this raw information into objects and scenes.

Scientists have previously hypothesized that objects are distinguished in the inferior temporal (IT) cortex, which is near the end of this flow of information, also called the ventral stream. A new study from MIT neuroscientists offers evidence that this is indeed the case.

Using data from both humans and nonhuman primates, the researchers found that neuron firing patterns in the IT cortex correlate strongly with success in object-recognition tasks.

“While we knew from prior work that neuronal population activity in inferior temporal cortex was likely to underlie visual object recognition, we did not have a predictive map that could accurately link that neural activity to object perception and behavior. The results from this study demonstrate that a particular map from particular aspects of IT population activity to behavior is highly accurate over all types of objects that were tested,” says James DiCarlo, head of MIT’s Department of Brain and Cognitive Sciences, a member of the McGovern Institute for Brain Research, and senior author of the study, which appears in the Journal of Neuroscience.

The paper’s lead author is Najib Majaj, a former postdoc in DiCarlo’s lab who is now at New York University. Other authors are former MIT graduate student Ha Hong and former MIT undergraduate Ethan Solomon.

Distinguishing objects

Earlier stops along the ventral stream are believed to process basic visual elements such as brightness and orientation. More complex functions take place farther along the stream, with object recognition believed to occur in the IT cortex.

To investigate this theory, the researchers first asked human subjects to perform 64 object-recognition tasks. Some of these tasks were “trivially easy,” Majaj says, such as distinguishing an apple from a car. Others — such as discriminating between two very similar faces — were so difficult that the subjects were correct only about 50 percent of the time.

After measuring human performance on these tasks, the researchers then showed the same set of nearly 6,000 images to nonhuman primates as they recorded electrical activity in neurons of the inferior temporal cortex and another visual region known as V4.

Each of the 168 IT neurons and 128 V4 neurons fired in response to some objects but not others, creating a firing pattern that served as a distinctive signature for each object. By comparing these signatures, the researchers could analyze whether they correlated to humans’ ability to distinguish between two objects.

The researchers found that the firing patterns of IT neurons, but not V4 neurons, perfectly predicted the human performances they had seen. That is, when humans had trouble distinguishing two objects, the neural signatures for those objects were so similar as to be indistinguishable, and for pairs where humans succeeded, the patterns were very different.

“On the easy stimuli, IT did as well as humans, and on the difficult stimuli, IT also failed,” Majaj says. “We had a nice correlation between behavior and neural responses.”

The findings support the hypothesis that patterns of neural activity in the IT cortex can encode object representations detailed enough to allow the brain to distinguish different objects, the researchers say.
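The idea of comparing neural signatures can be sketched in a few lines of Python. This is a simplified illustration, not the analysis used in the study: it assumes each object's signature is a short list of firing rates, uses plain Euclidean distance as the similarity measure, and checks whether that distance tracks human accuracy. All firing rates, accuracy values, and function names below are made up for the example.

```python
import math

def signature_distance(a, b):
    """Euclidean distance between two population firing-rate vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Hypothetical firing rates (spikes per second) of four neurons per object.
signatures = {
    "apple":  [12.0, 3.0, 8.0, 1.0],
    "car":    [2.0, 15.0, 1.0, 9.0],
    "face_a": [6.0, 6.0, 7.0, 5.0],
    "face_b": [6.5, 5.5, 7.0, 5.0],
}

# Hypothetical human accuracy on each two-object discrimination task.
human_accuracy = {
    ("apple", "car"): 0.99,
    ("face_a", "face_b"): 0.52,
    ("apple", "face_a"): 0.90,
}

pairs = list(human_accuracy)
distances = [signature_distance(signatures[a], signatures[b]) for a, b in pairs]
accuracies = [human_accuracy[pair] for pair in pairs]

# A positive correlation means pairs with similar signatures are the hard ones.
r = pearson(distances, accuracies)
print(f"correlation between neural distance and human accuracy: {r:.2f}")
```

In this toy data the two similar faces have nearly identical signatures and near-chance human accuracy, while the apple and the car are far apart and easy to tell apart, which is the pattern the researchers describe for IT.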

Nikolaus Kriegeskorte, a principal investigator at the Medical Research Council Cognition and Brain Sciences Unit in Cambridge, U.K., agrees that the study offers “crucial evidence supporting the idea that inferior temporal cortex contains the neuronal representations underlying human visual object recognition.”

“This study is exemplary for its original and rigorous method of establishing links between brain representations and human behavioral performance,” adds Kriegeskorte, who was not part of the research team.

Model performance

The researchers also tested more than 10,000 other possible models for how the brain might encode object representations. These models varied based on location in the brain, the number of neurons required, and the time window for neural activity.

Some of these models, including some that relied on V4, were eliminated because they performed better than humans on some tasks and worse on others.

“We wanted the performance of the neurons to perfectly match the performance of the humans in terms of the pattern, so the easy tasks would be easy for the neural population and the hard tasks would be hard for the neural population,” Majaj says.

The research team now aims to gather even more data to ask if this model or similar models can predict the behavioral difficulty of object recognition on each and every visual image — an even higher bar than the one tested thus far. That might require additional factors to be included in the model that were not needed in this study, and thus could expose important gaps in scientists’ current understanding of neural representations of objects.

They also plan to expand the model so they can predict responses in IT based on input from earlier parts of the visual stream.

“We can start building a cascade of computational operations that take you from an image on the retina slowly through V1, V2, V4, until we’re able to predict the population in IT,” Majaj says.

In one aspect of vision, computers catch up to primate brain

For decades, neuroscientists have been trying to design computer networks that can mimic visual skills such as recognizing objects, which the human brain does very accurately and quickly.

Until now, no computer model has been able to match the primate brain at visual object recognition during a brief glance. However, a new study from MIT neuroscientists has found that one of the latest generation of these so-called “deep neural networks” matches the primate brain.

Because these networks are based on neuroscientists’ current understanding of how the brain performs object recognition, the success of the latest networks suggests that neuroscientists have a fairly accurate grasp of how object recognition works, says James DiCarlo, a professor of neuroscience and head of MIT’s Department of Brain and Cognitive Sciences and the senior author of a paper describing the study in the Dec. 11 issue of the journal PLoS Computational Biology.

“The fact that the models predict the neural responses and the distances of objects in neural population space shows that these models encapsulate our current best understanding as to what is going on in this previously mysterious portion of the brain,” says DiCarlo, who is also a member of MIT’s McGovern Institute for Brain Research.

This improved understanding of how the primate brain works could lead to better artificial intelligence and, someday, new ways to repair visual dysfunction, adds Charles Cadieu, a postdoc at the McGovern Institute and the paper’s lead author.

Other authors are graduate students Ha Hong and Diego Ardila, research scientist Daniel Yamins, former MIT graduate student Nicolas Pinto, former MIT undergraduate Ethan Solomon, and research affiliate Najib Majaj.

Inspired by the brain

Scientists began building neural networks in the 1970s in hopes of mimicking the brain’s ability to process visual information, recognize speech, and understand language.

For vision-based neural networks, scientists were inspired by the hierarchical representation of visual information in the brain. As visual input flows from the retina into primary visual cortex and then inferotemporal (IT) cortex, it is processed at each level and becomes more specific until objects can be identified.

To mimic this, neural network designers create several layers of computation in their models. Each level performs a mathematical operation, such as a linear dot product. At each level, the representations of the visual object become more and more complex, and unneeded information, such as an object’s location or movement, is cast aside.

“Each individual element is typically a very simple mathematical expression,” Cadieu says. “But when you combine thousands and millions of these things together, you get very complicated transformations from the raw signals into representations that are very good for object recognition.”
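A minimal sketch of such a layered computation, in plain Python, might look like the following. It assumes each unit takes a dot product of its weights with the incoming activity and then clips negative values to zero; that rectification step, the hand-picked weights, and the function names are assumptions made for illustration, not details from the study.

```python
def dot(weights, inputs):
    """Linear dot product of one unit's weights with the incoming activity."""
    return sum(w * x for w, x in zip(weights, inputs))

def layer(weight_rows, inputs):
    """One layer: a dot product per unit, followed by an assumed rectification."""
    return [max(0.0, dot(row, inputs)) for row in weight_rows]

def feedforward(layers, inputs):
    """Pass activity forward through each layer in order."""
    activity = inputs
    for weight_rows in layers:
        activity = layer(weight_rows, activity)
    return activity

# Hypothetical hand-picked weights; real networks learn these from data.
layers = [
    [[1.0, -1.0, 0.0], [0.0, 1.0, -1.0], [1.0, 0.0, 1.0]],  # 3 inputs -> 3 units
    [[1.0, 1.0, 0.0], [0.0, -1.0, 1.0]],                    # 3 units -> 2 units
]

output = feedforward(layers, [0.5, 0.2, 0.9])
print(output)
```

Each unit here is trivially simple, but stacking layers composes the operations, which is the point Cadieu makes about combining many simple elements.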

For this study, the researchers first measured the brain’s object recognition ability. Led by Hong and Majaj, they implanted arrays of electrodes in the IT cortex as well as in area V4, a part of the visual system that feeds into the IT cortex. This allowed them to see the neural representation — the population of neurons that respond — for every object that the animals looked at.

The researchers could then compare this with representations created by the deep neural networks, which consist of a matrix of numbers produced by each computational element in the system. Each image produces a different array of numbers. The accuracy of the model is determined by whether it groups similar objects into similar clusters within the representation.

“Through each of these computational transformations, through each of these layers of networks, certain objects or images get closer together, while others get further apart,” Cadieu says.

The best network was one that was developed by researchers at New York University, which classified objects as well as the macaque brain.

More processing power

Two major factors account for the recent success of this type of neural network, Cadieu says. One is a significant leap in the availability of computational processing power. Researchers have been taking advantage of graphics processing units (GPUs), which are small chips designed for high performance in processing the huge amount of visual content needed for video games. “That is allowing people to push the envelope in terms of computation by buying these relatively inexpensive graphics cards,” Cadieu says.

The second factor is that researchers now have access to large datasets to feed the algorithms to “train” them. These datasets contain millions of images, and each one is annotated by humans with different levels of identification. For example, a photo of a dog would be labeled as animal, canine, domesticated dog, and the breed of dog.

At first, neural networks are not good at identifying these images, but as they see more and more images, and find out when they were wrong, they refine their calculations until they become much more accurate at identifying objects.
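The learn-from-mistakes loop described above can be illustrated with a toy example. The sketch below uses the classic perceptron update rule on a single unit as a stand-in; deep networks are typically trained with gradient-based methods instead, and the four labeled examples, the learning rate, and the function names here are all invented for the illustration.

```python
def predict(weights, bias, features):
    """Return 1 if the weighted sum is positive, otherwise 0."""
    total = bias + sum(w * x for w, x in zip(weights, features))
    return 1 if total > 0 else 0

def train(examples, epochs=20, learning_rate=0.1):
    """Adjust the weights only when a prediction turns out to be wrong."""
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in examples:
            error = label - predict(weights, bias, features)  # 0 when correct
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, features)]
            bias += learning_rate * error
    return weights, bias

def accuracy(weights, bias, examples):
    """Fraction of examples classified correctly."""
    hits = sum(predict(weights, bias, f) == label for f, label in examples)
    return hits / len(examples)

# Hypothetical two-feature "images" with human-provided labels (1 or 0).
examples = [
    ([1.0, 0.1], 1),
    ([0.9, 0.3], 1),
    ([0.2, 0.8], 0),
    ([0.1, 1.0], 0),
]

before = accuracy([0.0, 0.0], 0.0, examples)  # untrained weights
weights, bias = train(examples)
after = accuracy(weights, bias, examples)
print(f"accuracy before training: {before}, after training: {after}")
```

The untrained unit labels everything the same way and gets half the examples right; after a few passes of error-driven correction it separates the two groups.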

Cadieu says that researchers don’t know much about what exactly allows these networks to distinguish different objects.

“That’s a pro and a con,” he says. “It’s very good in that we don’t have to really know what the things are that distinguish those objects. But the big con is that it’s very hard to inspect those networks, to look inside and see what they really did. Now that people can see that these things are working well, they’ll work more to understand what’s happening inside of them.”

DiCarlo’s lab now plans to try to generate models that can mimic other aspects of visual processing, including tracking motion and recognizing three-dimensional forms. They also hope to create models that include the feedback projections seen in the human visual system. Current networks only model the “feedforward” projections from the retina to the IT cortex, but there are 10 times as many connections that go from IT cortex back to the rest of the system.

This work was supported by the National Eye Institute, the National Science Foundation, and the Defense Advanced Research Projects Agency.