When computer vision works more like a brain, it sees more like people do

From cameras to self-driving cars, many of today’s technologies depend on artificial intelligence (AI) to extract meaning from visual information.  Today’s AI technology has artificial neural networks at its core, and most of the time we can trust these AI computer vision systems to see things the way we do — but sometimes they falter. According to MIT and IBM Research scientists, one way to improve computer vision is to instruct the artificial neural networks that they rely on to deliberately mimic the way the brain’s biological neural network processes visual images.

Researchers led by James DiCarlo, the director of MIT’s Quest for Intelligence and member of the MIT-IBM Watson AI Lab, have made a computer vision model more robust by training it to work like a part of the brain that humans and other primates rely on for object recognition. This May, at the International Conference on Learning Representations (ICLR), the team reported that when they trained an artificial neural network using neural activity patterns in the brain’s inferior temporal (IT) cortex, the artificial neural network was more robustly able to identify objects in images than a model that lacked that neural training. And the model’s interpretations of images more closely matched what humans saw, even when images included minor distortions that made the task more difficult.

Comparing neural circuits

Portrait of Professor DiCarlo
McGovern Investigator and Director of MIT Quest for Intelligence, James DiCarlo. Photo: Justin Knight

Many of the artificial neural networks used for computer vision already resemble the multi-layered brain circuits that process visual information in humans and other primates. Like the brain, they use neuron-like units that work together to process information. As they are trained for a particular task, these layered components collectively and progressively process the visual information to complete the task — determining for example, that an image depicts a bear or a car or a tree.

DiCarlo and others previously found that when such deep-learning computer vision systems establish efficient ways to solve visual problems, they end up with artificial circuits that work similarly to the neural circuits that process visual information in our own brains. That is, they turn out to be surprisingly good scientific models of the neural mechanisms underlying primate and human vision.

That resemblance is helping neuroscientists deepen their understanding of the brain. By demonstrating ways visual information can be processed to make sense of images, computational models suggest hypotheses about how the brain might accomplish the same task. As developers continue to refine computer vision models, neuroscientists have found new ideas to explore in their own work.

“As vision systems get better at performing in the real world, some of them turn out to be more human-like in their internal processing. That’s useful from an understanding biology point of view,” says DiCarlo, who is also a professor of brain and cognitive sciences and an investigator at the McGovern Institute.

Engineering more brain-like AI

While their potential is promising, computer vision systems are not yet perfect models of human vision. DiCarlo suspected one way to improve computer vision may be to incorporate specific brain-like features into these models.

To test this idea, he and his collaborators built a computer vision model using neural data previously collected from vision-processing neurons in the monkey IT cortex — a key part of the primate ventral visual pathway involved in the recognition of objects — while the animals viewed various images. More specifically, Joel Dapello, a Harvard graduate student and former MIT-IBM Watson AI Lab intern, and Kohitij Kar, Assistant Professor, Canada Research Chair (Visual Neuroscience) at York University and visiting scientist at MIT, in collaboration with David Cox, IBM Research’s VP for AI Models and IBM director of the MIT-IBM Watson AI Lab, and other researchers at IBM Research and MIT, asked an artificial neural network to emulate the behavior of these primate vision-processing neurons while the network learned to identify objects in a standard computer vision task.

“In effect, we said to the network, ‘please solve this standard computer vision task, but please also make the function of one of your inside simulated “neural” layers be as similar as possible to the function of the corresponding biological neural layer,’” DiCarlo explains. “We asked it to do both of those things as best it could.” This forced the artificial neural circuits to find a different way to process visual information than the standard, computer vision approach, he says.

After training the artificial model with biological data, DiCarlo’s team compared its activity to a similarly-sized neural network model trained without neural data, using the standard approach for computer vision. They found that the new, biologically-informed model IT layer was – as instructed — a better match for IT neural data.  That is, for every image tested, the population of artificial IT neurons in the model responded more similarly to the corresponding population of biological IT neurons.

“Everybody gets something out of the exciting virtuous cycle between natural/biological intelligence and artificial intelligence,” DiCarlo says.

The researchers also found that the model IT was also a better match to IT neural data collected from another monkey, even though the model had never seen data from that animal, and even when that comparison was evaluated on that monkey’s IT responses to new images. This indicated that the team’s new, “neurally-aligned” computer model may be an improved model of the neurobiological function of the primate IT cortex — an interesting finding, given that it was previously unknown whether the amount of neural data that can be currently collected from the primate visual system is capable of directly guiding model development.

With their new computer model in hand, the team asked whether the “IT neural alignment” procedure also leads to any changes in the overall behavioral performance of the model. Indeed, they found that the neurally-aligned model was more human-like in its behavior — it tended to succeed in correctly categorizing objects in images for which humans also succeed, and it tended to fail when humans also fail.

Adversarial attacks

The team also found that the neurally-aligned model was more resistant to “adversarial attacks” that developers use to test computer vision and AI systems.  In computer vision, adversarial attacks introduce small distortions into images that are meant to mislead an artificial neural network.

“Say that you have an image that the model identifies as a cat. Because you have the knowledge of the internal workings of the model, you can then design very small changes in the image so that the model suddenly thinks it’s no longer a cat,” DiCarlo explains.

These minor distortions don’t typically fool humans, but computer vision models struggle with these alterations. A person who looks at the subtly distorted cat still reliably and robustly reports that it’s a cat. But standard computer vision models are more likely to mistake the cat for a dog, or even a tree.

“There must be some internal differences in the way our brains process images that lead to our vision being more resistant to those kinds of attacks,” DiCarlo says. And indeed, the team found that when they made their model more neurally-aligned, it became more robust, correctly identifying more images in the face of adversarial attacks.  The model could still be fooled by stronger “attacks,” but so can people, DiCarlo says. His team is now exploring the limits of adversarial robustness in humans.

A few years ago, DiCarlo’s team found they could also improve a model’s resistance to adversarial attacks by designing the first layer of the artificial network to emulate the early visual processing layer in the brain. One key next step is to combine such approaches — making new models that are simultaneously neurally-aligned at multiple visual processing layers.

The new work is further evidence that an exchange of ideas between neuroscience and computer science can drive progress in both fields. “Everybody gets something out of the exciting virtuous cycle between natural/biological intelligence and artificial intelligence,” DiCarlo says. “In this case, computer vision and AI researchers get new ways to achieve robustness and neuroscientists and cognitive scientists get more accurate mechanistic models of human vision.”

This work was supported by the MIT-IBM Watson AI Lab, Semiconductor Research Corporation, DARPA, the Massachusetts Institute of Technology Shoemaker Fellowship, Office of Naval Research, the Simons Foundation, and Canada Research Chair Program.

Eight from MIT elected to American Academy of Arts and Sciences for 2023

Eight MIT faculty members are among more than 250 leaders from academia, the arts, industry, public policy, and research elected to the American Academy of Arts and Sciences, the academy announced April 19.

One of the nation’s most prestigious honorary societies, the academy is also a leading center for independent policy research. Members contribute to academy publications, as well as studies of science and technology policy, energy and global security, social policy and American institutions, the humanities and culture, and education.

Those elected from MIT in 2023 are:

  • Arnaud Costinot, professor of economics;
  • James J. DiCarlo, Peter de Florez Professor of Brain and Cognitive Sciences, director of the MIT Quest for Intelligence, and McGovern Institute Investigator;
  • Piotr Indyk, the Thomas D. and Virginia W. Cabot Professor of Electrical Engineering and Computer Science;
  • Senthil Todadri, professor of physics;
  • Evelyn N. Wang, Ford Professor of Engineering (on leave) and director of the Department of Energy’s Advanced Research Projects Agency-Energy;
  • Boleslaw Wyslouch, professor of physics and director of the Laboratory for Nuclear Science and Bates Research and Engineering Center;
  • Yukiko Yamashita, professor of biology and core member of the Whitehead Institute; and
  • Wei Zhang, professor of mathematics.

“With the election of these members, the academy is honoring excellence, innovation, and leadership and recognizing a broad array of stellar accomplishments. We hope every new member celebrates this achievement and joins our work advancing the common good,” says David W. Oxtoby, president of the academy.

Since its founding in 1780, the academy has elected leading thinkers from each generation, including George Washington and Benjamin Franklin in the 18th century, Maria Mitchell and Daniel Webster in the 19th century, and Toni Morrison and Albert Einstein in the 20th century. The current membership includes more than 250 Nobel and Pulitzer Prize winners.

What powerful new bots like ChatGPT tell us about intelligence and the human brain

This story originally appeared in the Spring 2023 issue of BrainScan.


Artificial intelligence seems to have gotten a lot smarter recently. AI technologies are increasingly integrated into our lives — improving our weather forecasts, finding efficient routes through traffic, personalizing the ads we see and our experiences with social media.

Watercolor image of a robot with a human brain, created using the AI system DALL*E2.

But with the debut of powerful new chatbots like ChatGPT, millions of people have begun interacting with AI tools that seem convincingly human-like. Neuroscientists are taking note — and beginning to dig into what these tools tell us about intelligence and the human brain.

The essence of human intelligence is hard to pin down, let alone engineer. McGovern scientists say there are many kinds of intelligence, and as humans, we call on many different kinds of knowledge and ways of thinking. ChatGPT’s ability to carry on natural conversations with its users has led some to speculate the computer model is sentient, but McGovern neuroscientists insist that the AI technology cannot think for itself.

Still, they say, the field may have reached a turning point.

“I still don’t believe that we can make something that is indistinguishable from a human. I think we’re a long way from that. But for the first time in my life I think there is a small, nonzero chance that it may happen in the next year,” says McGovern founding member Tomaso Poggio, who has studied both human intelligence and machine learning for more than 40 years.

Different sort of intelligence

Developed by the company OpenAI, ChatGPT is an example of a deep neural network, a type of machine learning system that has made its way into virtually every aspect of science and technology. These models learn to perform various tasks by identifying patterns in large datasets. ChatGPT works by scouring texts and detecting and replicating the ways language is used. Drawing on language patterns it finds across the internet, ChatGPT can design you a meal plan, teach you about rocket science, or write a high school-level essay about Mark Twain. With all of the internet as a training tool, models like this have gotten so good at what they do, they can seem all-knowing.

“Engineers have been inventing some of these forms of intelligence since the beginning of the computers. ChatGPT is one. But it is very far from human intelligence.” – Tomaso Poggio

Nonetheless, language models have a restricted skill set. Play with ChatGPT long enough and it will surely give you some wrong information, even if its fluency makes its words deceptively convincing. “These models don’t know about the world, they don’t know about other people’s mental states, they don’t know how things are beyond whatever they can gather from how words go together,” says Postdoctoral Associate Anna Ivanova, who works with McGovern Investigators Evelina Fedorenko and Nancy Kanwisher as well as Jacob Andreas in MIT’s Computer Science and Artificial Intelligence Laboratory.

Such a model, the researchers say, cannot replicate the complex information processing that happens in the human brain. That doesn’t mean language models can’t be intelligent — but theirs is a different sort of intelligence than our own. “I think that there is an infinite number of different forms of intelligence,” says Poggio. “Engineers have been inventing some of these forms of intelligence since the beginning of the computers. ChatGPT is one. But it is very far from human intelligence.”

Under the hood

Just as there are many forms of intelligence, there are also many types of deep learning models — and McGovern researchers are studying the internals of these models to better understand the human brain.

A watercolor painting of a robot generated by DALL*E2.

“These AI models are, in a way, computational hypotheses for what the brain is doing,” Kanwisher says. “Up until a few years ago, we didn’t really have complete computational models of what might be going on in language processing or vision. Once you have a way of generating actual precise models and testing them against real data, you’re kind of off and running in a way that we weren’t ten years ago.”

Artificial neural networks echo the design of the brain in that they are made of densely interconnected networks of simple units that organize themselves — but Poggio says it’s not yet entirely clear how they work.

No one expects that brains and machines will work in exactly the same ways, though some types of deep learning models are more humanlike in their internals than others. For example, a computer vision model developed by McGovern Investigator James DiCarlo responds to images in ways that closely parallel the activity in the visual cortex of animals who are seeing the same thing. DiCarlo’s team can even use their model’s predictions to create an image that will activate specific neurons in an animal’s brain.

“We shouldn’t just automatically assume that if we trained a deep network on a task, that it’s going to look like the brain.” – Ila Fiete

Still, there is reason to be cautious in interpreting what artificial neural networks tell us about biology. “We shouldn’t just automatically assume that if we trained a deep network on a task, that it’s going to look like the brain,” says McGovern Associate Investigator Ila Fiete. Fiete acknowledges that it’s tempting to think of neural networks as models of the brain itself due to their architectural similarities — but she says so far, that idea remains largely untested.

McGovern Institute Associate Investigator Ila Fiete builds theoretical models of the brain. Photo: Caitlin Cunningham

She and her colleagues recently experimented with neural networks that estimate an object’s position in space by integrating information about its changing velocity.

In the brain, specialized neurons known as grid cells carry out this calculation, keeping us aware of where we are as we move through the world. Other researchers had reported that not only can neural networks do this successfully, those that do include components that behave remarkably like grid cells. They had argued that the need to do this kind of path integration must be the reason our brains have grid cells — but Fiete’s team found that artificial networks don’t need to mimic the brain to accomplish this brain-like task. They found that many neural networks can solve the same problem without grid cell-like elements.

One way investigators might generate deep learning models that do work like the brain is to give them a problem that is so complex that there is only one way of solving it, Fiete says.

Language, she acknowledges, might be that complex.

“This is clearly an example of a super-rich task,” she says. “I think on that front, there is a hope that they’re solving such an incredibly difficult task that maybe there is a sense in which they mirror the brain.”

Language parallels

In Fedorenko’s lab, where researchers are focused on identifying and understanding the brain’s language processing circuitry, they have found that some language models do, in fact, mimic certain aspects of human language processing. Many of the most effective models are trained to do a single task: make predictions about word use. That’s what your phone is doing when it suggests words for your text message as you type. Models that are good at this, it turns out, can apply this skill to carrying on conversations, composing essays, and using language in other useful ways. Neuroscientists have found evidence that humans, too, rely on word prediction as a part of language processing.

Fedorenko and her team compared the activity of language models to the brain activity of people as they read or listened to words, sentences, and stories, and found that some models were a better match to human neural responses than others. “The models that do better on this relatively unsophisticated task — just guess what comes next — also do better at capturing human neural responses,” Fedorenko says.

A watercolor painting of a language model, generated by DALL*E2.

It’s a compelling parallel, suggesting computational models and the human brain may have arrived at a similar solution to a problem, even in the face of the biological constraints that have shaped the latter. For Fedorenko and her team, it’s sparked new ideas that they will explore, in part, by modifying existing language models — possibly to more closely mimic the brain.

With so much still unknown about how both human and artificial neural networks learn, Fedorenko says it’s hard to predict what it will take to make language models work and behave more like the human brain. One possibility they are exploring is training a model in a way that more closely mirrors the way children learn language early in life.

Another question, she says, is whether language models might behave more like humans if they had a more limited recall of their own conversations. “All of the state-of-the-art language models keep track of really, really long linguistic contexts. Humans don’t do that,” she says.

Chatbots can retain long strings of dialogue, using those words to tailor their responses as a conversation progresses, she explains. Humans, on the other hand, must cope with a more limited memory. While we can keep track of information as it is conveyed, we only store a string of about eight words as we listen or read. “We get linguistic input, we crunch it up, we extract some kind of meaning representation, presumably in some more abstract format, and then we discard the exact linguistic stream because we don’t need it anymore,” Fedorenko explains.

Language models aren’t able to fill in gaps in conversation with their own knowledge and awareness in the same way a person can, Ivanova adds. “That’s why so far they have to keep track of every single input word,” she says. “If we want a model that models specifically the [human] language network, we don’t need to have this large context window. It would be very cool to train those models on those short windows of context and see if it’s more similar to the language network.”

Multimodal intelligence

Despite these parallels, Fedorenko’s lab has also shown that there are plenty of things language circuits do not do. The brain calls on other circuits to solve math problems, write computer code, and carry out myriad other cognitive processes. Their work makes it clear that in the brain, language and thought are not the same.

That’s borne out by what cognitive neuroscientists like Kanwisher have learned about the functional organization of the human brain, where circuit components are dedicated to surprisingly specific tasks, from language processing to face recognition.

“The upshot of cognitive neuroscience over the last 25 years is that the human brain really has quite a degree of modular organization,” Kanwisher says. “You can look at the brain and say, ‘what does it tell us about the nature of intelligence?’ Well, intelligence is made up of a whole bunch of things.”

In generating this image from the text prompt, “a watercolor painting of a woman looking in a mirror and seeing a robot,” DALL*E2 incorrectly placed the woman (not the robot) in the mirror, highlighting one of the weaknesses of current deep learning models.

In January, Fedorenko, Kanwisher, Ivanova, and colleagues shared an extensive analysis of the capabilities of large language models. After assessing models’ performance on various language-related tasks, they found that despite their mastery of linguistic rules and patterns, such models don’t do a good job using language in real-world situations. From a neuroscience perspective, that kind of functional competence is distinct from formal language competence, calling on not just language-processing circuits but also parts of the brain that store knowledge of the world, reason, and interpret social interactions.

Language is a powerful tool for understanding the world, they say, but it has limits.

“If you train on language prediction alone, you can learn to mimic certain aspects of thinking,” Ivanova says. “But it’s not enough. You need a multimodal system to carry out truly intelligent behavior.”

The team concluded that while AI language models do a very good job using language, they are incomplete models of human thought. For machines to truly think like humans, Ivanova says, they will need a combination of different neural nets all working together, in the same way different networks in the human brain work together to achieve complex cognitive tasks in the real world.

It remains to be seen whether such models would excel in the tech world, but they could prove valuable for revealing insights into human cognition — perhaps in ways that will inform engineers as they strive to build systems that better replicate human intelligence.

James DiCarlo named director of the MIT Quest for Intelligence

James DiCarlo, the Peter de Florez Professor of Neuroscience, has been appointed to the role of director of the MIT Quest for Intelligence. MIT Quest was launched in 2018 to discover the basis of natural intelligence, create new foundations for machine intelligence, and deliver new tools and technologies for humanity.

As director, DiCarlo will forge new collaborations with researchers within MIT and beyond to accelerate progress in understanding intelligence and developing the next generation of intelligence tools.

“We have discovered and developed surprising new connections between natural and artificial intelligence,” says DiCarlo, currently head of the Department of Brain and Cognitive Sciences (BCS). “The scientific understanding of natural intelligence, and advances in building artificial intelligence with positive real-world impact, are interlocked aspects of a unified, collaborative grand challenge, and MIT must continue to lead the way.”

Aude Oliva, senior research scientist at the Computer Science and Artificial Intelligence Laboratory (CSAIL) and the MIT director of the MIT-IBM Watson AI Lab, will lead industry engagements as director of MIT Quest Corporate. Nicholas Roy, professor of aeronautics and astronautics and a member of CSAIL, will lead the development of systems to deliver on the mission as director of MIT Quest Systems Engineering. Daniel Huttenlocher, dean of the MIT Schwarzman College of Computing, will serve as chair of MIT Quest.

“The MIT Quest’s leadership team has positioned this initiative to spearhead our understanding of natural and artificial intelligence, and I am delighted that Jim is taking on this role,” says Huttenlocher, the Henry Ellis Warren (1894) Professor of Electrical Engineering and Computer Science.

DiCarlo will step down from his current role as head of BCS, a position he has held for nearly nine years, and will continue as faculty in BCS and as an investigator in the McGovern Institute for Brain Research.

“Jim has been a highly productive leader for his department, the School of Science, and the Institute at large. I’m excited to see the impact he will make in this new role,” says Nergis Mavalvala, dean of the School of Science and the Curtis and Kathleen Marble Professor of Astrophysics.

As department head, DiCarlo oversaw significant progress in the department’s scientific and educational endeavors. Roughly a quarter of current BCS faculty were hired on his watch, strengthening the department’s foundations in cognitive, systems, and cellular and molecular brain science. In addition, DiCarlo developed a new departmental emphasis in computation, deepening BCS’s ties with the MIT Schwarzman College of Computing and other MIT units such as the Center for Brains, Minds and Machines. He also developed and leads an NIH-funded graduate training program in computationally-enabled integrative neuroscience. As a result, BCS is one of the few departments in the world that is attempting to decipher, in engineering terms, how the human mind emerges from the biological components of the brain.

To prepare students for this future, DiCarlo collaborated with BCS Associate Department Head Michale Fee to design and execute a total overhaul of the Course 9 curriculum. In addition, partnering with the Department of Electrical Engineering and Computer Science, BCS developed a new major, Course 6-9 (Computation and Cognition), to fill the rapidly growing interest in this interdisciplinary topic. In only its second year, Course 6-9 already has more than 100 undergraduate majors.

DiCarlo has also worked tirelessly to build a more open, connected, and supportive culture across the entire BCS community in Building 46. In this work, as in everything, DiCarlo sought to bring people together to address challenges collaboratively. He attributes progress to strong partnerships with Li-Huei Tsai, the Picower Professor of Neuroscience in BCS and director of the Picower Institute for Learning and Memory; Robert Desimone, the Doris and Don Berkey Professor in BCS and director of the McGovern Institute for Brain Research; and to the work of dozens of faculty and staff. For example, in collaboration with associate department head Professor Rebecca Saxe, the department has focused on faculty mentorship of graduate students, and, in collaboration with postdoc officer Professor Mark Bear, the department developed postdoc salary and benefit standards. Both initiatives have become models for the Institute. In recent months, DiCarlo partnered with new associate department head Professor Laura Schulz to constructively focus renewed energy and resources on initiatives to address systemic racism and promote diversity, equity, inclusion, and social justice.

“Looking ahead, I share Jim’s vision for the research and educational programs of the department, and for enhancing its cohesiveness as a community, especially with regard to issues of diversity, equity, inclusion, and justice,” says Mavalvala. “I am deeply committed to supporting his successor in furthering these goals while maintaining the great intellectual strength of BCS.”

In his own research, DiCarlo uses a combination of large-scale neurophysiology, brain imaging, optogenetic methods, and high-throughput computational simulations to understand the neuronal mechanisms and cortical computations that underlie human visual intelligence. Working in animal models, he and his research collaborators have established precise connections between the internal workings of the visual system and the internal workings of particular computer vision systems. And they have demonstrated that these science-to-engineering connections lead to new ways to modulate neurons deep in the brain as well as to improved machine vision systems. His lab’s goals are to help develop more human-like machine vision, new neural prosthetics to restore or augment lost senses, new learning strategies, and an understanding of how visual cognition is impaired in agnosia, autism, and dyslexia.

DiCarlo earned both a PhD in biomedical engineering and an MD from The Johns Hopkins University in 1998, and completed his postdoc training in primate visual neurophysiology at Baylor College of Medicine. He joined the MIT faculty in 2002.

A search committee will convene early this year to recommend candidates for the next department head of BCS. DiCarlo will continue to lead the department until that new head is selected.

Neuroscientists find a way to improve object-recognition models

Computer vision models known as convolutional neural networks can be trained to recognize objects nearly as accurately as humans do. However, these models have one significant flaw: Very small changes to an image, which would be nearly imperceptible to a human viewer, can trick them into making egregious errors such as classifying a cat as a tree.

A team of neuroscientists from MIT, Harvard University, and IBM have developed a way to alleviate this vulnerability, by adding to these models a new layer that is designed to mimic the earliest stage of the brain’s visual processing system. In a new study, they showed that this layer greatly improved the models’ robustness against this type of mistake.

A grid showing the visualization of many common image corruption types. First row, original image, followed by the noise corruptions; second row, blur corruptions; third row, weather corruptions; fourth row, digital corruptions.
Credits: Courtesy of the researchers.

“Just by making the models more similar to the brain’s primary visual cortex, in this single stage of processing, we see quite significant improvements in robustness across many different types of perturbations and corruptions,” says Tiago Marques, an MIT postdoc and one of the lead authors of the study.

Convolutional neural networks are often used in artificial intelligence applications such as self-driving cars, automated assembly lines, and medical diagnostics. Harvard graduate student Joel Dapello, who is also a lead author of the study, adds that “implementing our new approach could potentially make these systems less prone to error and more aligned with human vision.”

“Good scientific hypotheses of how the brain’s visual system works should, by definition, match the brain in both its internal neural patterns and its remarkable robustness. This study shows that achieving those scientific gains directly leads to engineering and application gains,” says James DiCarlo, the head of MIT’s Department of Brain and Cognitive Sciences, an investigator in the Center for Brains, Minds, and Machines and the McGovern Institute for Brain Research, and the senior author of the study.

The study, which is being presented at the NeurIPS conference this month, is also co-authored by MIT graduate student Martin Schrimpf, MIT visiting student Franziska Geiger, and MIT-IBM Watson AI Lab Director David Cox.

Mimicking the brain

Recognizing objects is one of the visual system’s primary functions. In just a small fraction of a second, visual information flows through the ventral visual stream to the brain’s inferior temporal cortex, where neurons contain information needed to classify objects. At each stage in the ventral stream, the brain performs different types of processing. The very first stage in the ventral stream, V1, is one of the most well-characterized parts of the brain and contains neurons that respond to simple visual features such as edges.

“It’s thought that V1 detects local edges or contours of objects, and textures, and does some type of segmentation of the images at a very small scale. Then that information is later used to identify the shape and texture of objects downstream,” Marques says. “The visual system is built in this hierarchical way, where in early stages neurons respond to local features such as small, elongated edges.”

For many years, researchers have been trying to build computer models that can identify objects as well as the human visual system. Today’s leading computer vision systems are already loosely guided by our current knowledge of the brain’s visual processing. However, neuroscientists still don’t know enough about how the entire ventral visual stream is connected to build a model that precisely mimics it, so they borrow techniques from the field of machine learning to train convolutional neural networks on a specific set of tasks. Using this process, a model can learn to identify objects after being trained on millions of images.

Many of these convolutional networks perform very well, but in most cases, researchers don’t know exactly how the network is solving the object-recognition task. In 2013, researchers from DiCarlo’s lab showed that some of these neural networks could not only accurately identify objects, but they could also predict how neurons in the primate brain would respond to the same objects much better than existing alternative models. However, these neural networks are still not able to perfectly predict responses along the ventral visual stream, particularly at the earliest stages of object recognition, such as V1.

These models are also vulnerable to so-called “adversarial attacks.” This means that small changes to an image, such as changing the colors of a few pixels, can lead the model to completely confuse an object for something different — a type of mistake that a human viewer would not make.

A comparison of adversarial images with different perturbation strengths.
Credits: Courtesy of the researchers.

As a first step in their study, the researchers analyzed the performance of 30 of these models and found that models whose internal responses better matched the brain’s V1 responses were also less vulnerable to adversarial attacks. That is, having a more brain-like V1 seemed to make the model more robust. To further test and take advantage of that idea, the researchers decided to create their own model of V1, based on existing neuroscientific models, and place it at the front of convolutional neural networks that had already been developed to perform object recognition.

When the researchers added their V1 layer, which is also implemented as a convolutional neural network, to three of these models, they found that these models became about four times more resistant to making mistakes on images perturbed by adversarial attacks. The models were also less vulnerable to misidentifying objects that were blurred or distorted due to other corruptions.

“Adversarial attacks are a big, open problem for the practical deployment of deep neural networks. The fact that adding neuroscience-inspired elements can improve robustness substantially suggests that there is still a lot that AI can learn from neuroscience, and vice versa,” Cox says.

Better defense

Currently, the best defense against adversarial attacks is a computationally expensive process of training models to recognize the altered images. One advantage of the new V1-based model is that it doesn’t require any additional training. It is also better able to handle a wide range of distortions, beyond adversarial attacks.

The researchers are now trying to identify the key features of their V1 model that allows it to do a better job resisting adversarial attacks, which could help them to make future models even more robust. It could also help them learn more about how the human brain is able to recognize objects.

“One big advantage of the model is that we can map components of the model to particular neuronal populations in the brain,” Dapello says. “We can use this as a tool for novel neuroscientific discoveries, and also continue developing this model to improve its performance under this challenging task.”

The research was funded by the PhRMA Foundation Postdoctoral Fellowship in Informatics, the Semiconductor Research Corporation, DARPA, the MIT Shoemaker Fellowship, the U.S. Office of Naval Research, the Simons Foundation, and the MIT-IBM Watson AI Lab.

Researchers ID crucial brain pathway involved in object recognition

MIT researchers have identified a brain pathway critical in enabling primates to effortlessly identify objects in their field of vision. The findings enrich existing models of the neural circuitry involved in visual perception and help to further unravel the computational code for solving object recognition in the primate brain.

Led by Kohitij Kar, a postdoctoral associate at the McGovern Institute for Brain Research and Department of Brain and Cognitive Sciences, the study looked at an area called the ventrolateral prefrontal cortex (vlPFC), which sends feedback signals to the inferior temporal (IT) cortex via a network of neurons. The main goal of this study was to test how the back and forth information processing of this circuitry, that is, this recurrent neural network, is essential to rapid object identification in primates.

The current study, published in Neuron and available today via open access, is a follow-up to prior work published by Kar and James DiCarlo, Peter de Florez Professor of Neuroscience, the head of MIT’s Department of Brain and Cognitive Sciences, and an investigator in the McGovern Institute for Brain Research and the Center for Brains, Minds, and Machines.

Monkey versus machine

In 2019, Kar, DiCarlo, and colleagues identified that primates must use some recurrent circuits during rapid object recognition. Monkey subjects in that study were able to identify objects more accurately than engineered “feedforward” computational models, called deep convolutional neural networks, that lacked recurrent circuitry.

Interestingly, specific images for which models performed poorly compared to monkeys in object identification, also took longer to be solved in the monkeys’ brains — suggesting that the additional time might be due to recurrent processing in the brain. Based on the 2019 study, it remained unclear though exactly which recurrent circuits were responsible for the delayed information boost in the IT cortex. That’s where the current study picks up.

“In this new study, we wanted to find out: Where are these recurrent signals in IT coming from?” Kar said. “Which areas reciprocally connected to IT, are functionally the most critical part of this recurrent circuit?”

To determine this, researchers used a pharmacological agent to temporarily block the activity in parts of the vlPFC in macaques while they engaged in an object discrimination task. During these tasks, monkeys viewed images that contained an object, such as an apple, a car, or a dog; then, researchers used eye tracking to determine if the monkeys could correctly indicate what object they had previously viewed when given two object choices.

“We observed that if you use pharmacological agents to partially inactivate the vlPFC, then both the monkeys’ behavior and IT cortex activity deteriorates but more so for certain specific images. These images were the same ones we identified in the previous study — ones that were poorly solved by ‘feedforward’ models and took longer to be solved in the monkey’s IT cortex,” said Kar.

MIT researchers used an object recognition task (e.g., recognizing that there is a “bird” and not an “elephant” in the shown image) in studying the role of feedback from primate ventrolateral prefrontal cortex (vlPFC) to the inferior temporal (IT) cortex via a network of neurons. In primate brains, temporally blocking the vlPFC (green shaded area) disrupts the recurrent neural network comprising vlPFC and IT inducing specific deficits, implicating its role in rapid object identification. Image: Kohitij Kar, brain image adapted from SciDraw

“These results provide evidence that this recurrently connected network is critical for rapid object recognition, the behavior we’re studying. Now, we have a better understanding of how the full circuit is laid out, and what are the key underlying neural components of this behavior.”

The full study, entitled “Fast recurrent processing via ventrolateral prefrontal cortex is needed by the primate ventral stream for robust core visual object recognition,” will run in print January 6, 2021.

“This study demonstrates the importance of pre-frontal cortical circuits in automatically boosting object recognition performance in a very particular way,” DiCarlo said. “These results were obtained in nonhuman primates and thus are highly likely to also be relevant to human vision.”

The present study makes clear the integral role of the recurrent connections between the vlPFC and the primate ventral visual cortex during rapid object recognition. The results will be helpful to researchers designing future studies that aim to develop accurate models of the brain, and to researchers who seek to develop more human-like artificial intelligence.

Key brain region was “recycled” as humans developed the ability to read

Humans began to develop systems of reading and writing only within the past few thousand years. Our reading abilities set us apart from other animal species, but a few thousand years is much too short a timeframe for our brains to have evolved new areas specifically devoted to reading.

To account for the development of this skill, some scientists have hypothesized that parts of the brain that originally evolved for other purposes have been “recycled” for reading. As one example, they suggest that a part of the visual system that is specialized to perform object recognition has been repurposed for a key component of reading called orthographic processing — the ability to recognize written letters and words.

A new study from MIT neuroscientists offers evidence for this hypothesis. The findings suggest that even in nonhuman primates, who do not know how to read, a part of the brain called the inferotemporal (IT) cortex is capable of performing tasks such as distinguishing words from nonsense words, or picking out specific letters from a word.

“This work has opened up a potential linkage between our rapidly developing understanding of the neural mechanisms of visual processing and an important primate behavior — human reading,” says James DiCarlo, the head of MIT’s Department of Brain and Cognitive Sciences, an investigator in the McGovern Institute for Brain Research and the Center for Brains, Minds, and Machines, and the senior author of the study.

Rishi Rajalingham, an MIT postdoc, is the lead author of the study, which appears in Nature Communications. Other MIT authors are postdoc Kohitij Kar and technical associate Sachi Sanghavi. The research team also includes Stanislas Dehaene, a professor of experimental cognitive psychology at the Collège de France.

Word recognition

Reading is a complex process that requires recognizing words, assigning meaning to those words, and associating words with their corresponding sound. These functions are believed to be spread out over different parts of the human brain.

Functional magnetic resonance imaging (fMRI) studies have identified a region called the visual word form area (VWFA) that lights up when the brain processes a written word. This region is involved in the orthographic stage: It discriminates words from jumbled strings of letters or words from unknown alphabets. The VWFA is located in the IT cortex, a part of the visual cortex that is also responsible for identifying objects.

DiCarlo and Dehaene became interested in studying the neural mechanisms behind word recognition after cognitive psychologists in France reported that baboons could learn to discriminate words from nonwords, in a study that appeared in Science in 2012.

Using fMRI, Dehaene’s lab has previously found that parts of the IT cortex that respond to objects and faces become highly specialized for recognizing written words once people learn to read.

“However, given the limitations of human imaging methods, it has been challenging to characterize these representations at the resolution of individual neurons, and to quantitatively test if and how these representations might be reused to support orthographic processing,” Dehaene says. “These findings inspired us to ask if nonhuman primates could provide a unique opportunity to investigate the neuronal mechanisms underlying orthographic processing.”

The researchers hypothesized that if parts of the primate brain are predisposed to process text, they might be able to find patterns reflecting that in the neural activity of nonhuman primates as they simply look at words.

To test that idea, the researchers recorded neural activity from about 500 neural sites across the IT cortex of macaques as they looked at about 2,000 strings of letters, some of which were English words and some of which were nonsensical strings of letters.

“The efficiency of this methodology is that you don’t need to train animals to do anything,” Rajalingham says. “What you do is just record these patterns of neural activity as you flash an image in front of the animal.”

The researchers then fed that neural data into a simple computer model called a linear classifier. This model learns to combine the inputs from each of the 500 neural sites to predict whether the string of letters that provoked that activity pattern was a word or not. While the animal itself is not performing this task, the model acts as a “stand-in” that uses the neural data to generate a behavior, Rajalingham says.

Using that neural data, the model was able to generate accurate predictions for many orthographic tasks, including distinguishing words from nonwords and determining if a particular letter is present in a string of words. The model was about 70 percent accurate at distinguishing words from nonwords, which is very similar to the rate reported in the 2012 Science study with baboons. Furthermore, the patterns of errors made by model were similar to those made by the animals.

Neuronal recycling

The researchers also recorded neural activity from a different brain area that also feeds into IT cortex: V4, which is part of the visual cortex. When they fed V4 activity patterns into the linear classifier model, the model poorly predicted (compared to IT) the human or baboon performance on the orthographic processing tasks.

The findings suggest that the IT cortex is particularly well-suited to be repurposed for skills that are needed for reading, and they support the hypothesis that some of the mechanisms of reading are built upon highly evolved mechanisms for object recognition, the researchers say.

The researchers now plan to train animals to perform orthographic tasks and measure how their neural activity changes as they learn the tasks.

The research was funded by the Simons Foundation and the U.S. Office of Naval Research.

Full Paper at Nature Communications

Tidying up deep neural networks

Visual art has found many ways of representing objects, from the ornate Baroque period to modernist simplicity. Artificial visual systems are somewhat analogous: from relatively simple beginnings inspired by key regions in the visual cortex, recent advances in performance have seen increasing complexity.

“Our overall goal has been to build an accurate, engineering-level model of the visual system, to ‘reverse engineer’ visual intelligence,” explains James DiCarlo, the head of MIT’s Department of Brain and Cognitive Sciences, an investigator in the McGovern Institute for Brain Research and the Center for Brains, Minds, and Machines (CBMM). “But very high-performing ANNs have started to drift away from brain architecture, with complex branching architectures that have no clear parallel in the brain.”

A new model from the DiCarlo lab has re-imposed a brain-like architecture on an object recognition network. The result is a shallow-network architecture with surprisingly high performance, indicating that we can simplify deeper– and more baroque– networks yet retain high performance in artificial learning systems.

“We’ve made two major advances,” explains graduate student Martin Schrimpf, who led the work with Jonas Kubilius at CBMM. “We’ve found a way of checking how well models match the brain, called Brain-Score, and developed a model, CORnet, that moves artificial object recognition, as well as machine learning architectures, forward.”

DiCarlo lab graduate student Martin Schrimpf in the lab. Photo: Kris Brewer

Back to the brain

Deep convolutional artificial neural networks were initially inspired by brain anatomy, and are the leading models in artificial object recognition. Training these feedforward systems on recognizing objects in ImageNet, a large database of images, has allowed performance of ANNs to vastly improve, but at the same time networks have literally branched out, become increasingly complex with hundreds of layers. In contrast, the visual ventral stream, a series of cortical brain regions that unpack object identity, contains a relatively minuscule four key regions. In addition, ANNs are entirely feedforward, while the primate cortical visual system has densely interconnected wiring, in other words, recurrent connectivity. While primate-like object recognition capabilities can be captured through feedforward-only networks, recurrent wiring in the brain has long been suspected, and recently shown in two DiCarlo lab papers led by Kar and Tang respectively, to be important.

DiCarlo and colleagues have now developed CORnet-S, inspired by very complex, state-of-the-art neural networks. CORnet-S has four computational areas, analogous to cortical visual areas (V1, V2, V4, and IT). In addition, CORnet-S contains repeated, or recurrent, connections.

“We really pre-defined layers in the ANN, defining V1, V2, and so on, and introduced feedback and repeated connections” explains Schrimpf. “As a result, we ended up with fewer layers, and less ‘dead space’ that cannot be mapped to the brain. In short, a simpler network.”

Keeping score

To optimize the system, the researchers incorporated quantitative assessment through a new system, Brain-Score.

“Until now, we’ve needed to qualitatively eyeball model performance relative to the brain,” says Schrimpf. “Brain-Score allows us to actually quantitatively evaluate and benchmark models.”

They found that CORnet-S ranks highly on Brain-Score, and is the best performer of all shallow ANNs. Indeed, the system, shallow as it is, rivals the complex, ultra-deep ANNs that currently perform at the highest level.

CORnet was also benchmarked against human performance. To test, for example, whether the system can predict human behavior, 1,472 humans were shown images for 100ms and then asked to identify objects in them. CORnet-S was able to predict the general accuracy of humans to make calls about what they had briefly glimpsed (bear vs. dog etc.). Indeed, CORnet-S is able to predict the behavior, as well as the neural dynamics, of the visual ventral stream, indicating that it is modeling primate-like behavior.

“We thought we’d lose performance by going to a wide, shallow network, but with recurrence, we hardly lost any,” says Schrimpf, “the message for machine learning more broadly, is you can get away without really deep networks.”

Such models of brain processing have benefits for both neuroscience and artificial systems, helping us to understand the elements of image processing by the brain. Neuroscience in turn informs us that features such as recurrence, can be used to improve performance in shallow networks, an important message for artificial intelligence systems more broadly.

“There are clear advantages to the high performing, complex deep networks,” explains DiCarlo, “but it’s possible to rein the network in, using the elegance of the primate brain as a model, and we think this will ultimately lead to other kinds of advantages.”

Putting vision models to the test

MIT neuroscientists have performed the most rigorous testing yet of computational models that mimic the brain’s visual cortex.

Using their current best model of the brain’s visual neural network, the researchers designed a new way to precisely control individual neurons and populations of neurons in the middle of that network. In an animal study, the team then showed that the information gained from the computational model enabled them to create images that strongly activated specific brain neurons of their choosing.

The findings suggest that the current versions of these models are similar enough to the brain that they could be used to control brain states in animals. The study also helps to establish the usefulness of these vision models, which have generated vigorous debate over whether they accurately mimic how the visual cortex works, says James DiCarlo, the head of MIT’s Department of Brain and Cognitive Sciences, an investigator in the McGovern Institute for Brain Research and the Center for Brains, Minds, and Machines, and the senior author of the study.

“People have questioned whether these models provide understanding of the visual system,” he says. “Rather than debate that in an academic sense, we showed that these models are already powerful enough to enable an important new application. Whether you understand how the model works or not, it’s already useful in that sense.”

MIT postdocs Pouya Bashivan and Kohitij Kar are the lead authors of the paper, which appears in the May 2 online edition of Science.

Neural control

Over the past several years, DiCarlo and others have developed models of the visual system based on artificial neural networks. Each network starts out with an arbitrary architecture consisting of model neurons, or nodes, that can be connected to each other with different strengths, also called weights.

The researchers then train the models on a library of more than 1 million images. As the researchers show the model each image, along with a label for the most prominent object in the image, such as an airplane or a chair, the model learns to recognize objects by changing the strengths of its connections.

It’s difficult to determine exactly how the model achieves this kind of recognition, but DiCarlo and his colleagues have previously shown that the “neurons” within these models produce activity patterns very similar to those seen in the animal visual cortex in response to the same images.

In the new study, the researchers wanted to test whether their models could perform some tasks that previously have not been demonstrated. In particular, they wanted to see if the models could be used to control neural activity in the visual cortex of animals.

“So far, what has been done with these models is predicting what the neural responses would be to other stimuli that they have not seen before,” Bashivan says. “The main difference here is that we are going one step further and using the models to drive the neurons into desired states.”

To achieve this, the researchers first created a one-to-one map of neurons in the brain’s visual area V4 to nodes in the computational model. They did this by showing images to animals and to the models, and comparing their responses to the same images. There are millions of neurons in area V4, but for this study, the researchers created maps for subpopulations of five to 40 neurons at a time.

“Once each neuron has an assignment, the model allows you to make predictions about that neuron,” DiCarlo says.

The researchers then set out to see if they could use those predictions to control the activity of individual neurons in the visual cortex. The first type of control, which they called “stretching,” involves showing an image that will drive the activity of a specific neuron far beyond the activity usually elicited by “natural” images similar to those used to train the neural networks.

The researchers found that when they showed animals these “synthetic” images, which are created by the models and do not resemble natural objects, the target neurons did respond as expected. On average, the neurons showed about 40 percent more activity in response to these images than when they were shown natural images like those used to train the model. This kind of control has never been reported before.

“That they succeeded in doing this is really amazing. It’s as if, for that neuron at least, its ideal image suddenly leaped into focus. The neuron was suddenly presented with the stimulus it had always been searching for,” says Aaron Batista, an associate professor of bioengineering at the University of Pittsburgh, who was not involved in the study. “This is a remarkable idea, and to pull it off is quite a feat. It is perhaps the strongest validation so far of the use of artificial neural networks to understand real neural networks.”

In a similar set of experiments, the researchers attempted to generate images that would drive one neuron maximally while also keeping the activity in nearby neurons very low, a more difficult task. For most of the neurons they tested, the researchers were able to enhance the activity of the target neuron with little increase in the surrounding neurons.

“A common trend in neuroscience is that experimental data collection and computational modeling are executed somewhat independently, resulting in very little model validation, and thus no measurable progress. Our efforts bring back to life this ‘closed loop’ approach, engaging model predictions and neural measurements that are critical to the success of building and testing models that will most resemble the brain,” Kar says.

Measuring accuracy

The researchers also showed that they could use the model to predict how neurons of area V4 would respond to synthetic images. Most previous tests of these models have used the same type of naturalistic images that were used to train the model. The MIT team found that the models were about 54 percent accurate at predicting how the brain would respond to the synthetic images, compared to nearly 90 percent accuracy when the natural images are used.

“In a sense, we’re quantifying how accurate these models are at making predictions outside the domain where they were trained,” Bashivan says. “Ideally the model should be able to predict accurately no matter what the input is.”

The researchers now hope to improve the models’ accuracy by allowing them to incorporate the new information they learn from seeing the synthetic images, which was not done in this study.

This kind of control could be useful for neuroscientists who want to study how different neurons interact with each other, and how they might be connected, the researchers say. Farther in the future, this approach could potentially be useful for treating mood disorders such as depression. The researchers are now working on extending their model to the inferotemporal cortex, which feeds into the amygdala, which is involved in processing emotions.

“If we had a good model of the neurons that are engaged in experiencing emotions or causing various kinds of disorders, then we could use that model to drive the neurons in a way that would help to ameliorate those disorders,” Bashivan says.

The research was funded by the Intelligence Advanced Research Projects Agency, the MIT-IBM Watson AI Lab, the National Eye Institute, and the Office of Naval Research.

Algorithms of intelligence

The following post is adapted from a story featured in a recent Brain Scan newsletter.

Machine vision systems are more and more common in everyday life, from social media to self-driving cars, but training artificial neural networks to “see” the world as we do—distinguishing cyclists from signposts—remains challenging. Will artificial neural networks ever decode the world as exquisitely as humans? Can we refine these models and influence perception in a person’s brain just by activating individual, selected neurons? The DiCarlo lab, including CBMM postdocs Kohitij Kar and Pouya Bashivan, are finding that we are surprisingly close to answering “yes” to such questions, all in the context of accelerated insights into artificial intelligence at the McGovern Institute for Brain Research, CBMM, and the Quest for Intelligence at MIT.

Precision Modeling

Beyond light hitting the retina, the recognition process that unfolds in the visual cortex is key to truly “seeing” the surrounding world. Information is decoded through the ventral visual stream, cortical brain regions that progressively build a more accurate, fine-grained, and accessible representation of the objects around us. Artificial neural networks have been modeled on these elegant cortical systems, and the most successful models, deep convolutional neural networks (DCNNs), can now decode objects at levels comparable to the primate brain. However, even leading DCNNs have problems with certain challenging images, presumably due to shadows, clutter, and other visual noise. While there’s no simple feature that unites all challenging images, the quest is on to tackle such images to attain precise recognition at a level commensurate with human object recognition.

“One next step is to couple this new precision tool with our emerging understanding of how neural patterns underlie object perception. This might allow us to create arrangements of pixels that look nothing like, for example, a cat, but that can fool the brain into thinking it’s seeing a cat.”- James DiCarlo

In a recent push, Kar and DiCarlo demonstrated that adding feedback connections, currently missing in most DCNNs, allows the system to better recognize objects in challenging situations, even those where a human can’t articulate why recognition is an issue for feedforward DCNNs. They also found that this recurrent circuit seems critical to primate success rates in performing this task. This is incredibly important for systems like self-driving cars, where the stakes for artificial visual systems are high, and faithful recognition is a must.

Now you see it

As artificial object recognition systems have become more precise in predicting neural activity, the DiCarlo lab wondered what such precision might allow: could they use their system to not only predict, but to control specific neuronal activity?

To demonstrate the power of their models, Bashivan, Kar, and colleagues zeroed in on targeted neurons in the brain. In a paper published in Science, they used an artificial neural network to generate a random-looking group of pixels that, when shown to an animal, activated the team’s target, a target they called “one hot neuron.” In other words, they showed the brain a synthetic pattern, and the pixels in the pattern precisely activated targeted neurons while other neurons remained relatively silent.

These findings show how the knowledge in today’s artificial neural network models might one day be used to noninvasively influence brain states with neural resolution. Such precise systems would be useful as we look to the future, toward visual prosthetics for the blind. Such a precise model of the ventral visual stream would have been incon-ceivable not so long ago, and all eyes are on where McGovern researchers will take these technologies in the coming years.