James DiCarlo named director of the MIT Quest for Intelligence

James DiCarlo, the Peter de Florez Professor of Neuroscience, has been appointed to the role of director of the MIT Quest for Intelligence. MIT Quest was launched in 2018 to discover the basis of natural intelligence, create new foundations for machine intelligence, and deliver new tools and technologies for humanity.

As director, DiCarlo will forge new collaborations with researchers within MIT and beyond to accelerate progress in understanding intelligence and developing the next generation of intelligence tools.

“We have discovered and developed surprising new connections between natural and artificial intelligence,” says DiCarlo, currently head of the Department of Brain and Cognitive Sciences (BCS). “The scientific understanding of natural intelligence, and advances in building artificial intelligence with positive real-world impact, are interlocked aspects of a unified, collaborative grand challenge, and MIT must continue to lead the way.”

Aude Oliva, senior research scientist at the Computer Science and Artificial Intelligence Laboratory (CSAIL) and the MIT director of the MIT-IBM Watson AI Lab, will lead industry engagements as director of MIT Quest Corporate. Nicholas Roy, professor of aeronautics and astronautics and a member of CSAIL, will lead the development of systems to deliver on the mission as director of MIT Quest Systems Engineering. Daniel Huttenlocher, dean of the MIT Schwarzman College of Computing, will serve as chair of MIT Quest.

“The MIT Quest’s leadership team has positioned this initiative to spearhead our understanding of natural and artificial intelligence, and I am delighted that Jim is taking on this role,” says Huttenlocher, the Henry Ellis Warren (1894) Professor of Electrical Engineering and Computer Science.

DiCarlo will step down from his current role as head of BCS, a position he has held for nearly nine years, and will continue as faculty in BCS and as an investigator in the McGovern Institute for Brain Research.

“Jim has been a highly productive leader for his department, the School of Science, and the Institute at large. I’m excited to see the impact he will make in this new role,” says Nergis Mavalvala, dean of the School of Science and the Curtis and Kathleen Marble Professor of Astrophysics.

As department head, DiCarlo oversaw significant progress in the department’s scientific and educational endeavors. Roughly a quarter of current BCS faculty were hired on his watch, strengthening the department’s foundations in cognitive, systems, and cellular and molecular brain science. In addition, DiCarlo developed a new departmental emphasis in computation, deepening BCS’s ties with the MIT Schwarzman College of Computing and other MIT units such as the Center for Brains, Minds and Machines. He also developed and leads an NIH-funded graduate training program in computationally enabled integrative neuroscience. As a result, BCS is one of the few departments in the world that is attempting to decipher, in engineering terms, how the human mind emerges from the biological components of the brain.

To prepare students for this future, DiCarlo collaborated with BCS Associate Department Head Michale Fee to design and execute a total overhaul of the Course 9 curriculum. In addition, partnering with the Department of Electrical Engineering and Computer Science, BCS developed a new major, Course 6-9 (Computation and Cognition), to meet the rapidly growing interest in this interdisciplinary topic. In only its second year, Course 6-9 already has more than 100 undergraduate majors.

DiCarlo has also worked tirelessly to build a more open, connected, and supportive culture across the entire BCS community in Building 46. In this work, as in everything, DiCarlo sought to bring people together to address challenges collaboratively. He attributes progress to strong partnerships with Li-Huei Tsai, the Picower Professor of Neuroscience in BCS and director of the Picower Institute for Learning and Memory; Robert Desimone, the Doris and Don Berkey Professor in BCS and director of the McGovern Institute for Brain Research; and to the work of dozens of faculty and staff. For example, in collaboration with associate department head Professor Rebecca Saxe, the department has focused on faculty mentorship of graduate students, and, in collaboration with postdoc officer Professor Mark Bear, the department developed postdoc salary and benefit standards. Both initiatives have become models for the Institute. In recent months, DiCarlo partnered with new associate department head Professor Laura Schulz to constructively focus renewed energy and resources on initiatives to address systemic racism and promote diversity, equity, inclusion, and social justice.

“Looking ahead, I share Jim’s vision for the research and educational programs of the department, and for enhancing its cohesiveness as a community, especially with regard to issues of diversity, equity, inclusion, and justice,” says Mavalvala. “I am deeply committed to supporting his successor in furthering these goals while maintaining the great intellectual strength of BCS.”

In his own research, DiCarlo uses a combination of large-scale neurophysiology, brain imaging, optogenetic methods, and high-throughput computational simulations to understand the neuronal mechanisms and cortical computations that underlie human visual intelligence. Working in animal models, he and his research collaborators have established precise connections between the internal workings of the visual system and the internal workings of particular computer vision systems. And they have demonstrated that these science-to-engineering connections lead to new ways to modulate neurons deep in the brain as well as to improved machine vision systems. His lab’s goals are to help develop more human-like machine vision, new neural prosthetics to restore or augment lost senses, new learning strategies, and an understanding of how visual cognition is impaired in agnosia, autism, and dyslexia.

DiCarlo earned both a PhD in biomedical engineering and an MD from The Johns Hopkins University in 1998, and completed his postdoc training in primate visual neurophysiology at Baylor College of Medicine. He joined the MIT faculty in 2002.

A search committee will convene early this year to recommend candidates for the next department head of BCS. DiCarlo will continue to lead the department until that new head is selected.

Neuroscientists find a way to improve object-recognition models

Computer vision models known as convolutional neural networks can be trained to recognize objects nearly as accurately as humans do. However, these models have one significant flaw: Very small changes to an image, which would be nearly imperceptible to a human viewer, can trick them into making egregious errors such as classifying a cat as a tree.

A team of neuroscientists from MIT, Harvard University, and IBM has developed a way to alleviate this vulnerability by adding to these models a new layer that is designed to mimic the earliest stage of the brain’s visual processing system. In a new study, they showed that this layer greatly improved the models’ robustness against this type of mistake.

Image: a grid visualizing common image corruption types. First row: the original image, followed by the noise corruptions; second row: blur corruptions; third row: weather corruptions; fourth row: digital corruptions.
Credits: Courtesy of the researchers.

“Just by making the models more similar to the brain’s primary visual cortex, in this single stage of processing, we see quite significant improvements in robustness across many different types of perturbations and corruptions,” says Tiago Marques, an MIT postdoc and one of the lead authors of the study.

Convolutional neural networks are often used in artificial intelligence applications such as self-driving cars, automated assembly lines, and medical diagnostics. Harvard graduate student Joel Dapello, who is also a lead author of the study, adds that “implementing our new approach could potentially make these systems less prone to error and more aligned with human vision.”

“Good scientific hypotheses of how the brain’s visual system works should, by definition, match the brain in both its internal neural patterns and its remarkable robustness. This study shows that achieving those scientific gains directly leads to engineering and application gains,” says James DiCarlo, the head of MIT’s Department of Brain and Cognitive Sciences, an investigator in the Center for Brains, Minds, and Machines and the McGovern Institute for Brain Research, and the senior author of the study.

The study, which is being presented at the NeurIPS conference this month, is also co-authored by MIT graduate student Martin Schrimpf, MIT visiting student Franziska Geiger, and MIT-IBM Watson AI Lab Director David Cox.

Mimicking the brain

Recognizing objects is one of the visual system’s primary functions. In just a small fraction of a second, visual information flows through the ventral visual stream to the brain’s inferior temporal cortex, where neurons contain information needed to classify objects. At each stage in the ventral stream, the brain performs different types of processing. The very first stage in the ventral stream, V1, is one of the most well-characterized parts of the brain and contains neurons that respond to simple visual features such as edges.

“It’s thought that V1 detects local edges or contours of objects, and textures, and does some type of segmentation of the images at a very small scale. Then that information is later used to identify the shape and texture of objects downstream,” Marques says. “The visual system is built in this hierarchical way, where in early stages neurons respond to local features such as small, elongated edges.”

For many years, researchers have been trying to build computer models that can identify objects as well as the human visual system. Today’s leading computer vision systems are already loosely guided by our current knowledge of the brain’s visual processing. However, neuroscientists still don’t know enough about how the entire ventral visual stream is connected to build a model that precisely mimics it, so they borrow techniques from the field of machine learning to train convolutional neural networks on a specific set of tasks. Using this process, a model can learn to identify objects after being trained on millions of images.

Many of these convolutional networks perform very well, but in most cases, researchers don’t know exactly how the network is solving the object-recognition task. In 2013, researchers from DiCarlo’s lab showed that some of these neural networks could not only accurately identify objects, but they could also predict how neurons in the primate brain would respond to the same objects much better than existing alternative models. However, these neural networks are still not able to perfectly predict responses along the ventral visual stream, particularly at the earliest stages of object recognition, such as V1.

These models are also vulnerable to so-called “adversarial attacks.” This means that small changes to an image, such as changing the colors of a few pixels, can lead the model to completely confuse an object for something different — a type of mistake that a human viewer would not make.
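To make the failure mode concrete, here is a minimal, illustrative sketch of a gradient-sign perturbation, the idea behind many adversarial attacks, applied to a toy linear classifier. The weights, image, and step size are all illustrative stand-ins, not the networks from the study.

```python
import numpy as np

# Toy linear classifier: 10 classes over a flattened 8x8 "image".
rng = np.random.default_rng(0)
W = rng.normal(size=(10, 64))
x = rng.normal(size=64)
label = int(np.argmax(W @ x))      # the class the model assigns to x

# Step every pixel slightly *against* the gradient of the assigned class's
# score; for a linear model that gradient is just the class's weight row.
eps = 0.05
x_adv = x - eps * np.sign(W[label])

# The perturbation is tiny and uniform per pixel, yet it lowers the assigned
# class's score and can change the model's decision.
adv_label = int(np.argmax(W @ x_adv))
```

For deep networks the same idea applies, except the input gradient is obtained by backpropagation rather than read directly from a weight matrix.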

A comparison of adversarial images with different perturbation strengths.
Credits: Courtesy of the researchers.

As a first step in their study, the researchers analyzed the performance of 30 of these models and found that models whose internal responses better matched the brain’s V1 responses were also less vulnerable to adversarial attacks. That is, having a more brain-like V1 seemed to make the model more robust. To further test and take advantage of that idea, the researchers decided to create their own model of V1, based on existing neuroscientific models, and place it at the front of convolutional neural networks that had already been developed to perform object recognition.

When the researchers added their V1 layer, which is also implemented as a convolutional neural network, to three of these models, they found that these models became about four times more resistant to making mistakes on images perturbed by adversarial attacks. The models were also less vulnerable to misidentifying objects that were blurred or distorted due to other corruptions.
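The researchers’ V1 layer is itself a small convolutional network; as a loose illustration of the general idea of a fixed, biologically inspired front end, the sketch below builds a bank of Gabor filters (crude models of V1 simple cells) and applies them to an image before any trainable layers. All parameters here are illustrative assumptions, not those of the study’s model.

```python
import numpy as np

def gabor(size, theta, freq, sigma):
    """One oriented Gabor filter (a crude V1 simple-cell model)."""
    ax = np.arange(size) - size // 2
    yy, xx = np.meshgrid(ax, ax, indexing="ij")
    rotated = xx * np.cos(theta) + yy * np.sin(theta)
    envelope = np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2))
    return envelope * np.cos(2.0 * np.pi * freq * rotated)

# A fixed (not learned) bank of filters at four orientations.
bank = np.stack([gabor(9, t, 0.25, 2.0)
                 for t in np.linspace(0.0, np.pi, 4, endpoint=False)])

def v1_front_end(image):
    """Convolve with each Gabor and rectify, before any trainable layers."""
    k = bank.shape[-1]
    h, w = image.shape[0] - k + 1, image.shape[1] - k + 1
    out = np.empty((len(bank), h, w))
    for c, f in enumerate(bank):
        for i in range(h):
            for j in range(w):
                out[c, i, j] = np.sum(image[i:i + k, j:j + k] * f)
    return np.maximum(out, 0.0)   # ReLU-style rectification

features = v1_front_end(np.random.default_rng(1).normal(size=(32, 32)))
```

Because the front end is fixed, a small perturbation must first pass through these oriented, band-limited filters, which is one intuition for the added robustness.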

“Adversarial attacks are a big, open problem for the practical deployment of deep neural networks. The fact that adding neuroscience-inspired elements can improve robustness substantially suggests that there is still a lot that AI can learn from neuroscience, and vice versa,” Cox says.

Better defense

Currently, the best defense against adversarial attacks is a computationally expensive process of training models to recognize the altered images. One advantage of the new V1-based model is that it doesn’t require any additional training. It is also better able to handle a wide range of distortions, beyond adversarial attacks.

The researchers are now trying to identify the key features of their V1 model that allow it to do a better job resisting adversarial attacks, which could help them to make future models even more robust. It could also help them learn more about how the human brain is able to recognize objects.

“One big advantage of the model is that we can map components of the model to particular neuronal populations in the brain,” Dapello says. “We can use this as a tool for novel neuroscientific discoveries, and also continue developing this model to improve its performance under this challenging task.”

The research was funded by the PhRMA Foundation Postdoctoral Fellowship in Informatics, the Semiconductor Research Corporation, DARPA, the MIT Shoemaker Fellowship, the U.S. Office of Naval Research, the Simons Foundation, and the MIT-IBM Watson AI Lab.

Researchers ID crucial brain pathway involved in object recognition

MIT researchers have identified a brain pathway critical in enabling primates to effortlessly identify objects in their field of vision. The findings enrich existing models of the neural circuitry involved in visual perception and help to further unravel the computational code for solving object recognition in the primate brain.

Led by Kohitij Kar, a postdoctoral associate at the McGovern Institute for Brain Research and the Department of Brain and Cognitive Sciences, the study looked at an area called the ventrolateral prefrontal cortex (vlPFC), which sends feedback signals to the inferior temporal (IT) cortex via a network of neurons. The main goal of this study was to test whether the back-and-forth information processing of this circuitry, that is, this recurrent neural network, is essential to rapid object identification in primates.

The current study, published in Neuron and available today via open access, is a follow-up to prior work published by Kar and James DiCarlo, Peter de Florez Professor of Neuroscience, the head of MIT’s Department of Brain and Cognitive Sciences, and an investigator in the McGovern Institute for Brain Research and the Center for Brains, Minds, and Machines.

Monkey versus machine

In 2019, Kar, DiCarlo, and colleagues showed that primates must use some recurrent circuits during rapid object recognition. Monkey subjects in that study were able to identify objects more accurately than engineered “feedforward” computational models, called deep convolutional neural networks, that lacked recurrent circuitry.

Interestingly, the specific images for which the models performed poorly compared to the monkeys in object identification also took longer to be solved in the monkeys’ brains, suggesting that the additional time might be due to recurrent processing in the brain. It remained unclear from the 2019 study, though, exactly which recurrent circuits were responsible for the delayed information boost in the IT cortex. That’s where the current study picks up.

“In this new study, we wanted to find out: Where are these recurrent signals in IT coming from?” Kar said. “Which areas that are reciprocally connected to IT are functionally the most critical part of this recurrent circuit?”

To determine this, researchers used a pharmacological agent to temporarily block the activity in parts of the vlPFC in macaques while they engaged in an object discrimination task. During these tasks, monkeys viewed images that contained an object, such as an apple, a car, or a dog; then, researchers used eye tracking to determine if the monkeys could correctly indicate what object they had previously viewed when given two object choices.

“We observed that if you use pharmacological agents to partially inactivate the vlPFC, then both the monkeys’ behavior and IT cortex activity deteriorates but more so for certain specific images. These images were the same ones we identified in the previous study — ones that were poorly solved by ‘feedforward’ models and took longer to be solved in the monkey’s IT cortex,” said Kar.

MIT researchers used an object recognition task (e.g., recognizing that there is a “bird” and not an “elephant” in the shown image) to study the role of feedback from the primate ventrolateral prefrontal cortex (vlPFC) to the inferior temporal (IT) cortex via a network of neurons. In primate brains, temporarily blocking the vlPFC (green shaded area) disrupts the recurrent neural network comprising vlPFC and IT, inducing specific deficits that implicate its role in rapid object identification. Image: Kohitij Kar, brain image adapted from SciDraw

“These results provide evidence that this recurrently connected network is critical for rapid object recognition, the behavior we’re studying. Now, we have a better understanding of how the full circuit is laid out, and what are the key underlying neural components of this behavior.”

The full study, entitled “Fast recurrent processing via ventrolateral prefrontal cortex is needed by the primate ventral stream for robust core visual object recognition,” will run in print January 6, 2021.

“This study demonstrates the importance of prefrontal cortical circuits in automatically boosting object recognition performance in a very particular way,” DiCarlo said. “These results were obtained in nonhuman primates and thus are highly likely to also be relevant to human vision.”

The present study makes clear the integral role of the recurrent connections between the vlPFC and the primate ventral visual cortex during rapid object recognition. The results will be helpful to researchers designing future studies that aim to develop accurate models of the brain, and to researchers who seek to develop more human-like artificial intelligence.

Key brain region was “recycled” as humans developed the ability to read

Humans began to develop systems of reading and writing only within the past few thousand years. Our reading abilities set us apart from other animal species, but a few thousand years is much too short a timeframe for our brains to have evolved new areas specifically devoted to reading.

To account for the development of this skill, some scientists have hypothesized that parts of the brain that originally evolved for other purposes have been “recycled” for reading. As one example, they suggest that a part of the visual system that is specialized to perform object recognition has been repurposed for a key component of reading called orthographic processing — the ability to recognize written letters and words.

A new study from MIT neuroscientists offers evidence for this hypothesis. The findings suggest that even in nonhuman primates, who do not know how to read, a part of the brain called the inferotemporal (IT) cortex is capable of performing tasks such as distinguishing words from nonsense words, or picking out specific letters from a word.

“This work has opened up a potential linkage between our rapidly developing understanding of the neural mechanisms of visual processing and an important primate behavior — human reading,” says James DiCarlo, the head of MIT’s Department of Brain and Cognitive Sciences, an investigator in the McGovern Institute for Brain Research and the Center for Brains, Minds, and Machines, and the senior author of the study.

Rishi Rajalingham, an MIT postdoc, is the lead author of the study, which appears in Nature Communications. Other MIT authors are postdoc Kohitij Kar and technical associate Sachi Sanghavi. The research team also includes Stanislas Dehaene, a professor of experimental cognitive psychology at the Collège de France.

Word recognition

Reading is a complex process that requires recognizing words, assigning meaning to those words, and associating words with their corresponding sound. These functions are believed to be spread out over different parts of the human brain.

Functional magnetic resonance imaging (fMRI) studies have identified a region called the visual word form area (VWFA) that lights up when the brain processes a written word. This region is involved in the orthographic stage: It discriminates words from jumbled strings of letters or words from unknown alphabets. The VWFA is located in the IT cortex, a part of the visual cortex that is also responsible for identifying objects.

DiCarlo and Dehaene became interested in studying the neural mechanisms behind word recognition after cognitive psychologists in France reported that baboons could learn to discriminate words from nonwords, in a study that appeared in Science in 2012.

Using fMRI, Dehaene’s lab has previously found that parts of the IT cortex that respond to objects and faces become highly specialized for recognizing written words once people learn to read.

“However, given the limitations of human imaging methods, it has been challenging to characterize these representations at the resolution of individual neurons, and to quantitatively test if and how these representations might be reused to support orthographic processing,” Dehaene says. “These findings inspired us to ask if nonhuman primates could provide a unique opportunity to investigate the neuronal mechanisms underlying orthographic processing.”

The researchers hypothesized that if parts of the primate brain are predisposed to process text, they might be able to find patterns reflecting that in the neural activity of nonhuman primates as they simply look at words.

To test that idea, the researchers recorded neural activity from about 500 neural sites across the IT cortex of macaques as they looked at about 2,000 strings of letters, some of which were English words and some of which were nonsensical strings of letters.

“The efficiency of this methodology is that you don’t need to train animals to do anything,” Rajalingham says. “What you do is just record these patterns of neural activity as you flash an image in front of the animal.”

The researchers then fed that neural data into a simple computer model called a linear classifier. This model learns to combine the inputs from each of the 500 neural sites to predict whether the string of letters that provoked that activity pattern was a word or not. While the animal itself is not performing this task, the model acts as a “stand-in” that uses the neural data to generate a behavior, Rajalingham says.

Using that neural data, the model was able to generate accurate predictions for many orthographic tasks, including distinguishing words from nonwords and determining whether a particular letter is present in a letter string. The model was about 70 percent accurate at distinguishing words from nonwords, which is very similar to the rate reported in the 2012 Science study with baboons. Furthermore, the patterns of errors made by the model were similar to those made by the animals.
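The decoding logic can be sketched as follows, with synthetic data standing in for the recorded responses: a regularized linear readout is trained on activity patterns from 500 simulated “sites” and evaluated on held-out trials. The numbers and the signal model are assumptions for illustration, not the study’s data.

```python
import numpy as np

rng = np.random.default_rng(0)
n_sites, n_train, n_test = 500, 400, 100

# Each simulated site has a small, fixed preference for words (+1) over
# nonwords (-1), buried in unit-variance noise.
pref = 0.2 * rng.normal(size=n_sites)

def responses(labels):
    return labels[:, None] * pref[None, :] + rng.normal(size=(len(labels), n_sites))

y_train = rng.choice([-1.0, 1.0], size=n_train)
y_test = rng.choice([-1.0, 1.0], size=n_test)
X_train, X_test = responses(y_train), responses(y_test)

# Ridge-regularized least-squares linear readout combining all 500 sites.
w = np.linalg.solve(X_train.T @ X_train + 10.0 * np.eye(n_sites),
                    X_train.T @ y_train)
accuracy = float(np.mean(np.sign(X_test @ w) == y_test))
```

As in the study, no single site needs to be very informative; the linear classifier pools weak evidence across the whole population.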

Neuronal recycling

The researchers also recorded neural activity from a different brain area that also feeds into the IT cortex: V4, which is part of the visual cortex. When they fed V4 activity patterns into the linear classifier model, its predictions of human and baboon performance on the orthographic processing tasks were much poorer than those based on IT activity.

The findings suggest that the IT cortex is particularly well-suited to be repurposed for skills that are needed for reading, and they support the hypothesis that some of the mechanisms of reading are built upon highly evolved mechanisms for object recognition, the researchers say.

The researchers now plan to train animals to perform orthographic tasks and measure how their neural activity changes as they learn the tasks.

The research was funded by the Simons Foundation and the U.S. Office of Naval Research.


Tidying up deep neural networks

Visual art has found many ways of representing objects, from the ornate Baroque period to modernist simplicity. Artificial visual systems are somewhat analogous: from relatively simple beginnings inspired by key regions in the visual cortex, recent advances in performance have seen increasing complexity.

“Our overall goal has been to build an accurate, engineering-level model of the visual system, to ‘reverse engineer’ visual intelligence,” explains James DiCarlo, the head of MIT’s Department of Brain and Cognitive Sciences, an investigator in the McGovern Institute for Brain Research and the Center for Brains, Minds, and Machines (CBMM). “But very high-performing ANNs have started to drift away from brain architecture, with complex branching architectures that have no clear parallel in the brain.”

A new model from the DiCarlo lab has re-imposed a brain-like architecture on an object recognition network. The result is a shallow-network architecture with surprisingly high performance, indicating that we can simplify deeper, more baroque networks yet retain high performance in artificial learning systems.

“We’ve made two major advances,” explains graduate student Martin Schrimpf, who led the work with Jonas Kubilius at CBMM. “We’ve found a way of checking how well models match the brain, called Brain-Score, and developed a model, CORnet, that moves artificial object recognition, as well as machine learning architectures, forward.”

DiCarlo lab graduate student Martin Schrimpf in the lab. Photo: Kris Brewer

Back to the brain

Deep convolutional artificial neural networks were initially inspired by brain anatomy, and are the leading models in artificial object recognition. Training these feedforward systems to recognize objects in ImageNet, a large database of images, has allowed the performance of ANNs to improve vastly, but at the same time the networks have literally branched out, becoming increasingly complex, with hundreds of layers. In contrast, the visual ventral stream, a series of cortical brain regions that unpacks object identity, contains a relatively minuscule four key regions. In addition, ANNs are entirely feedforward, while the primate cortical visual system has densely interconnected wiring; in other words, recurrent connectivity. While primate-like object recognition capabilities can be captured through feedforward-only networks, recurrent wiring in the brain has long been suspected to be important, and was recently shown to be so in two DiCarlo lab papers led by Kar and Tang, respectively.

DiCarlo and colleagues have now developed CORnet-S, inspired by very complex, state-of-the-art neural networks. CORnet-S has four computational areas, analogous to cortical visual areas (V1, V2, V4, and IT). In addition, CORnet-S contains repeated, or recurrent, connections.

“We really pre-defined layers in the ANN, defining V1, V2, and so on, and introduced feedback and repeated connections,” explains Schrimpf. “As a result, we ended up with fewer layers, and less ‘dead space’ that cannot be mapped to the brain. In short, a simpler network.”

Keeping score

To optimize the system, the researchers incorporated quantitative assessment through a new system, Brain-Score.

“Until now, we’ve needed to qualitatively eyeball model performance relative to the brain,” says Schrimpf. “Brain-Score allows us to actually quantitatively evaluate and benchmark models.”
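As a loose sketch of the kind of quantitative comparison Brain-Score formalizes, one common approach is to fit a linear map from a model’s internal features to recorded responses and score the predictions on held-out stimuli. The data below are synthetic, and this single correlation score is a simplification of the full benchmark.

```python
import numpy as np

rng = np.random.default_rng(0)
n_stim, n_feat = 200, 50

features = rng.normal(size=(n_stim, n_feat))            # model activations
true_map = rng.normal(size=n_feat)
neural = features @ true_map + rng.normal(size=n_stim)  # one simulated neuron

# Fit a linear map on 150 stimuli, then score predictions on the held-out 50.
train, test = slice(0, 150), slice(150, None)
w = np.linalg.lstsq(features[train], neural[train], rcond=None)[0]
pred = features[test] @ w
score = float(np.corrcoef(pred, neural[test])[0, 1])    # neural predictivity
```

A model whose features linearly explain held-out neural responses well gets a high score; aggregating such scores over many neurons, brain areas, and behavioral measures is the spirit of the benchmark.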

They found that CORnet-S ranks highly on Brain-Score, and is the best performer of all shallow ANNs. Indeed, the system, shallow as it is, rivals the complex, ultra-deep ANNs that currently perform at the highest level.

CORnet was also benchmarked against human performance. To test, for example, whether the system can predict human behavior, 1,472 humans were shown images for 100 milliseconds and then asked to identify the objects in them. CORnet-S was able to predict the general accuracy of humans in making calls about what they had briefly glimpsed (bear vs. dog, etc.). Indeed, CORnet-S is able to predict the behavior, as well as the neural dynamics, of the visual ventral stream, indicating that it is modeling primate-like behavior.

“We thought we’d lose performance by going to a wide, shallow network, but with recurrence, we hardly lost any,” says Schrimpf. “The message for machine learning more broadly is that you can get away without really deep networks.”

Such models of brain processing have benefits for both neuroscience and artificial systems, helping us to understand the elements of image processing by the brain. Neuroscience in turn informs us that features such as recurrence can be used to improve performance in shallow networks, an important message for artificial intelligence systems more broadly.

“There are clear advantages to the high performing, complex deep networks,” explains DiCarlo, “but it’s possible to rein the network in, using the elegance of the primate brain as a model, and we think this will ultimately lead to other kinds of advantages.”

Putting vision models to the test

MIT neuroscientists have performed the most rigorous testing yet of computational models that mimic the brain’s visual cortex.

Using their current best model of the brain’s visual neural network, the researchers designed a new way to precisely control individual neurons and populations of neurons in the middle of that network. In an animal study, the team then showed that the information gained from the computational model enabled them to create images that strongly activated specific brain neurons of their choosing.

The findings suggest that the current versions of these models are similar enough to the brain that they could be used to control brain states in animals. The study also helps to establish the usefulness of these vision models, which have generated vigorous debate over whether they accurately mimic how the visual cortex works, says James DiCarlo, the head of MIT’s Department of Brain and Cognitive Sciences, an investigator in the McGovern Institute for Brain Research and the Center for Brains, Minds, and Machines, and the senior author of the study.

“People have questioned whether these models provide understanding of the visual system,” he says. “Rather than debate that in an academic sense, we showed that these models are already powerful enough to enable an important new application. Whether you understand how the model works or not, it’s already useful in that sense.”

MIT postdocs Pouya Bashivan and Kohitij Kar are the lead authors of the paper, which appears in the May 2 online edition of Science.

Neural control

Over the past several years, DiCarlo and others have developed models of the visual system based on artificial neural networks. Each network starts out with an arbitrary architecture consisting of model neurons, or nodes, that can be connected to each other with different strengths, also called weights.

The researchers then train the models on a library of more than 1 million images. As the researchers show the model each image, along with a label for the most prominent object in the image, such as an airplane or a chair, the model learns to recognize objects by changing the strengths of its connections.
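The training principle described above can be sketched with a single softmax layer standing in for the deep network: show labeled examples, compare the model’s predictions to the labels, and adjust the connection strengths to reduce the error. The dataset and hyperparameters below are toy assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 600, 20, 3

# Toy dataset: each "image" is a feature vector, and its label ("airplane",
# "chair", ...) is whichever of k linear templates responds most strongly.
W_true = rng.normal(size=(d, k))
X = rng.normal(size=(n, d))
y = (X @ W_true).argmax(axis=1)

# Gradient descent on cross-entropy loss: each step nudges the weights in
# the direction that makes the correct labels more probable.
W = np.zeros((d, k))
onehot = np.eye(k)[y]
for _ in range(500):
    logits = X @ W
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)        # softmax class probabilities
    W -= 0.5 * X.T @ (p - onehot) / n        # cross-entropy gradient step

train_acc = float(np.mean((X @ W).argmax(axis=1) == y))
```

A deep network trained on a million labeled images follows the same recipe, only with many stacked layers and gradients computed by backpropagation.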

It’s difficult to determine exactly how the model achieves this kind of recognition, but DiCarlo and his colleagues have previously shown that the “neurons” within these models produce activity patterns very similar to those seen in the animal visual cortex in response to the same images.

In the new study, the researchers wanted to test whether their models could perform some tasks that previously have not been demonstrated. In particular, they wanted to see if the models could be used to control neural activity in the visual cortex of animals.

“So far, what has been done with these models is predicting what the neural responses would be to other stimuli that they have not seen before,” Bashivan says. “The main difference here is that we are going one step further and using the models to drive the neurons into desired states.”

To achieve this, the researchers first created a one-to-one map of neurons in the brain’s visual area V4 to nodes in the computational model. They did this by showing images to animals and to the models, and comparing their responses to the same images. There are millions of neurons in area V4, but for this study, the researchers created maps for subpopulations of five to 40 neurons at a time.
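One simple way to build such a map, sketched below under the assumption that each neuron is matched to the model node whose responses correlate best with it across a shared image set (the study's actual mapping procedure may be more sophisticated):

```python
import numpy as np

# Illustrative sketch: assign each recorded V4 neuron to the model node
# whose responses across a common image set correlate best with it.
# All data here are simulated; names are for illustration only.
rng = np.random.default_rng(1)

n_images, n_nodes, n_neurons = 100, 50, 8
node_resp = rng.normal(size=(n_images, n_nodes))  # model node responses
# Simulate recorded neurons as noisy copies of particular model nodes.
true_nodes = rng.choice(n_nodes, size=n_neurons, replace=False)
neuron_resp = node_resp[:, true_nodes] + 0.3 * rng.normal(size=(n_images, n_neurons))

def best_matching_node(neuron, nodes):
    # Pearson correlation of one neuron against every model node.
    r = np.array([np.corrcoef(neuron, nodes[:, j])[0, 1]
                  for j in range(nodes.shape[1])])
    return int(np.argmax(r))

assignment = [best_matching_node(neuron_resp[:, i], node_resp)
              for i in range(n_neurons)]
```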

“Once each neuron has an assignment, the model allows you to make predictions about that neuron,” DiCarlo says.

The researchers then set out to see if they could use those predictions to control the activity of individual neurons in the visual cortex. The first type of control, which they called “stretching,” involves showing an image that will drive the activity of a specific neuron far beyond the activity usually elicited by “natural” images similar to those used to train the neural networks.

The researchers found that when they showed animals these “synthetic” images, which are created by the models and do not resemble natural objects, the target neurons did respond as expected. On average, the neurons showed about 40 percent more activity in response to these images than when they were shown natural images like those used to train the model. This kind of control has never been reported before.
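The synthesis idea behind "stretching" can be sketched as gradient ascent on a target unit's predicted activity. The stand-in below uses a fixed linear readout in place of a deep-network neuron, which is only an assumption for illustration; the real procedure optimizes pixels through the full model.

```python
import numpy as np

# Sketch of "stretching": synthesize an image by gradient ascent on a
# target model neuron's predicted activity. A fixed random linear readout
# stands in for the deep-network neuron; pixels are kept in a bounded range.
rng = np.random.default_rng(2)

n_pixels = 256
w = rng.normal(size=n_pixels)  # stand-in target "neuron"

def activity(img):
    return float(w @ img)

img = rng.normal(scale=0.01, size=n_pixels)  # start near a blank image
baseline = activity(img)
for step in range(100):
    img += 0.05 * w                    # gradient of a linear unit is w itself
    img = np.clip(img, -1.0, 1.0)      # keep pixels in display range

stretched = activity(img)
```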

“That they succeeded in doing this is really amazing. It’s as if, for that neuron at least, its ideal image suddenly leaped into focus. The neuron was suddenly presented with the stimulus it had always been searching for,” says Aaron Batista, an associate professor of bioengineering at the University of Pittsburgh, who was not involved in the study. “This is a remarkable idea, and to pull it off is quite a feat. It is perhaps the strongest validation so far of the use of artificial neural networks to understand real neural networks.”

In a similar set of experiments, the researchers attempted to generate images that would drive one neuron maximally while also keeping the activity in nearby neurons very low, a more difficult task. For most of the neurons they tested, the researchers were able to enhance the activity of the target neuron with little increase in the surrounding neurons.
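The second experiment changes only the objective: drive the target unit while penalizing its neighbors. A minimal sketch, again with random linear readouts standing in for model neurons and an illustrative trade-off weight:

```python
import numpy as np

# Sketch of the one-target objective: maximize one unit's activity while
# suppressing nearby units. Readouts and the penalty weight are illustrative.
rng = np.random.default_rng(3)

n_pixels, n_neurons, target = 256, 6, 0
W = rng.normal(size=(n_neurons, n_pixels))  # stand-in neuron readouts

def objective_grad(lam=1.0):
    # Gradient of: target activity - lam * (sum of off-target activities)
    others = np.delete(W, target, axis=0)
    return W[target] - lam * others.sum(axis=0)

img = np.zeros(n_pixels)
for step in range(100):
    img = np.clip(img + 0.05 * objective_grad(), -1.0, 1.0)

acts = W @ img  # target unit should dominate the population response
```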

“A common trend in neuroscience is that experimental data collection and computational modeling are executed somewhat independently, resulting in very little model validation, and thus no measurable progress. Our efforts bring back to life this ‘closed loop’ approach, engaging model predictions and neural measurements that are critical to the success of building and testing models that will most resemble the brain,” Kar says.

Measuring accuracy

The researchers also showed that they could use the model to predict how neurons of area V4 would respond to synthetic images. Most previous tests of these models have used the same type of naturalistic images that were used to train the model. The MIT team found that the models were about 54 percent accurate at predicting how the brain would respond to the synthetic images, compared to nearly 90 percent accuracy when natural images were used.
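One common way to score such predictions, sketched here with simulated data (the study's exact accuracy metric is not specified in this article), is the correlation between predicted and measured responses, computed separately for each image set:

```python
import numpy as np

# Illustrative scoring sketch: accuracy as the correlation between
# model-predicted and measured responses, evaluated separately for
# natural and synthetic image sets. Data are simulated so the model
# generalizes worse outside its training domain; numbers are not
# the study's.
rng = np.random.default_rng(4)

def prediction_accuracy(predicted, measured):
    return float(np.corrcoef(predicted, measured)[0, 1])

n = 200
true_natural = rng.normal(size=n)
true_synthetic = rng.normal(size=n)
# Model tracks natural responses closely, synthetic ones more loosely.
pred_natural = true_natural + 0.35 * rng.normal(size=n)
pred_synthetic = true_synthetic + 1.2 * rng.normal(size=n)

acc_nat = prediction_accuracy(pred_natural, true_natural)
acc_syn = prediction_accuracy(pred_synthetic, true_synthetic)
```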

“In a sense, we’re quantifying how accurate these models are at making predictions outside the domain where they were trained,” Bashivan says. “Ideally the model should be able to predict accurately no matter what the input is.”

The researchers now hope to improve the models’ accuracy by allowing them to incorporate the new information they learn from seeing the synthetic images, which was not done in this study.

This kind of control could be useful for neuroscientists who want to study how different neurons interact with each other, and how they might be connected, the researchers say. Further in the future, this approach could potentially be useful for treating mood disorders such as depression. The researchers are now working on extending their model to the inferotemporal cortex, which feeds into the amygdala, a region involved in processing emotions.

“If we had a good model of the neurons that are engaged in experiencing emotions or causing various kinds of disorders, then we could use that model to drive the neurons in a way that would help to ameliorate those disorders,” Bashivan says.

The research was funded by the Intelligence Advanced Research Projects Agency, the MIT-IBM Watson AI Lab, the National Eye Institute, and the Office of Naval Research.

Algorithms of intelligence

The following post is adapted from a story featured in a recent Brain Scan newsletter.

Machine vision systems are increasingly common in everyday life, from social media to self-driving cars, but training artificial neural networks to “see” the world as we do—distinguishing cyclists from signposts—remains challenging. Will artificial neural networks ever decode the world as exquisitely as humans? Can we refine these models and influence perception in a person’s brain just by activating individual, selected neurons? The DiCarlo lab, including CBMM postdocs Kohitij Kar and Pouya Bashivan, is finding that we are surprisingly close to answering “yes” to such questions, all in the context of accelerated insights into artificial intelligence at the McGovern Institute for Brain Research, CBMM, and the Quest for Intelligence at MIT.

Precision Modeling

Beyond light hitting the retina, the recognition process that unfolds in the visual cortex is key to truly “seeing” the surrounding world. Information is decoded through the ventral visual stream, cortical brain regions that progressively build a more accurate, fine-grained, and accessible representation of the objects around us. Artificial neural networks have been modeled on these elegant cortical systems, and the most successful models, deep convolutional neural networks (DCNNs), can now decode objects at levels comparable to the primate brain. However, even leading DCNNs have problems with certain challenging images, presumably due to shadows, clutter, and other visual noise. While there’s no simple feature that unites all challenging images, the quest is on to tackle such images to attain precise recognition at a level commensurate with human object recognition.

“One next step is to couple this new precision tool with our emerging understanding of how neural patterns underlie object perception. This might allow us to create arrangements of pixels that look nothing like, for example, a cat, but that can fool the brain into thinking it’s seeing a cat.” — James DiCarlo

In a recent push, Kar and DiCarlo demonstrated that adding feedback connections, currently missing in most DCNNs, allows the system to better recognize objects in challenging situations, even those where a human can’t articulate why recognition is an issue for feedforward DCNNs. They also found that this recurrent circuit seems critical to primate success rates in performing this task. This is incredibly important for systems like self-driving cars, where the stakes for artificial visual systems are high, and faithful recognition is a must.

Now you see it

As artificial object recognition systems have become more precise in predicting neural activity, the DiCarlo lab wondered what such precision might allow: could they use their system to not only predict, but to control specific neuronal activity?

To demonstrate the power of their models, Bashivan, Kar, and colleagues zeroed in on targeted neurons in the brain. In a paper published in Science, they used an artificial neural network to generate a random-looking group of pixels that, when shown to an animal, activated the team’s target, which they called the “one hot neuron.” In other words, they showed the brain a synthetic pattern, and the pixels in the pattern precisely activated targeted neurons while other neurons remained relatively silent.

These findings show how the knowledge in today’s artificial neural network models might one day be used to noninvasively influence brain states with neural resolution. Such precise systems would be useful as we look to the future, toward visual prosthetics for the blind. Such a precise model of the ventral visual stream would have been inconceivable not so long ago, and all eyes are on where McGovern researchers will take these technologies in the coming years.

Recurrent architecture enhances object recognition in brain and AI

Your ability to recognize objects is remarkable. If you see a cup under unusual lighting or from unexpected directions, there’s a good chance that your brain will still compute that it is a cup. Such precise object recognition is one holy grail for AI developers, such as those improving self-driving car navigation. While modeling primate object recognition in the visual cortex has revolutionized artificial visual recognition systems, current deep learning systems are simplified, and fail to recognize some objects that are child’s play for primates such as humans. In findings published in Nature Neuroscience, McGovern Investigator James DiCarlo and colleagues have found evidence that feedback improves recognition of hard-to-recognize objects in the primate brain, and that adding feedback circuitry also improves the performance of artificial neural network systems used for vision applications.

Deep convolutional neural networks (DCNNs) are currently the most successful models for accurately recognizing objects on a fast timescale (<100 ms) and have a general architecture inspired by the primate ventral visual stream, cortical regions that progressively build an accessible and refined representation of viewed objects. Most DCNNs, however, are simple in comparison to the primate ventral stream.

“For a long period of time, we were far from a model-based understanding. Thus our field got started on this quest by modeling visual recognition as a feedforward process,” explains senior author DiCarlo, who is also the head of MIT’s Department of Brain and Cognitive Sciences and Research Co-Leader in the Center for Brains, Minds, and Machines (CBMM). “However, we know there are recurrent anatomical connections in brain regions linked to object recognition.”

Think of feedforward DCNNs and the portion of the visual system that first attempts to capture objects as a subway line that runs forward through a series of stations. The extra, recurrent brain networks are instead like the streets above, interconnected and not unidirectional. Because it only takes about 200 ms for the brain to recognize an object quite accurately, it was unclear if these recurrent interconnections in the brain had any role at all in core object recognition. For example, perhaps those recurrent connections are only in place to keep the visual system in tune over long periods of time. In this analogy, the return gutters of the streets help slowly clear them of water and trash, but are not strictly needed to quickly move people from one end of town to the other. DiCarlo, along with lead author and CBMM postdoc Kohitij Kar, set out to test whether a subtle role of recurrent operations in rapid visual object recognition was being overlooked.
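The architectural difference can be sketched concretely: a recurrent stage re-processes the same input over a few timesteps, feeding its previous output back in so the representation can be refined. The update rule and shapes below are illustrative assumptions, not the specific recurrent model used in the study.

```python
import numpy as np

# Minimal sketch of adding recurrence to one feedforward stage: the input
# is re-processed for several timesteps with the stage's previous output
# fed back in, refining the response over extra processing time.
rng = np.random.default_rng(5)

n_in, n_hidden = 32, 16
W_ff = rng.normal(scale=0.3, size=(n_hidden, n_in))       # feedforward weights
W_rec = rng.normal(scale=0.1, size=(n_hidden, n_hidden))  # recurrent weights

def relu(z):
    return np.maximum(z, 0.0)

def feedforward_pass(x):
    return relu(W_ff @ x)  # one shot through the stage

def recurrent_pass(x, timesteps=4):
    h = np.zeros(n_hidden)
    history = []
    for t in range(timesteps):
        h = relu(W_ff @ x + W_rec @ h)  # feedback refines the response
        history.append(h.copy())
    return history

x = rng.normal(size=n_in)
states = recurrent_pass(x)
# states[0] matches the pure feedforward pass; later states incorporate
# feedback, which is where the extra ~30 ms of brain processing would live.
```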

Challenging recognition

The authors first needed to identify objects that are trivially decoded by the primate brain, but are challenging for artificial systems. Rather than trying to guess why deep learning was having problems recognizing an object (is it due to clutter in the image? a misleading shadow?), the authors took an unbiased approach that turned out to be critical.

Kar explained further that “we realized that AI models actually don’t have problems with every image where an object is occluded or in clutter. Humans trying to guess why AI models were challenged turned out to be holding us back.”

Instead, the authors presented the deep learning system, as well as monkeys and humans, with images, homing in on “challenge images” where the primates could easily recognize the objects in those images, but a feedforward DCNN ran into problems. When they, and others, added appropriate recurrent processing to these DCNNs, object recognition in challenge images suddenly became a breeze.

Processing times

Kar used neural recording methods with very high spatial and temporal precision to test whether these images were really so trivial for primates. Remarkably, they found that though challenge images had initially appeared to be child’s play to the human brain, they actually involve extra neural processing time (about 30 additional milliseconds), suggesting that recurrent loops operate in our brain too.

“What the computer vision community has recently achieved by stacking more and more layers onto artificial neural networks, evolution has achieved through a brain architecture with recurrent connections.” — Kohitij Kar

Diane Beck, Professor of Psychology and Co-chair of the Intelligent Systems Theme at the Beckman Institute and not an author on the study, explained further. “Since entirely feedforward deep convolutional nets are now remarkably good at predicting primate brain activity, it raised questions about the role of feedback connections in the primate brain. This study shows that, yes, feedback connections are very likely playing a role in object recognition after all.”

What does this mean for a self-driving car? It shows that deep learning architectures involved in object recognition need recurrent components if they are to match the primate brain, and also indicates how to operationalize this procedure for the next generation of intelligent machines.

“Recurrent models offer predictions of neural activity and behavior over time,” says Kar. “We may now be able to model more involved tasks. Perhaps one day, the systems will not only recognize an object, such as a person, but also perform cognitive tasks that the human brain so easily manages, such as understanding the emotions of other people.”

This work was supported by Office of Naval Research grant MURI-114407 (J.J.D.) and by the Center for Brains, Minds, and Machines (CBMM), funded by NSF STC award CCF-1231216 (K.K.).

Elephant or chair? How the brain IDs objects

As visual information flows into the brain through the retina, the visual cortex transforms the sensory input into coherent perceptions. Neuroscientists have long hypothesized that a part of the visual cortex called the inferotemporal (IT) cortex is necessary for the key task of recognizing individual objects, but the evidence has been inconclusive.

In a new study, MIT neuroscientists have found clear evidence that the IT cortex is indeed required for object recognition; they also found that subsets of this region are responsible for distinguishing different objects.

In addition, the researchers have developed computational models that describe how these neurons transform visual input into a mental representation of an object. They hope such models will eventually help guide the development of brain-machine interfaces (BMIs) that could be used for applications such as generating images in the mind of a blind person.

“We don’t know if that will be possible yet, but this is a step on the pathway toward those kinds of applications that we’re thinking about,” says James DiCarlo, the head of MIT’s Department of Brain and Cognitive Sciences, a member of the McGovern Institute for Brain Research, and the senior author of the new study.

Rishi Rajalingham, a postdoc at the McGovern Institute, is the lead author of the paper, which appears in the March 13 issue of Neuron.

Distinguishing objects

In addition to its hypothesized role in object recognition, the IT cortex also contains “patches” of neurons that respond preferentially to faces. Beginning in the 1960s, neuroscientists discovered that damage to the IT cortex could produce impairments in recognizing non-face objects, but it has been difficult to determine precisely how important the IT cortex is for this task.

The MIT team set out to find more definitive evidence for the IT cortex’s role in object recognition, by selectively shutting off neural activity in very small areas of the cortex and then measuring how the disruption affected an object discrimination task. In animals that had been trained to distinguish between objects such as elephants, bears, and chairs, they used a drug called muscimol to temporarily turn off subregions about 2 millimeters in diameter. Each of these subregions represents about 5 percent of the entire IT cortex.

These experiments, which represent the first time that researchers have been able to silence such small regions of IT cortex while measuring behavior over many object discriminations, revealed that the IT cortex is not only necessary for distinguishing between objects, but it is also divided into areas that handle different elements of object recognition.

The researchers found that silencing each of these tiny patches produced distinctive impairments in the animals’ ability to distinguish between certain objects. For example, one subregion might be involved in distinguishing chairs from cars, but not chairs from dogs. Each region was involved in 25 to 30 percent of the tasks that the researchers tested, and regions that were closer to each other tended to have more overlap between their functions, while regions far away from each other had little overlap.
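The analysis pattern described above can be sketched with simulated data: each silenced subregion yields a "deficit profile" over the discrimination tasks, and nearby subregions have more similar profiles than distant ones. Everything below is illustrative; the simulated structure is an assumption, and no numbers come from the study.

```python
import numpy as np

# Illustrative sketch: deficit profiles per subregion, with overlap
# (profile correlation) falling off with cortical distance. Data are
# simulated with that structure built in.
rng = np.random.default_rng(6)

n_regions, n_tasks = 10, 40
positions = np.sort(rng.uniform(0.0, 10.0, size=n_regions))  # cortical coords

# Profiles drift smoothly with position (a random walk along the cortex),
# so nearby regions end up with correlated deficit patterns.
profiles = np.zeros((n_regions, n_tasks))
profiles[0] = rng.normal(size=n_tasks)
for i in range(1, n_regions):
    step = positions[i] - positions[i - 1]
    profiles[i] = profiles[i - 1] + np.sqrt(step) * rng.normal(size=n_tasks)

def overlap(i, j):
    return float(np.corrcoef(profiles[i], profiles[j])[0, 1])

pairs = [(i, j) for i in range(n_regions) for j in range(i + 1, n_regions)]
dists = np.array([positions[j] - positions[i] for i, j in pairs])
overlaps = np.array([overlap(i, j) for i, j in pairs])

span = positions[-1] - positions[0]
near_mean = overlaps[dists < 0.25 * span].mean()  # nearby subregions
far_mean = overlaps[dists > 0.75 * span].mean()   # distant subregions
```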

“We might have thought of it as a sea of neurons that are completely mixed together, except for these islands of ‘face patches.’ But what we’re finding, which many other studies had pointed to, is that there is large-scale organization over the entire region,” Rajalingham says.

The features that each of these regions are responding to are difficult to classify, the researchers say. The regions are not specific to objects such as dogs, nor easy-to-describe visual features such as curved lines.

“It would be incorrect to say that because we observed a deficit in distinguishing cars when a certain neuron was inhibited, this is a ‘car neuron,’” Rajalingham says. “Instead, the cell is responding to a feature that we can’t explain that is useful for car discriminations. There has been work in this lab and others that suggests that the neurons are responding to complicated nonlinear features of the input image. You can’t say it’s a curve, or a straight line, or a face, but it’s a visual feature that is especially helpful in supporting that particular task.”

Bevil Conway, a principal investigator at the National Eye Institute, says the new study makes significant progress toward answering the critical question of how neural activity in the IT cortex produces behavior.

“The paper makes a major step in advancing our understanding of this connection, by showing that blocking activity in different small local regions of IT has a different selective deficit on visual discrimination. This work advances our knowledge not only of the causal link between neural activity and behavior but also of the functional organization of IT: How this bit of brain is laid out,” says Conway, who was not involved in the research.

Brain-machine interface

The experimental results were consistent with computational models that DiCarlo, Rajalingham, and others in their lab have created to try to explain how IT cortex neuron activity produces specific behaviors.

“That is interesting not only because it says the models are good, but because it implies that we could intervene with these neurons and turn them on and off,” DiCarlo says. “With better tools, we could have very large perceptual effects and do real BMI in this space.”

The researchers plan to continue refining their models, incorporating new experimental data from even smaller populations of neurons, in hopes of developing ways to generate visual perception in a person’s brain by activating a specific sequence of neuronal activity. Technology to deliver this kind of input to a person’s brain could lead to new strategies to help blind people see certain objects.

“This is a step in that direction,” DiCarlo says. “It’s still a dream, but that dream someday will be supported by the models that are built up by this kind of work.”

The research was funded by the National Eye Institute, the Office of Naval Research, and the Simons Foundation.

James DiCarlo

Rapid Recognition

DiCarlo’s research goal is to reverse engineer the brain mechanisms that underlie human visual intelligence. He and his collaborators have revealed how population image transformations carried out by a deep stack of interconnected neocortical brain areas — called the primate ventral visual stream — effortlessly extract object identity from visual images. His team uses a combination of large-scale neurophysiology, brain imaging, direct neural perturbation methods, and machine learning methods to build and test neurally mechanistic computational models of the ventral visual stream and its support of cognition and behavior. Such an engineering-based understanding is likely to lead to new artificial vision and artificial intelligence approaches, new brain-machine interfaces to restore or augment lost senses, and a new foundation to ameliorate disorders of the mind.