What powerful new bots like ChatGPT tell us about intelligence and the human brain

This story originally appeared in the Spring 2023 issue of BrainScan.

___

Artificial intelligence seems to have gotten a lot smarter recently. AI technologies are increasingly integrated into our lives — improving our weather forecasts, finding efficient routes through traffic, personalizing the ads we see and our experiences with social media.

Watercolor image of a robot with a human brain, created using the AI system DALL*E2.

But with the debut of powerful new chatbots like ChatGPT, millions of people have begun interacting with AI tools that seem convincingly human-like. Neuroscientists are taking note — and beginning to dig into what these tools tell us about intelligence and the human brain.

The essence of human intelligence is hard to pin down, let alone engineer. McGovern scientists say there are many kinds of intelligence, and as humans, we call on many different kinds of knowledge and ways of thinking. ChatGPT’s ability to carry on natural conversations with its users has led some to speculate the computer model is sentient, but McGovern neuroscientists insist that the AI technology cannot think for itself.

Still, they say, the field may have reached a turning point.

“I still don’t believe that we can make something that is indistinguishable from a human. I think we’re a long way from that. But for the first time in my life I think there is a small, nonzero chance that it may happen in the next year,” says McGovern founding member Tomaso Poggio, who has studied both human intelligence and machine learning for more than 40 years.

Different sort of intelligence

Developed by the company OpenAI, ChatGPT is an example of a deep neural network, a type of machine learning system that has made its way into virtually every aspect of science and technology. These models learn to perform various tasks by identifying patterns in large datasets. ChatGPT works by scouring texts and detecting and replicating the ways language is used. Drawing on language patterns it finds across the internet, ChatGPT can design you a meal plan, teach you about rocket science, or write a high school-level essay about Mark Twain. With all of the internet as a training tool, models like this have gotten so good at what they do, they can seem all-knowing.

“Engineers have been inventing some of these forms of intelligence since the beginning of the computers. ChatGPT is one. But it is very far from human intelligence.” – Tomaso Poggio

Nonetheless, language models have a restricted skill set. Play with ChatGPT long enough and it will surely give you some wrong information, even if its fluency makes its words deceptively convincing. “These models don’t know about the world, they don’t know about other people’s mental states, they don’t know how things are beyond whatever they can gather from how words go together,” says Postdoctoral Associate Anna Ivanova, who works with McGovern Investigators Evelina Fedorenko and Nancy Kanwisher as well as Jacob Andreas in MIT’s Computer Science and Artificial Intelligence Laboratory.

Such a model, the researchers say, cannot replicate the complex information processing that happens in the human brain. That doesn’t mean language models can’t be intelligent — but theirs is a different sort of intelligence than our own. “I think that there is an infinite number of different forms of intelligence,” says Poggio. “Engineers have been inventing some of these forms of intelligence since the beginning of the computers. ChatGPT is one. But it is very far from human intelligence.”

Under the hood

Just as there are many forms of intelligence, there are also many types of deep learning models — and McGovern researchers are studying the internals of these models to better understand the human brain.

A watercolor painting of a robot generated by DALL*E2.

“These AI models are, in a way, computational hypotheses for what the brain is doing,” Kanwisher says. “Up until a few years ago, we didn’t really have complete computational models of what might be going on in language processing or vision. Once you have a way of generating actual precise models and testing them against real data, you’re kind of off and running in a way that we weren’t ten years ago.”

Artificial neural networks echo the design of the brain in that they are made of densely interconnected networks of simple units that organize themselves — but Poggio says it’s not yet entirely clear how they work.

No one expects that brains and machines will work in exactly the same ways, though some types of deep learning models are more humanlike in their internals than others. For example, a computer vision model developed by McGovern Investigator James DiCarlo responds to images in ways that closely parallel the activity in the visual cortex of animals who are seeing the same thing. DiCarlo’s team can even use their model’s predictions to create an image that will activate specific neurons in an animal’s brain.

“We shouldn’t just automatically assume that if we trained a deep network on a task, that it’s going to look like the brain.” – Ila Fiete

Still, there is reason to be cautious in interpreting what artificial neural networks tell us about biology. “We shouldn’t just automatically assume that if we trained a deep network on a task, that it’s going to look like the brain,” says McGovern Associate Investigator Ila Fiete. Fiete acknowledges that it’s tempting to think of neural networks as models of the brain itself due to their architectural similarities — but she says so far, that idea remains largely untested.

McGovern Institute Associate Investigator Ila Fiete builds theoretical models of the brain. Photo: Caitlin Cunningham

She and her colleagues recently experimented with neural networks that estimate an object’s position in space by integrating information about its changing velocity.

In the brain, specialized neurons known as grid cells carry out this calculation, keeping us aware of where we are as we move through the world. Other researchers had reported that not only can neural networks do this successfully, those that do include components that behave remarkably like grid cells. They had argued that the need to do this kind of path integration must be the reason our brains have grid cells — but Fiete’s team found that artificial networks don’t need to mimic the brain to accomplish this brain-like task. They found that many neural networks can solve the same problem without grid cell-like elements.

One way investigators might generate deep learning models that do work like the brain is to give them a problem that is so complex that there is only one way of solving it, Fiete says.

Language, she acknowledges, might be that complex.

“This is clearly an example of a super-rich task,” she says. “I think on that front, there is a hope that they’re solving such an incredibly difficult task that maybe there is a sense in which they mirror the brain.”

Language parallels

In Fedorenko’s lab, where researchers are focused on identifying and understanding the brain’s language processing circuitry, they have found that some language models do, in fact, mimic certain aspects of human language processing. Many of the most effective models are trained to do a single task: make predictions about word use. That’s what your phone is doing when it suggests words for your text message as you type. Models that are good at this, it turns out, can apply this skill to carrying on conversations, composing essays, and using language in other useful ways. Neuroscientists have found evidence that humans, too, rely on word prediction as a part of language processing.

Fedorenko and her team compared the activity of language models to the brain activity of people as they read or listened to words, sentences, and stories, and found that some models were a better match to human neural responses than others. “The models that do better on this relatively unsophisticated task — just guess what comes next — also do better at capturing human neural responses,” Fedorenko says.

A watercolor painting of a language model, generated by DALL*E2.

It’s a compelling parallel, suggesting computational models and the human brain may have arrived at a similar solution to a problem, even in the face of the biological constraints that have shaped the latter. For Fedorenko and her team, it’s sparked new ideas that they will explore, in part, by modifying existing language models — possibly to more closely mimic the brain.

With so much still unknown about how both human and artificial neural networks learn, Fedorenko says it’s hard to predict what it will take to make language models work and behave more like the human brain. One possibility they are exploring is training a model in a way that more closely mirrors the way children learn language early in life.

Another question, she says, is whether language models might behave more like humans if they had a more limited recall of their own conversations. “All of the state-of-the-art language models keep track of really, really long linguistic contexts. Humans don’t do that,” she says.

Chatbots can retain long strings of dialogue, using those words to tailor their responses as a conversation progresses, she explains. Humans, on the other hand, must cope with a more limited memory. While we can keep track of information as it is conveyed, we only store a string of about eight words as we listen or read. “We get linguistic input, we crunch it up, we extract some kind of meaning representation, presumably in some more abstract format, and then we discard the exact linguistic stream because we don’t need it anymore,” Fedorenko explains.

Language models aren’t able to fill in gaps in conversation with their own knowledge and awareness in the same way a person can, Ivanova adds. “That’s why so far they have to keep track of every single input word,” she says. “If we want a model that models specifically the [human] language network, we don’t need to have this large context window. It would be very cool to train those models on those short windows of context and see if it’s more similar to the language network.”

Multimodal intelligence

Despite these parallels, Fedorenko’s lab has also shown that there are plenty of things language circuits do not do. The brain calls on other circuits to solve math problems, write computer code, and carry out myriad other cognitive processes. Their work makes it clear that in the brain, language and thought are not the same.

That’s borne out by what cognitive neuroscientists like Kanwisher have learned about the functional organization of the human brain, where circuit components are dedicated to surprisingly specific tasks, from language processing to face recognition.

“The upshot of cognitive neuroscience over the last 25 years is that the human brain really has quite a degree of modular organization,” Kanwisher says. “You can look at the brain and say, ‘what does it tell us about the nature of intelligence?’ Well, intelligence is made up of a whole bunch of things.”

In generating this image from the text prompt, “a watercolor painting of a woman looking in a mirror and seeing a robot,” DALL*E2 incorrectly placed the woman (not the robot) in the mirror, highlighting one of the weaknesses of current deep learning models.

In January, Fedorenko, Kanwisher, Ivanova, and colleagues shared an extensive analysis of the capabilities of large language models. After assessing models’ performance on various language-related tasks, they found that despite their mastery of linguistic rules and patterns, such models don’t do a good job using language in real-world situations. From a neuroscience perspective, that kind of functional competence is distinct from formal language competence, calling on not just language-processing circuits but also parts of the brain that store knowledge of the world, reason, and interpret social interactions.

Language is a powerful tool for understanding the world, they say, but it has limits.

“If you train on language prediction alone, you can learn to mimic certain aspects of thinking,” Ivanova says. “But it’s not enough. You need a multimodal system to carry out truly intelligent behavior.”

The team concluded that while AI language models do a very good job using language, they are incomplete models of human thought. For machines to truly think like humans, Ivanova says, they will need a combination of different neural nets all working together, in the same way different networks in the human brain work together to achieve complex cognitive tasks in the real world.

It remains to be seen whether such models would excel in the tech world, but they could prove valuable for revealing insights into human cognition — perhaps in ways that will inform engineers as they strive to build systems that better replicate human intelligence.

These neurons have food on the brain

A gooey slice of pizza. A pile of crispy French fries. Ice cream dripping down a cone on a hot summer day. When you look at any of these foods, a specialized part of your visual cortex lights up, according to a new study from MIT neuroscientists.

This newly discovered population of food-responsive neurons is located in the ventral visual stream, alongside populations that respond specifically to faces, bodies, places, and words. The unexpected finding may reflect the special significance of food in human culture, the researchers say.

“Food is central to human social interactions and cultural practices. It’s not just sustenance,” says Nancy Kanwisher, the Walter A. Rosenblith Professor of Cognitive Neuroscience and a member of MIT’s McGovern Institute for Brain Research and Center for Brains, Minds, and Machines. “Food is core to so many elements of our cultural identity, religious practice, and social interactions, and many other things that humans do.”

The findings, based on an analysis of a large public database of human brain responses to a set of 10,000 images, raise many additional questions about how and why this neural population develops. In future studies, the researchers hope to explore how people’s responses to certain foods might differ depending on their likes and dislikes, or their familiarity with certain types of food.

MIT postdoc Meenakshi Khosla is the lead author of the paper, along with MIT research scientist N. Apurva Ratan Murty. The study appears today in the journal Current Biology.

Visual categories

More than 20 years ago, while studying the ventral visual stream, the part of the brain that recognizes objects, Kanwisher discovered cortical regions that respond selectively to faces. Later, she and other scientists discovered other regions that respond selectively to places, bodies, or words. Most of those areas were discovered when researchers specifically set out to look for them. However, that hypothesis-driven approach can limit what you end up finding, Kanwisher says.

“There could be other things that we might not think to look for,” she says. “And even when we find something, how do we know that that’s actually part of the basic dominant structure of that pathway, and not something we found just because we were looking for it?”

To try to uncover the fundamental structure of the ventral visual stream, Kanwisher and Khosla decided to analyze a large, publicly available dataset of full-brain functional magnetic resonance imaging (fMRI) responses from eight human subjects as they viewed thousands of images.

“We wanted to see when we apply a data-driven, hypothesis-free strategy, what kinds of selectivities pop up, and whether those are consistent with what had been discovered before. A second goal was to see if we could discover novel selectivities that either haven’t been hypothesized before, or that have remained hidden due to the lower spatial resolution of fMRI data,” Khosla says.

To do that, the researchers applied a mathematical method that allows them to discover neural populations that can’t be identified from traditional fMRI data. An fMRI image is made up of many voxels — three-dimensional units that represent a cube of brain tissue. Each voxel contains hundreds of thousands of neurons, and if some of those neurons belong to smaller populations that respond to one type of visual input, their responses may be drowned out by other populations within the same voxel.

The new analytical method, which Kanwisher’s lab has previously used on fMRI data from the auditory cortex, can tease out responses of neural populations within each voxel of fMRI data.

Using this approach, the researchers found four populations that corresponded to previously identified clusters that respond to faces, places, bodies, and words. “That tells us that this method works, and it tells us that the things that we found before are not just obscure properties of that pathway, but major, dominant properties,” Kanwisher says.

Intriguingly, a fifth population also emerged, and this one appeared to be selective for images of food.

“We were first quite puzzled by this because food is not a visually homogenous category,” Khosla says. “Things like apples and corn and pasta all look so unlike each other, yet we found a single population that responds similarly to all these diverse food items.”

The food-specific population, which the researchers call the ventral food component (VFC), appears to be spread across two clusters of neurons, located on either side of the FFA. The fact that the food-specific populations are spread out between other category-specific populations may help explain why they have not been seen before, the researchers say.

“We think that food selectivity had been harder to characterize before because the populations that are selective for food are intermingled with other nearby populations that have distinct responses to other stimulus attributes. The low spatial resolution of fMRI prevents us from seeing this selectivity because the responses of different neural population get mixed in a voxel,” Khosla says.

“The technique which the researchers used to identify category-sensitive cells or areas is impressive, and it recovered known category-sensitive systems, making the food category findings most impressive,” says Paul Rozin, a professor of psychology at the University of Pennsylvania, who was not involved in the study. “I can’t imagine a way for the brain to reliably identify the diversity of foods based on sensory features. That makes this all the more fascinating, and likely to clue us in about something really new.”

Food vs non-food

The researchers also used the data to train a computational model of the VFC, based on previous models Murty had developed for the brain’s face and place recognition areas. This allowed the researchers to run additional experiments and predict the responses of the VFC. In one experiment, they fed the model matched images of food and non-food items that looked very similar — for example, a banana and a yellow crescent moon.

“Those matched stimuli have very similar visual properties, but the main attribute in which they differ is edible versus inedible,” Khosla says. “We could feed those arbitrary stimuli through the predictive model and see whether it would still respond more to food than non-food, without having to collect the fMRI data.”

They could also use the computational model to analyze much larger datasets, consisting of millions of images. Those simulations helped to confirm that the VFC is highly selective for images of food.

From their analysis of the human fMRI data, the researchers found that in some subjects, the VFC responded slightly more to processed foods such as pizza than unprocessed foods like apples. In the future they hope to explore how factors such as familiarity and like or dislike of a particular food might affect individuals’ responses to that food.

They also hope to study when and how this region becomes specialized during early childhood, and what other parts of the brain it communicates with. Another question is whether this food-selective population will be seen in other animals such as monkeys, who do not attach the cultural significance to food that humans do.

The research was funded by the National Institutes of Health, the National Eye Institute, and the National Science Foundation through the MIT Center for Brains, Minds, and Machines.

Unexpected synergy

This story originally appeared in the Spring 2022 issue of BrainScan.

***

Recent results from cognitive neuroscientist Nancy Kanwisher’s lab have left her pondering the role of music in human evolution. “Music is this big mystery,” she says. “Every human society that’s been studied has music. No other animals have music in the way that humans do. And nobody knows why humans have music at all. This has been a puzzle for centuries.”

MIT neuroscientist and McGovern Investigator Nancy Kanwisher. Photo: Jussi Puikkonen/KNAW

Some biologists and anthropologists have reasoned that since there’s no clear evolutionary advantage for humans’ unique ability to create and respond to music, these abilities must have emerged when humans began to repurpose other brain functions. To appreciate song, they’ve proposed, we draw on parts of the brain dedicated to speech and language. It makes sense, Kanwisher says: music and language are both complex, uniquely human ways of communicating. “It’s very sensible to think that there might be common machinery,” she says. “But there isn’t.”

That conclusion is based on her team’s 2015 discovery of neurons in the human brain that respond only to music. They first became clued in to these music-sensitive cells when they asked volunteers to listen to a diverse panel of sounds inside an MRI scanner. Functional brain imaging picked up signals suggesting that some neurons were specialized to detect only music but the broad map of brain activity generated by an fMRI couldn’t pinpoint those cells.

Singing in the brain

Kanwisher’s team wanted to know more but neuroscientists who study the human brain can’t always probe its circuitry with the exactitude of their colleagues who study the brains of mice or rats. They can’t insert electrodes into human brains to monitor the neurons they’re interested in. Neurosurgeons, however, sometimes do — and thus, collaborating with neurosurgeons has created unique opportunities for Kanwisher and other McGovern investigators to learn about the human brain.

Kanwisher’s team collaborated with clinicians at Albany Medical Center to work with patients who are undergoing monitoring prior to surgical treatment for epilepsy. Before operating, a neurosurgeon must identify the spot in their patient’s brain that is triggering seizures. This means inserting electrodes into the brain to monitor specific areas over a few days or weeks. The electrodes they implant pinpoint activity far more precisely, both spatially and temporally, than an MRI. And with patients’ permission, researchers like Kanwisher can take advantage of the information they collect.

“The intracranial recording from human brains that’s possible from collaboration with neurosurgeons is extremely precious to us,” Kanwisher says. “All of the research is kind of opportunistic, on whatever the surgeons are doing for clinical reasons. But sometimes we get really lucky and the electrodes are right in an area where we have long-standing scientific questions that those data can answer.”

Song-selective neural population (yellow) in the “inflated” human brain. Image: Sam Norman-Haignere

The unexpected discovery of song-specific neurons, led by postdoctoral researcher Sam Norman-Haignere, who is now an assistant professor at the University of Rochester Medical Center, emerged from such a collaboration. The team worked with patients at Albany Medical Center whose presurgical monitoring encompassed the auditory-processing part of the brain that they were curious about. Sure enough, certain electrodes picked up activity only when patients were listening to music. The data indicated that in some of those locations, it didn’t matter what kind of music was playing: the cells fired in response to a range of sounds that included flute solos, heavy metal, and rap. But other locations became active exclusively in response to vocal music. “We did not have that hypothesis at all, Kanwisher says. “It reallytook our breath away,” she says.

When that discovery is considered along with findings from McGovern colleague Ev Fedorenko, who has shown that the brain’s language-processing regions do not respond to music, Kanwisher says it’s now clear that music and language are segregated in the human brain. The origins of our unique appreciation for music, however, remain a mystery.

Clinical advantage

Clinical collaborations are also important to researchers in Ann Graybiels lab, who rely largely on model organisms like mice and rats to investigate the fine details of neural circuits. Working with clinicians helps keep them focused on answering questions that matter to patients.

In studying how the brain makes decisions, the Graybiel lab has zeroed in on connections that are vital for making choices that carry both positive and negative consequences. This is the kind of decision-making that you might call on when considering whether to accept a job that pays more but will be more demanding than your current position, for example. In experiments with rats, mice, and monkeys, they’ve identified different neurons dedicated to triggering opposing actions “approach” or “avoid” in these complex decision-making tasks. They’ve also found evidence that both age and stress change how the brain deals with these kinds of decisions.

In work led by former Graybiel lab research scientist Ken-ichi Amemori, they have worked with psychiatrist Diego Pizzagalli at McLean Hospital to learn what happens in the human brain when people make these complex decisions.

By monitoring brain activity as people made decisions inside an MRI scanner, the team identified regions that lit up when people chose to “approach” or “avoid.” They also found parallel activity patterns in monkeys that performed the same task, supporting the relevance of animal studies to understanding this circuitry.

In people diagnosed with major depression, however, the brain responded to approach-avoidance conflict somewhat differently. Certain areas were not activated as strongly as they were in people without depression, regardless of whether subjects ultimately chose to “approach” or “avoid.” The team suspects that some of these differences might reflect a stronger tendency toward avoidance, in which potential rewards are less influential for decision-making, while an individual is experiencing major depression.

The brain activity associated with approach-avoidance conflict in humans appears to align with what Graybiel’s team has seen in mice, although clinical imaging cannot reveal nearly as much detail about the involved circuits. Graybiel says that gives her confidence that what they are learning in the lab, where they can manipulate and study neural circuits with precision, is important. “I think there’s no doubt that this is relevant to humans,” she says. “I want to get as far into the mechanisms as possible, because maybe we’ll hit something that’s therapeutically valuable, or maybe we will really get an intuition about how parts of the brain work. I think that will help people.”

An optimized solution for face recognition

The human brain seems to care a lot about faces. It’s dedicated a specific area to identifying them, and the neurons there are so good at their job that most of us can readily recognize thousands of individuals. With artificial intelligence, computers can now recognize faces with a similar efficiency—and neuroscientists at MIT’s McGovern Institute have found that a computational network trained to identify faces and other objects discovers a surprisingly brain-like strategy to sort them all out.

The finding, reported March 16, 2022, in Science Advances, suggests that the millions of years of evolution that have shaped circuits in the human brain have optimized our system for facial recognition.

“The human brain’s solution is to segregate the processing of faces from the processing of objects,” explains Katharina Dobs, who led the study as a postdoctoral researcher in McGovern investigator Nancy Kanwisher’s lab. The artificial network that she trained did the same. “And that’s the same solution that we hypothesize any system that’s trained to recognize faces and to categorize objects would find,” she adds.

“These two completely different systems have figured out what a—if not the—good solution is. And that feels very profound,” says Kanwisher.

Functionally specific brain regions

More than twenty years ago, Kanwisher’s team discovered a small spot in the brain’s temporal lobe that responds specifically to faces. This region, which they named the fusiform face area, is one of many brain regions Kanwisher and others have found that are dedicated to specific tasks, such as the detection of written words, the perception of vocal songs, and understanding language.

Kanwisher says that as she has explored how the human brain is organized, she has always been curious about the reasons for that organization. Does the brain really need special machinery for facial recognition and other functions? “‘Why questions’ are very difficult in science,” she says. But with a sophisticated type of machine learning called a deep neural network, her team could at least find out how a different system would handle a similar task.

Dobs, who is now a research group leader at Justus Liebig University Giessen in Germany, assembled hundreds of thousands of images with which to train a deep neural network in face and object recognition. The collection included the faces of more than 1,700 different people and hundreds of different kinds of objects, from chairs to cheeseburgers. All of these were presented to the network, with no clues about which was which. “We never told the system that some of those are faces, and some of those are objects. So it’s basically just one big task,” Dobs says. “It needs to recognize a face identity, as well as a bike or a pen.”

Visualization of the preferred stimulus for example face-ranked filters. While filters in early layers (e.g., Conv5) were maximally activated by simple features, filters responded to features that appear somewhat like face parts (e.g., nose and eyes) in mid-level layers (e.g., Conv9) and appear to represent faces in a more holistic manner in late convolutional layers. Image: Kanwisher lab

As the program learned to identify the objects and faces, it organized itself into an information-processing network with that included units specifically dedicated to face recognition. Like the brain, this specialization occurred during the later stages of image processing. In both the brain and the artificial network, early steps in facial recognition involve more general vision processing machinery, and final stages rely on face-dedicated components.

It’s not known how face-processing machinery arises in a developing brain, but based on their findings, Kanwisher and Dobs say networks don’t necessarily require an innate face-processing mechanism to acquire that specialization. “We didn’t build anything face-ish into our network,” Kanwisher says. “The networks managed to segregate themselves without being given a face-specific nudge.”

Kanwisher says it was thrilling seeing the deep neural network segregate itself into separate parts for face and object recognition. “That’s what we’ve been looking at in the brain for twenty-some years,” she says. “Why do we have a separate system for face recognition in the brain? This tells me it is because that is what an optimized solution looks like.”

Now, she is eager to use deep neural nets to ask similar questions about why other brain functions are organized the way they are. “We have a new way to ask why the brain is organized the way it is,” she says. “How much of the structure we see in human brains will arise spontaneously by training networks to do comparable tasks?”

Singing in the brain

Press Mentions

For the first time, MIT neuroscientists have identified a population of neurons in the human brain that lights up when we hear singing, but not other types of music.

These neurons, found in the auditory cortex, appear to respond to the specific combination of voice and music, but not to either regular speech or instrumental music. Exactly what they are doing is unknown and will require more work to uncover, the researchers say.

“The work provides evidence for relatively fine-grained segregation of function within the auditory cortex, in a way that aligns with an intuitive distinction within music,” says Sam Norman-Haignere, a former MIT postdoc who is now an assistant professor of neuroscience at the University of Rochester Medical Center.

The work builds on a 2015 study in which the same research team used functional magnetic resonance imaging (fMRI) to identify a population of neurons in the brain’s auditory cortex that responds specifically to music. In the new work, the researchers used recordings of electrical activity taken at the surface of the brain, which gave them much more precise information than fMRI.

“There’s one population of neurons that responds to singing, and then very nearby is another population of neurons that responds broadly to lots of music. At the scale of fMRI, they’re so close that you can’t disentangle them, but with intracranial recordings, we get additional resolution, and that’s what we believe allowed us to pick them apart,” says Norman-Haignere.

Norman-Haignere is the lead author of the study, which appears today in the journal Current Biology. Josh McDermott, an associate professor of brain and cognitive sciences, and Nancy Kanwisher, the Walter A. Rosenblith Professor of Cognitive Neuroscience, both members of MIT’s McGovern Institute for Brain Research and Center for Brains, Minds and Machines (CBMM), are the senior authors of the study.

Neural recordings

In their 2015 study, the researchers used fMRI to scan the brains of participants as they listened to a collection of 165 sounds, including different types of speech and music, as well as everyday sounds such as finger tapping or a dog barking. For that study, the researchers devised a novel method of analyzing the fMRI data, which allowed them to identify six neural populations with different response patterns, including the music-selective population and another population that responds selectively to speech.

In the new study, the researchers hoped to obtain higher-resolution data using a technique known as electrocorticography (ECoG), which allows electrical activity to be recorded by electrodes placed inside the skull. This offers a much more precise picture of electrical activity in the brain compared to fMRI, which measures blood flow in the brain as a proxy of neuron activity.

“With most of the methods in human cognitive neuroscience, you can’t see the neural representations,” Kanwisher says. “Most of the kind of data we can collect can tell us that here’s a piece of brain that does something, but that’s pretty limited. We want to know what’s represented in there.”

Electrocorticography cannot be typically be performed in humans because it is an invasive procedure, but it is often used to monitor patients with epilepsy who are about to undergo surgery to treat their seizures. Patients are monitored over several days so that doctors can determine where their seizures are originating before operating. During that time, if patients agree, they can participate in studies that involve measuring their brain activity while performing certain tasks. For this study, the MIT team was able to gather data from 15 participants over several years.

For those participants, the researchers played the same set of 165 sounds that they used in the earlier fMRI study. The location of each patient’s electrodes was determined by their surgeons, so some did not pick up any responses to auditory input, but many did. Using a novel statistical analysis that they developed, the researchers were able to infer the types of neural populations that produced the data that were recorded by each electrode.

“When we applied this method to this data set, this neural response pattern popped out that only responded to singing,” Norman-Haignere says. “This was a finding we really didn’t expect, so it very much justifies the whole point of the approach, which is to reveal potentially novel things you might not think to look for.”

That song-specific population of neurons had very weak responses to either speech or instrumental music, and therefore is distinct from the music- and speech-selective populations identified in their 2015 study.

Music in the brain

In the second part of their study, the researchers devised a mathematical method to combine the data from the intracranial recordings with the fMRI data from their 2015 study. Because fMRI can cover a much larger portion of the brain, this allowed them to determine more precisely the locations of the neural populations that respond to singing.

“This way of combining ECoG and fMRI is a significant methodological advance,” McDermott says. “A lot of people have been doing ECoG over the past 10 or 15 years, but it’s always been limited by this issue of the sparsity of the recordings. Sam is really the first person who figured out how to combine the improved resolution of the electrode recordings with fMRI data to get better localization of the overall responses.”

The song-specific hotspot that they found is located at the top of the temporal lobe, near regions that are selective for language and music. That location suggests that the song-specific population may be responding to features such as the perceived pitch, or the interaction between words and perceived pitch, before sending information to other parts of the brain for further processing, the researchers say.

The researchers now hope to learn more about what aspects of singing drive the responses of these neurons. They are also working with MIT Professor Rebecca Saxe’s lab to study whether infants have music-selective areas, in hopes of learning more about when and how these brain regions develop.

The research was funded by the National Institutes of Health, the U.S. Army Research Office, the National Science Foundation, the NSF Science and Technology Center for Brains, Minds, and Machines, the Fondazione Neurone, the Howard Hughes Medical Institute, and the Kristin R. Pressman and Jessica J. Pourian ’13 Fund at MIT.

National Academy of Sciences honors cognitive neuroscientist Nancy Kanwisher

MIT neuroscientist and McGovern Investigator Nancy Kanwisher. Photo: Jussi Puikkonen/KNAW

The National Academy of Sciences (NAS) has announced today that Nancy Kanwisher, the Walter A. Rosenblith Professor of Cognitive Neuroscience in MIT’s Department of Brain and Cognitive Sciences, has received the 2022 NAS Award in the Neurosciences for her “pioneering research into the functional organization of the human brain.” The $25,000 prize, established by the Fidia Research Foundation, is presented every three years to recognize “extraordinary contributions to the neuroscience fields.”

“I am deeply honored to receive this award from the NAS,” says Kanwisher, who is also an investigator in MIT’s McGovern Institute and a member of the Center for Brains, Minds and Machines. “It has been a profound privilege, and a total blast, to watch the human brain in action as these data began to reveal an initial picture of the organization of the human mind. But the biggest joy has been the opportunity to work with the incredible group of talented young scientists who actually did the work that this award recognizes.”

A window into the mind

Kanwisher is best known for her landmark insights into how humans recognize and process faces. Psychology had long-suggested that recognizing a face might be distinct from general object recognition. But Kanwisher galvanized the field in 1997 with her seminal discovery that the human brain contains a small region specialized to respond only to faces. The region, which Kanwisher termed the fusiform face area (FFA), became activated when subjects viewed images of faces in an MRI scanner, but not when they looked at scrambled faces or control stimuli.

Since her 1997 discovery (now the most highly cited manuscript in its area), Kanwisher and her students have applied similar methods to find brain specializations for the recognition of scenes, the mental states of others, language, and music. Taken together, her research provides a compelling glimpse into the architecture of the brain, and, ultimately, what makes us human.

“Nancy’s work over the past two decades has argued that many aspects of human cognition are supported by specialized neural circuitry, a conclusion that stands in contrast to our subjective sense of a singular mental experience,” says McGovern Institute Director Robert Desimone. “She has made profound contributions to the psychological and cognitive sciences and I am delighted that the National Academy of Sciences has recognized her outstanding achievements.”

One-in-a-million mentor

Beyond the lab, Kanwisher has a reputation as a tireless communicator and mentor who is actively engaged in the policy implications of brain research. The statistics speak for themselves: her 2014 TED talk, “A Neural portrait of the human mind” has been viewed over a million times online and her introductory MIT OCW course on the human brain has generated more than nine million views on YouTube.

Nancy Kanwisher works with researchers from her lab in MIT’s Martinos Imaging Center. Photo: Kris Brewer

Kanwisher also has an exceptional track record in training women scientists who have gone on to successful independent research careers, in many cases becoming prominent figures in their own right.

“Nancy is the one-in-a-million mentor, who is always skeptical of your ideas and your arguments, but immensely confident of your worth,” says Rebecca Saxe, John W. Jarve (1978) Professor of Brain and Cognitive Sciences, investigator at the McGovern Institute, and associate dean of MIT’s School of Science. Saxe was a graduate student in Kanwisher’s lab where she earned her PhD in cognitive neuroscience in 2003. “She has such authentic curiosity,” Saxe adds. “It’s infectious and sustaining. Working with Nancy was a constant reminder of why I wanted to be a scientist.”

The NAS will present Kanwisher with the award during its annual meeting on May 1, 2022 in Washington, DC. The event will be webcast live. Kanwisher plans to direct her prize funds to the non-profit organization Malengo, established by a former student and which provides quality undergraduate education to individuals who would otherwise not be able to afford it.

A key brain region responds to faces similarly in infants and adults

Within the visual cortex of the adult brain, a small region is specialized to respond to faces, while nearby regions show strong preferences for bodies or for scenes such as landscapes.

Neuroscientists have long hypothesized that it takes many years of visual experience for these areas to develop in children. However, a new MIT study suggests that these regions form much earlier than previously thought. In a study of babies ranging in age from two to nine months, the researchers identified areas of the infant visual cortex that already show strong preferences for either faces, bodies, or scenes, just as they do in adults.

“These data push our picture of development, making babies’ brains look more similar to adults, in more ways, and earlier than we thought,” says Rebecca Saxe, the John W. Jarve Professor of Brain and Cognitive Sciences, a member of MIT’s McGovern Institute for Brain Research, and the senior author of the new study.

Using functional magnetic resonance imaging (fMRI), the researchers collected usable data from more than 50 infants, a far greater number than any research lab has been able to scan before. This allowed them to examine the infant visual cortex in a way that had not been possible until now.

“This is a result that’s going to make a lot of people have to really grapple with their understanding of the infant brain, the starting point of development, and development itself,” says Heather Kosakowski, an MIT graduate student and the lead author of the study, which appears today in Current Biology.

MIT graduate student Heather Kosakowski prepares an infant for an MRI scan at the Martinos Imaging Center. Photo: Caitlin Cunningham

Distinctive regions

More than 20 years ago, Nancy Kanwisher, the Walter A. Rosenblith Professor of Cognitive Neuroscience at MIT, used fMRI to discover the fusiform face area: a small region of the visual cortex that responds much more strongly to faces than any other kind of visual input.

Since then, Kanwisher and her colleagues have also identified parts of the visual cortex that respond to bodies (the extrastriate body area, or EBA), and scenes (the parahippocampal place area, or PPA).

“There is this set of functionally very distinctive regions that are present in more or less the same place in pretty much every adult,” says Kanwisher, who is also a member of MIT’s Center for Brains, Minds, and Machines, and an author of the new study. “That raises all these questions about how these regions develop. How do they get there, and how do you build a brain that has such similar structure in each person?”

One way to try to answer those questions is to investigate when these highly selective regions first develop in the brain. A longstanding hypothesis is that it takes several years of visual experience for these regions to gradually become selective for their specific targets. Scientists who study the visual cortex have found similar selectivity patterns in children as young as 4 or 5 years old, but there have been few studies of children younger than that.

In 2017, Saxe and one of her graduate students, Ben Deen, reported the first successful use of fMRI to study the brains of awake infants. That study, which included data from nine babies, suggested that while infants did have areas that respond to faces and scenes, those regions were not yet highly selective. For example, the fusiform face area did not show a strong preference for human faces over every other kind of input, including human bodies or the faces of other animals.

However, that study was limited by the small number of subjects, and also by its reliance on an fMRI coil that the researchers had developed especially for babies, which did not offer as high-resolution imaging as the coils used for adults.

For the new study, the researchers wanted to try to get better data, from more babies. They built a new scanner that is more comfortable for babies and also more powerful, with resolution similar to that of fMRI scanners used to study the adult brain.

After going into the specialized scanner, along with a parent, the babies watched videos that showed either faces, body parts such as kicking feet or waving hands, objects such as toys, or natural scenes such as mountains.

The researchers recruited nearly 90 babies for the study, collected usable fMRI data from 52, half of which contributed higher-resolution data collected using the new coil. Their analysis revealed that specific regions of the infant visual cortex show highly selective responses to faces, body parts, and natural scenes, in the same locations where those responses are seen in the adult brain. The selectivity for natural scenes, however, was not as strong as for faces or body parts.

The infant brain

The findings suggest that scientists’ conception of how the infant brain develops may need to be revised to accommodate the observation that these specialized regions start to resemble those of adults sooner than anyone had expected.

“The thing that is so exciting about these data is that they revolutionize the way we understand the infant brain,” Kosakowski says. “A lot of theories have grown up in the field of visual neuroscience to accommodate the view that you need years of development for these specialized regions to emerge. And what we’re saying is actually, no, you only really need a couple of months.”

Because their data on the area of the brain that responds to scenes was not as strong as for the other locations they looked at, the researchers now plan to pursue additional studies of that region, this time showing babies images on a much larger screen that will more closely mimic the experience of being within a scene. For that study, they plan to use near-infrared spectroscopy (NIRS), a non-invasive imaging technique that doesn’t require the participant to be inside a scanner.

“That will let us ask whether young babies have robust responses to visual scenes that we underestimated in this study because of the visual constraints of the experimental setup in the scanner,” Saxe says.

The researchers are now further analyzing the data they gathered for this study in hopes of learning more about how development of the fusiform face area progresses from the youngest babies they studied to the oldest. They also hope to perform new experiments examining other aspects of cognition, including how babies’ brains respond to language and music.

The research was funded by the National Science Foundation, the National Institutes of Health, the McGovern Institute, and the Center for Brains, Minds, and Machines.

Artificial intelligence sheds light on how the brain processes language

In the past few years, artificial intelligence models of language have become very good at certain tasks. Most notably, they excel at predicting the next word in a string of text; this technology helps search engines and texting apps predict the next word you are going to type.

The most recent generation of predictive language models also appears to learn something about the underlying meaning of language. These models can not only predict the word that comes next, but also perform tasks that seem to require some degree of genuine understanding, such as question answering, document summarization, and story completion.

Such models were designed to optimize performance for the specific function of predicting text, without attempting to mimic anything about how the human brain performs this task or understands language. But a new study from MIT neuroscientists suggests the underlying function of these models resembles the function of language-processing centers in the human brain.

Computer models that perform well on other types of language tasks do not show this similarity to the human brain, offering evidence that the human brain may use next-word prediction to drive language processing.

“The better the model is at predicting the next word, the more closely it fits the human brain,” says Nancy Kanwisher, the Walter A. Rosenblith Professor of Cognitive Neuroscience, a member of MIT’s McGovern Institute for Brain Research and Center for Brains, Minds, and Machines (CBMM), and an author of the new study. “It’s amazing that the models fit so well, and it very indirectly suggests that maybe what the human language system is doing is predicting what’s going to happen next.”

Joshua Tenenbaum, a professor of computational cognitive science at MIT and a member of CBMM and MIT’s Artificial Intelligence Laboratory (CSAIL); and Evelina Fedorenko, the Frederick A. and Carole J. Middleton Career Development Associate Professor of Neuroscience and a member of the McGovern Institute, are the senior authors of the study, which appears this week in the Proceedings of the National Academy of Sciences.

Martin Schrimpf, an MIT graduate student who works in CBMM, is the first author of the paper.

Making predictions

The new, high-performing next-word prediction models belong to a class of models called deep neural networks. These networks contain computational “nodes” that form connections of varying strength, and layers that pass information between each other in prescribed ways.

Over the past decade, scientists have used deep neural networks to create models of vision that can recognize objects as well as the primate brain does. Research at MIT has also shown that the underlying function of visual object recognition models matches the organization of the primate visual cortex, even though those computer models were not specifically designed to mimic the brain.

In the new study, the MIT team used a similar approach to compare language-processing centers in the human brain with language-processing models. The researchers analyzed 43 different language models, including several that are optimized for next-word prediction. These include a model called GPT-3 (Generative Pre-trained Transformer 3), which, given a prompt, can generate text similar to what a human would produce. Other models were designed to perform different language tasks, such as filling in a blank in a sentence.

As each model was presented with a string of words, the researchers measured the activity of the nodes that make up the network. They then compared these patterns to activity in the human brain, measured in subjects performing three language tasks: listening to stories, reading sentences one at a time, and reading sentences in which one word is revealed at a time. These human datasets included functional magnetic resonance (fMRI) data and intracranial electrocorticographic measurements taken in people undergoing brain surgery for epilepsy.

They found that the best-performing next-word prediction models had activity patterns that very closely resembled those seen in the human brain. Activity in those same models was also highly correlated with measures of human behavioral measures such as how fast people were able to read the text.

“We found that the models that predict the neural responses well also tend to best predict human behavior responses, in the form of reading times. And then both of these are explained by the model performance on next-word prediction. This triangle really connects everything together,” Schrimpf says.

“A key takeaway from this work is that language processing is a highly constrained problem: The best solutions to it that AI engineers have created end up being similar, as this paper shows, to the solutions found by the evolutionary process that created the human brain. Since the AI network didn’t seek to mimic the brain directly — but does end up looking brain-like — this suggests that, in a sense, a kind of convergent evolution has occurred between AI and nature,” says Daniel Yamins, an assistant professor of psychology and computer science at Stanford University, who was not involved in the study.

Game changer

One of the key computational features of predictive models such as GPT-3 is an element known as a forward one-way predictive transformer. This kind of transformer is able to make predictions of what is going to come next, based on previous sequences. A significant feature of this transformer is that it can make predictions based on a very long prior context (hundreds of words), not just the last few words.

Scientists have not found any brain circuits or learning mechanisms that correspond to this type of processing, Tenenbaum says. However, the new findings are consistent with hypotheses that have been previously proposed that prediction is one of the key functions in language processing, he says.

“One of the challenges of language processing is the real-time aspect of it,” he says. “Language comes in, and you have to keep up with it and be able to make sense of it in real time.”

The researchers now plan to build variants of these language processing models to see how small changes in their architecture affect their performance and their ability to fit human neural data.

“For me, this result has been a game changer,” Fedorenko says. “It’s totally transforming my research program, because I would not have predicted that in my lifetime we would get to these computationally explicit models that capture enough about the brain so that we can actually leverage them in understanding how the brain works.”

The researchers also plan to try to combine these high-performing language models with some computer models Tenenbaum’s lab has previously developed that can perform other kinds of tasks such as constructing perceptual representations of the physical world.

“If we’re able to understand what these language models do and how they can connect to models which do things that are more like perceiving and thinking, then that can give us more integrative models of how things work in the brain,” Tenenbaum says. “This could take us toward better artificial intelligence models, as well as giving us better models of how more of the brain works and how general intelligence emerges, than we’ve had in the past.”

The research was funded by a Takeda Fellowship; the MIT Shoemaker Fellowship; the Semiconductor Research Corporation; the MIT Media Lab Consortia; the MIT Singleton Fellowship; the MIT Presidential Graduate Fellowship; the Friends of the McGovern Institute Fellowship; the MIT Center for Brains, Minds, and Machines, through the National Science Foundation; the National Institutes of Health; MIT’s Department of Brain and Cognitive Sciences; and the McGovern Institute.

Other authors of the paper are Idan Blank PhD ’16 and graduate students Greta Tuckute, Carina Kauf, and Eghbal Hosseini.

Storytelling brings MIT neuroscience community together

When the coronavirus pandemic shut down offices, labs, and classrooms across the MIT campus last spring, many members of the MIT community found it challenging to remain connected to one another in meaningful ways. Motivated by a desire to bring the neuroscience community back together, the McGovern Institute hosted a virtual storytelling competition featuring a selection of postdocs, grad students, and staff from across the institute.

“This has been an unprecedented year for us all,” says McGovern Institute Director Robert Desimone. “It has been twenty years since Pat and Lore McGovern founded the McGovern Institute, and despite the challenges this anniversary year has brought to our community, I have been inspired by the strength and perseverance demonstrated by our faculty, postdocs, students and staff. The resilience of this neuroscience community – and MIT as a whole – is indeed something to celebrate.”

The McGovern Institute had initially planned to hold a large 20th anniversary celebration in the atrium of Building 46 in the fall of 2020, but the pandemic made a gathering of this size impossible. The institute instead held a series of virtual events, including the November 12 story slam on the theme of resilience.

Face-specific brain area responds to faces even in people born blind

More than 20 years ago, neuroscientist Nancy Kanwisher and others discovered that a small section of the brain located near the base of the skull responds much more strongly to faces than to other objects we see. This area, known as the fusiform face area, is believed to be specialized for identifying faces.

Now, in a surprising new finding, Kanwisher and her colleagues have shown that this same region also becomes active in people who have been blind since birth, when they touch a three-dimensional model of a face with their hands. The finding suggests that this area does not require visual experience to develop a preference for faces.

“That doesn’t mean that visual input doesn’t play a role in sighted subjects — it probably does,” she says. “What we showed here is that visual input is not necessary to develop this particular patch, in the same location, with the same selectivity for faces. That was pretty astonishing.”

Kanwisher, the Walter A. Rosenblith Professor of Cognitive Neuroscience and a member of MIT’s McGovern Institute for Brain Research, is the senior author of the study. N. Apurva Ratan Murty, an MIT postdoc, is the lead author of the study, which appears this week in the Proceedings of the National Academy of Sciences. Other authors of the paper include Santani Teng, a former MIT postdoc; Aude Oliva, a senior research scientist, co-director of the MIT Quest for Intelligence, and MIT director of the MIT-IBM Watson AI Lab; and David Beeler and Anna Mynick, both former lab technicians.

Selective for faces

Studying people who were born blind allowed the researchers to tackle longstanding questions regarding how specialization arises in the brain. In this case, they were specifically investigating face perception, but the same unanswered questions apply to many other aspects of human cognition, Kanwisher says.

“This is part of a broader question that scientists and philosophers have been asking themselves for hundreds of years, about where the structure of the mind and brain comes from,” she says. “To what extent are we products of experience, and to what extent do we have built-in structure? This is a version of that question asking about the particular role of visual experience in constructing the face area.”

The new work builds on a 2017 study from researchers in Belgium. In that study, congenitally blind subjects were scanned with functional magnetic resonance imaging (fMRI) as they listened to a variety of sounds, some related to faces (such as laughing or chewing), and others not. That study found higher responses in the vicinity of the FFA to face-related sounds than to sounds such as a ball bouncing or hands clapping.

In the new study, the MIT team wanted to use tactile experience to measure more directly how the brains of blind people respond to faces. They created a ring of 3D-printed objects that included faces, hands, chairs, and mazes, and rotated them so that the subject could handle each one while in the fMRI scanner.

They began with normally sighted subjects and found that when they handled the 3D objects, a small area that corresponded to the location of the FFA was preferentially active when the subjects touched the faces, compared to when they touched other objects. This activity, which was weaker than the signal produced when sighted subjects looked at faces, was not surprising to see, Kanwisher says.

“We know that people engage in visual imagery, and we know from prior studies that visual imagery can activate the FFA. So the fact that you see the response with touch in a sighted person is not shocking because they’re visually imagining what they’re feeling,” she says.

The researchers then performed the same experiments, using tactile input only, with 15 subjects who reported being blind since birth. To their surprise, they found that the brain showed face-specific activity in the same area as the sighted subjects, at levels similar to when sighted people handled the 3D-printed faces.

“When we saw it in the first few subjects, it was really shocking, because no one had seen individual face-specific activations in the fusiform gyrus in blind subjects previously,” Murty says.

Patterns of connection

The researchers also explored several hypotheses that have been put forward to explain why face-selectivity always seems to develop in the same region of the brain. One prominent hypothesis suggests that the FFA develops face-selectivity because it receives visual input from the fovea (the center of the retina), and we tend to focus on faces at the center of our visual field. However, since this region developed in blind people with no foveal input, the new findings do not support this idea.

Another hypothesis is that the FFA has a natural preference for curved shapes. To test that idea, the researchers performed another set of experiments in which they asked the blind subjects to handle a variety of 3D-printed shapes, including cubes, spheres, and eggs. They found that the FFA did not show any preference for the curved objects over the cube-shaped objects.

The researchers did find evidence for a third hypothesis, which is that face selectivity arises in the FFA because of its connections to other parts of the brain. They were able to measure the FFA’s “connectivity fingerprint” — a measure of the correlation between activity in the FFA and activity in other parts of the brain — in both blind and sighted subjects.

They then used the data from each group to train a computer model to predict the exact location of the brain’s selective response to faces based on the FFA connectivity fingerprint. They found that when the model was trained on data from sighted patients, it could accurately predict the results in blind subjects, and vice versa. They also found evidence that connections to the frontal and parietal lobes of the brain, which are involved in high-level processing of sensory information, may be the most important in determining the role of the FFA.

“It’s suggestive of this very interesting story that the brain wires itself up in development not just by taking perceptual information and doing statistics on the input and allocating patches of brain, according to some kind of broadly agnostic statistical procedure,” Kanwisher says. “Rather, there are endogenous constraints in the brain present at birth, in this case, in the form of connections to higher-level brain regions, and these connections are perhaps playing a causal role in its development.”

The research was funded by the National Institutes of Health Shared Instrumentation Grant to the Athinoula Martinos Center at MIT, a National Eye Institute Training Grant, the Smith-Kettlewell Eye Research Institute’s Rehabilitation Engineering Research Center, an Office of Naval Research Vannevar Bush Faculty Fellowship, an NIH Pioneer Award, and a National Science Foundation Science and Technology Center Grant.

Full paper at PNAS