Deep neural networks show promise as models of human hearing

Computational models that mimic the structure and function of the human auditory system could help researchers design better hearing aids, cochlear implants, and brain-machine interfaces. A new study from MIT has found that modern computational models derived from machine learning are moving closer to this goal.

In the largest study yet of deep neural networks that have been trained to perform auditory tasks, the MIT team showed that most of these models generate internal representations that share properties of representations seen in the human brain when people are listening to the same sounds.

The study also offers insight into how to best train this type of model: The researchers found that models trained on auditory input including background noise more closely mimic the activation patterns of the human auditory cortex.

“What sets this study apart is it is the most comprehensive comparison of these kinds of models to the auditory system so far. The study suggests that models that are derived from machine learning are a step in the right direction, and it gives us some clues as to what tends to make them better models of the brain,” says Josh McDermott, an associate professor of brain and cognitive sciences at MIT, a member of MIT’s McGovern Institute for Brain Research and Center for Brains, Minds, and Machines, and the senior author of the study.

MIT graduate student Greta Tuckute and Jenelle Feather PhD ’22 are the lead authors of the open-access paper, which appears today in PLOS Biology.

Models of hearing

Deep neural networks are computational models that consists of many layers of information-processing units that can be trained on huge volumes of data to perform specific tasks. This type of model has become widely used in many applications, and neuroscientists have begun to explore the possibility that these systems can also be used to describe how the human brain performs certain tasks.

“These models that are built with machine learning are able to mediate behaviors on a scale that really wasn’t possible with previous types of models, and that has led to interest in whether or not the representations in the models might capture things that are happening in the brain,” Tuckute says.

When a neural network is performing a task, its processing units generate activation patterns in response to each audio input it receives, such as a word or other type of sound. Those model representations of the input can be compared to the activation patterns seen in fMRI brain scans of people listening to the same input.

In 2018, McDermott and then-graduate student Alexander Kell reported that when they trained a neural network to perform auditory tasks (such as recognizing words from an audio signal), the internal representations generated by the model showed similarity to those seen in fMRI scans of people listening to the same sounds.

Since then, these types of models have become widely used, so McDermott’s research group set out to evaluate a larger set of models, to see if the ability to approximate the neural representations seen in the human brain is a general trait of these models.

For this study, the researchers analyzed nine publicly available deep neural network models that had been trained to perform auditory tasks, and they also created 14 models of their own, based on two different architectures. Most of these models were trained to perform a single task — recognizing words, identifying the speaker, recognizing environmental sounds, and identifying musical genre — while two of them were trained to perform multiple tasks.

When the researchers presented these models with natural sounds that had been used as stimuli in human fMRI experiments, they found that the internal model representations tended to exhibit similarity with those generated by the human brain. The models whose representations were most similar to those seen in the brain were models that had been trained on more than one task and had been trained on auditory input that included background noise.

“If you train models in noise, they give better brain predictions than if you don’t, which is intuitively reasonable because a lot of real-world hearing involves hearing in noise, and that’s plausibly something the auditory system is adapted to,” Feather says.

Hierarchical processing

The new study also supports the idea that the human auditory cortex has some degree of hierarchical organization, in which processing is divided into stages that support distinct computational functions. As in the 2018 study, the researchers found that representations generated in earlier stages of the model most closely resemble those seen in the primary auditory cortex, while representations generated in later model stages more closely resemble those generated in brain regions beyond the primary cortex.

Additionally, the researchers found that models that had been trained on different tasks were better at replicating different aspects of audition. For example, models trained on a speech-related task more closely resembled speech-selective areas.

“Even though the model has seen the exact same training data and the architecture is the same, when you optimize for one particular task, you can see that it selectively explains specific tuning properties in the brain,” Tuckute says.

McDermott’s lab now plans to make use of their findings to try to develop models that are even more successful at reproducing human brain responses. In addition to helping scientists learn more about how the brain may be organized, such models could also be used to help develop better hearing aids, cochlear implants, and brain-machine interfaces.

“A goal of our field is to end up with a computer model that can predict brain responses and behavior. We think that if we are successful in reaching that goal, it will open a lot of doors,” McDermott says.

The research was funded by the National Institutes of Health, an Amazon Fellowship from the Science Hub, an International Doctoral Fellowship from the American Association of University Women, an MIT Friends of McGovern Institute Fellowship, a fellowship from the K. Lisa Yang Integrative Computational Neuroscience (ICoN) Center at MIT, and a Department of Energy Computational Science Graduate Fellowship.

What does the future hold for generative AI?

Speaking at the “Generative AI: Shaping the Future” symposium on Nov. 28, the kickoff event of MIT’s Generative AI Week, keynote speaker and iRobot co-founder Rodney Brooks warned attendees against uncritically overestimating the capabilities of this emerging technology, which underpins increasingly powerful tools like OpenAI’s ChatGPT and Google’s Bard.

“Hype leads to hubris, and hubris leads to conceit, and conceit leads to failure,” cautioned Brooks, who is also a professor emeritus at MIT, a former director of the Computer Science and Artificial Intelligence Laboratory (CSAIL), and founder of Robust.AI.

“No one technology has ever surpassed everything else,” he added.

The symposium, which drew hundreds of attendees from academia and industry to the Institute’s Kresge Auditorium, was laced with messages of hope about the opportunities generative AI offers for making the world a better place, including through art and creativity, interspersed with cautionary tales about what could go wrong if these AI tools are not developed responsibly.

Generative AI is a term to describe machine-learning models that learn to generate new material that looks like the data they were trained on. These models have exhibited some incredible capabilities, such as the ability to produce human-like creative writing, translate languages, generate functional computer code, or craft realistic images from text prompts.

In her opening remarks to launch the symposium, MIT President Sally Kornbluth highlighted several projects faculty and students have undertaken to use generative AI to make a positive impact in the world. For example, the work of the Axim Collaborative, an online education initiative launched by MIT and Harvard, includes exploring the educational aspects of generative AI to help underserved students.

The Institute also recently announced seed grants for 27 interdisciplinary faculty research projects centered on how AI will transform people’s lives across society.

In hosting Generative AI Week, MIT hopes to not only showcase this type of innovation, but also generate “collaborative collisions” among attendees, Kornbluth said.

Collaboration involving academics, policymakers, and industry will be critical if we are to safely integrate a rapidly evolving technology like generative AI in ways that are humane and help humans solve problems, she told the audience.

“I honestly cannot think of a challenge more closely aligned with MIT’s mission. It is a profound responsibility, but I have every confidence that we can face it, if we face it head on and if we face it as a community,” she said.

While generative AI holds the potential to help solve some of the planet’s most pressing problems, the emergence of these powerful machine learning models has blurred the distinction between science fiction and reality, said CSAIL Director Daniela Rus in her opening remarks. It is no longer a question of whether we can make machines that produce new content, she said, but how we can use these tools to enhance businesses and ensure sustainability. 

“Today, we will discuss the possibility of a future where generative AI does not just exist as a technological marvel, but stands as a source of hope and a force for good,” said Rus, who is also the Andrew and Erna Viterbi Professor in the Department of Electrical Engineering and Computer Science.

But before the discussion dove deeply into the capabilities of generative AI, attendees were first asked to ponder their humanity, as MIT Professor Joshua Bennett read an original poem.

Bennett, a professor in the MIT Literature Section and Distinguished Chair of the Humanities, was asked to write a poem about what it means to be human, and drew inspiration from his daughter, who was born three weeks ago.

The poem told of his experiences as a boy watching Star Trek with his father and touched on the importance of passing traditions down to the next generation.

In his keynote remarks, Brooks set out to unpack some of the deep, scientific questions surrounding generative AI, as well as explore what the technology can tell us about ourselves.

To begin, he sought to dispel some of the mystery swirling around generative AI tools like ChatGPT by explaining the basics of how this large language model works. ChatGPT, for instance, generates text one word at a time by determining what the next word should be in the context of what it has already written. While a human might write a story by thinking about entire phrases, ChatGPT only focuses on the next word, Brooks explained.

ChatGPT 3.5 is built on a machine-learning model that has 175 billion parameters and has been exposed to billions of pages of text on the web during training. (The newest iteration, ChatGPT 4, is even larger.) It learns correlations between words in this massive corpus of text and uses this knowledge to propose what word might come next when given a prompt.

The model has demonstrated some incredible capabilities, such as the ability to write a sonnet about robots in the style of Shakespeare’s famous Sonnet 18. During his talk, Brooks showcased the sonnet he asked ChatGPT to write side-by-side with his own sonnet.

But while researchers still don’t fully understand exactly how these models work, Brooks assured the audience that generative AI’s seemingly incredible capabilities are not magic, and it doesn’t mean these models can do anything.

His biggest fears about generative AI don’t revolve around models that could someday surpass human intelligence. Rather, he is most worried about researchers who may throw away decades of excellent work that was nearing a breakthrough, just to jump on shiny new advancements in generative AI; venture capital firms that blindly swarm toward technologies that can yield the highest margins; or the possibility that a whole generation of engineers will forget about other forms of software and AI.

At the end of the day, those who believe generative AI can solve the world’s problems and those who believe it will only generate new problems have at least one thing in common: Both groups tend to overestimate the technology, he said.

“What is the conceit with generative AI? The conceit is that it is somehow going to lead to artificial general intelligence. By itself, it is not,” Brooks said.

Following Brooks’ presentation, a group of MIT faculty spoke about their work using generative AI and participated in a panel discussion about future advances, important but underexplored research topics, and the challenges of AI regulation and policy.

The panel consisted of Jacob Andreas, an associate professor in the MIT Department of Electrical Engineering and Computer Science (EECS) and a member of CSAIL; Antonio Torralba, the Delta Electronics Professor of EECS and a member of CSAIL; Ev Fedorenko, an associate professor of brain and cognitive sciences and an investigator at the McGovern Institute for Brain Research at MIT; and Armando Solar-Lezama, a Distinguished Professor of Computing and associate director of CSAIL. It was moderated by William T. Freeman, the Thomas and Gerd Perkins Professor of EECS and a member of CSAIL.

The panelists discussed several potential future research directions around generative AI, including the possibility of integrating perceptual systems, drawing on human senses like touch and smell, rather than focusing primarily on language and images. The researchers also spoke about the importance of engaging with policymakers and the public to ensure generative AI tools are produced and deployed responsibly.

“One of the big risks with generative AI today is the risk of digital snake oil. There is a big risk of a lot of products going out that claim to do miraculous things but in the long run could be very harmful,” Solar-Lezama said.

The morning session concluded with an excerpt from the 1925 science fiction novel “Metropolis,” read by senior Joy Ma, a physics and theater arts major, followed by a roundtable discussion on the future of generative AI. The discussion included Joshua Tenenbaum, a professor in the Department of Brain and Cognitive Sciences and a member of CSAIL; Dina Katabi, the Thuan and Nicole Pham Professor in EECS and a principal investigator in CSAIL and the MIT Jameel Clinic; and Max Tegmark, professor of physics; and was moderated by Daniela Rus.

One focus of the discussion was the possibility of developing generative AI models that can go beyond what we can do as humans, such as tools that can sense someone’s emotions by using electromagnetic signals to understand how a person’s breathing and heart rate are changing.

But one key to integrating AI like this into the real world safely is to ensure that we can trust it, Tegmark said. If we know an AI tool will meet the specifications we insist on, then “we no longer have to be afraid of building really powerful systems that go out and do things for us in the world,” he said.

The brain may learn about the world the same way some computational models do

To make our way through the world, our brain must develop an intuitive understanding of the physical world around us, which we then use to interpret sensory information coming into the brain.

How does the brain develop that intuitive understanding? Many scientists believe that it may use a process similar to what’s known as “self-supervised learning.” This type of machine learning, originally developed as a way to create more efficient models for computer vision, allows computational models to learn about visual scenes based solely on the similarities and differences between them, with no labels or other information.

A pair of studies from researchers at the K. Lisa Yang Integrative Computational Neuroscience (ICoN) Center at MIT offers new evidence supporting this hypothesis. The researchers found that when they trained models known as neural networks using a particular type of self-supervised learning, the resulting models generated activity patterns very similar to those seen in the brains of animals that were performing the same tasks as the models.

The findings suggest that these models are able to learn representations of the physical world that they can use to make accurate predictions about what will happen in that world, and that the mammalian brain may be using the same strategy, the researchers say.

“The theme of our work is that AI designed to help build better robots ends up also being a framework to better understand the brain more generally,” says Aran Nayebi, a postdoc in the ICoN Center. “We can’t say if it’s the whole brain yet, but across scales and disparate brain areas, our results seem to be suggestive of an organizing principle.”

Nayebi is the lead author of one of the studies, co-authored with Rishi Rajalingham, a former MIT postdoc now at Meta Reality Labs, and senior authors Mehrdad Jazayeri, an associate professor of brain and cognitive sciences and a member of the McGovern Institute for Brain Research; and Robert Yang, an assistant professor of brain and cognitive sciences and an associate member of the McGovern Institute. Ila Fiete, director of the ICoN Center, a professor of brain and cognitive sciences, and an associate member of the McGovern Institute, is the senior author of the other study, which was co-led by Mikail Khona, an MIT graduate student, and Rylan Schaeffer, a former senior research associate at MIT.

Both studies will be presented at the 2023 Conference on Neural Information Processing Systems (NeurIPS) in December.

Modeling the physical world

Early models of computer vision mainly relied on supervised learning. Using this approach, models are trained to classify images that are each labeled with a name — cat, car, etc. The resulting models work well, but this type of training requires a great deal of human-labeled data.

To create a more efficient alternative, in recent years researchers have turned to models built through a technique known as contrastive self-supervised learning. This type of learning allows an algorithm to learn to classify objects based on how similar they are to each other, with no external labels provided.

“This is a very powerful method because you can now leverage very large modern data sets, especially videos, and really unlock their potential,” Nayebi says. “A lot of the modern AI that you see now, especially in the last couple years with ChatGPT and GPT-4, is a result of training a self-supervised objective function on a large-scale dataset to obtain a very flexible representation.”

These types of models, also called neural networks, consist of thousands or millions of processing units connected to each other. Each node has connections of varying strengths to other nodes in the network. As the network analyzes huge amounts of data, the strengths of those connections change as the network learns to perform the desired task.

As the model performs a particular task, the activity patterns of different units within the network can be measured. Each unit’s activity can be represented as a firing pattern, similar to the firing patterns of neurons in the brain. Previous work from Nayebi and others has shown that self-supervised models of vision generate activity similar to that seen in the visual processing system of mammalian brains.

In both of the new NeurIPS studies, the researchers set out to explore whether self-supervised computational models of other cognitive functions might also show similarities to the mammalian brain. In the study led by Nayebi, the researchers trained self-supervised models to predict the future state of their environment across hundreds of thousands of naturalistic videos depicting everyday scenarios.

“For the last decade or so, the dominant method to build neural network models in cognitive neuroscience is to train these networks on individual cognitive tasks. But models trained this way rarely generalize to other tasks,” Yang says. “Here we test whether we can build models for some aspect of cognition by first training on naturalistic data using self-supervised learning, then evaluating in lab settings.”

Once the model was trained, the researchers had it generalize to a task they call “Mental-Pong.” This is similar to the video game Pong, where a player moves a paddle to hit a ball traveling across the screen. In the Mental-Pong version, the ball disappears shortly before hitting the paddle, so the player has to estimate its trajectory in order to hit the ball.

The researchers found that the model was able to track the hidden ball’s trajectory with accuracy similar to that of neurons in the mammalian brain, which had been shown in a previous study by Rajalingham and Jazayeri to simulate its trajectory — a cognitive phenomenon known as “mental simulation.” Furthermore, the neural activation patterns seen within the model were similar to those seen in the brains of animals as they played the game — specifically, in a part of the brain called the dorsomedial frontal cortex. No other class of computational model has been able to match the biological data as closely as this one, the researchers say.

“There are many efforts in the machine learning community to create artificial intelligence,” Jazayeri says. “The relevance of these models to neurobiology hinges on their ability to additionally capture the inner workings of the brain. The fact that Aran’s model predicts neural data is really important as it suggests that we may be getting closer to building artificial systems that emulate natural intelligence.”

Navigating the world

The study led by Khona, Schaeffer, and Fiete focused on a type of specialized neurons known as grid cells. These cells, located in the entorhinal cortex, help animals to navigate, working together with place cells located in the hippocampus.

While place cells fire whenever an animal is in a specific location, grid cells fire only when the animal is at one of the vertices of a triangular lattice. Groups of grid cells create overlapping lattices of different sizes, which allows them to encode a large number of positions using a relatively small number of cells.

In recent studies, researchers have trained supervised neural networks to mimic grid cell function by predicting an animal’s next location based on its starting point and velocity, a task known as path integration. However, these models hinged on access to privileged information about absolute space at all times — information that the animal does not have.

Inspired by the striking coding properties of the multiperiodic grid-cell code for space, the MIT team trained a contrastive self-supervised model to both perform this same path integration task and represent space efficiently while doing so. For the training data, they used sequences of velocity inputs. The model learned to distinguish positions based on whether they were similar or different — nearby positions generated similar codes, but further positions generated more different codes.

“It’s similar to training models on images, where if two images are both heads of cats, their codes should be similar, but if one is the head of a cat and one is a truck, then you want their codes to repel,” Khona says. “We’re taking that same idea but applying it to spatial trajectories.”

Once the model was trained, the researchers found that the activation patterns of the nodes within the model formed several lattice patterns with different periods, very similar to those formed by grid cells in the brain.

“What excites me about this work is that it makes connections between mathematical work on the striking information-theoretic properties of the grid cell code and the computation of path integration,” Fiete says. “While the mathematical work was analytic — what properties does the grid cell code possess? — the approach of optimizing coding efficiency through self-supervised learning and obtaining grid-like tuning is synthetic: It shows what properties might be necessary and sufficient to explain why the brain has grid cells.”

The research was funded by the K. Lisa Yang ICoN Center, the National Institutes of Health, the Simons Foundation, the McKnight Foundation, the McGovern Institute, and the Helen Hay Whitney Foundation.

Study: Deep neural networks don’t see the world the way we do

Human sensory systems are very good at recognizing objects that we see or words that we hear, even if the object is upside down or the word is spoken by a voice we’ve never heard.

Computational models known as deep neural networks can be trained to do the same thing, correctly identifying an image of a dog regardless of what color its fur is, or a word regardless of the pitch of the speaker’s voice. However, a new study from MIT neuroscientists has found that these models often also respond the same way to images or words that have no resemblance to the target.

When these neural networks were used to generate an image or a word that they responded to in the same way as a specific natural input, such as a picture of a bear, most of them generated images or sounds that were unrecognizable to human observers. This suggests that these models build up their own idiosyncratic “invariances” — meaning that they respond the same way to stimuli with very different features.

The findings offer a new way for researchers to evaluate how well these models mimic the organization of human sensory perception, says Josh McDermott, an associate professor of brain and cognitive sciences at MIT and a member of MIT’s McGovern Institute for Brain Research and Center for Brains, Minds, and Machines.

“This paper shows that you can use these models to derive unnatural signals that end up being very diagnostic of the representations in the model,” says McDermott, who is the senior author of the study. “This test should become part of a battery of tests that we as a field are using to evaluate models.”

Jenelle Feather PhD ’22, who is now a research fellow at the Flatiron Institute Center for Computational Neuroscience, is the lead author of the open-access paper, which appears today in Nature Neuroscience. Guillaume Leclerc, an MIT graduate student, and Aleksander Mądry, the Cadence Design Systems Professor of Computing at MIT, are also authors of the paper.

Different perceptions

In recent years, researchers have trained deep neural networks that can analyze millions of inputs (sounds or images) and learn common features that allow them to classify a target word or object roughly as accurately as humans do. These models are currently regarded as the leading models of biological sensory systems.

It is believed that when the human sensory system performs this kind of classification, it learns to disregard features that aren’t relevant to an object’s core identity, such as how much light is shining on it or what angle it’s being viewed from. This is known as invariance, meaning that objects are perceived to be the same even if they show differences in those less important features.

“Classically, the way that we have thought about sensory systems is that they build up invariances to all those sources of variation that different examples of the same thing can have,” Feather says. “An organism has to recognize that they’re the same thing even though they show up as very different sensory signals.”

The researchers wondered if deep neural networks that are trained to perform classification tasks might develop similar invariances. To try to answer that question, they used these models to generate stimuli that produce the same kind of response within the model as an example stimulus given to the model by the researchers.

They term these stimuli “model metamers,” reviving an idea from classical perception research whereby stimuli that are indistinguishable to a system can be used to diagnose its invariances. The concept of metamers was originally developed in the study of human perception to describe colors that look identical even though they are made up of different wavelengths of light.

To their surprise, the researchers found that most of the images and sounds produced in this way looked and sounded nothing like the examples that the models were originally given. Most of the images were a jumble of random-looking pixels, and the sounds resembled unintelligible noise. When researchers showed the images to human observers, in most cases the humans did not classify the images synthesized by the models in the same category as the original target example.

“They’re really not recognizable at all by humans. They don’t look or sound natural and they don’t have interpretable features that a person could use to classify an object or word,” Feather says.

The findings suggest that the models have somehow developed their own invariances that are different from those found in human perceptual systems. This causes the models to perceive pairs of stimuli as being the same despite their being wildly different to a human.

Idiosyncratic invariances

The researchers found the same effect across many different vision and auditory models. However, each of these models appeared to develop their own unique invariances. When metamers from one model were shown to another model, the metamers were just as unrecognizable to the second model as they were to human observers.

“The key inference from that is that these models seem to have what we call idiosyncratic invariances,” McDermott says. “They have learned to be invariant to these particular dimensions in the stimulus space, and it’s model-specific, so other models don’t have those same invariances.”

The researchers also found that they could induce a model’s metamers to be more recognizable to humans by using an approach called adversarial training. This approach was originally developed to combat another limitation of object recognition models, which is that introducing tiny, almost imperceptible changes to an image can cause the model to misrecognize it.

The researchers found that adversarial training, which involves including some of these slightly altered images in the training data, yielded models whose metamers were more recognizable to humans, though they were still not as recognizable as the original stimuli. This improvement appears to be independent of the training’s effect on the models’ ability to resist adversarial attacks, the researchers say.

“This particular form of training has a big effect, but we don’t really know why it has that effect,” Feather says. “That’s an area for future research.”

Analyzing the metamers produced by computational models could be a useful tool to help evaluate how closely a computational model mimics the underlying organization of human sensory perception systems, the researchers say.

“This is a behavioral test that you can run on a given model to see whether the invariances are shared between the model and human observers,” Feather says. “It could also be used to evaluate how idiosyncratic the invariances are within a given model, which could help uncover potential ways to improve our models in the future.”

The research was funded by the National Science Foundation, the National Institutes of Health, a Department of Energy Computational Science Graduate Fellowship, and a Friends of the McGovern Institute Fellowship.

Study decodes surprising approach mice take in learning

Neuroscience discoveries ranging from the nature of memory to treatments for disease have depended on reading the minds of mice, so researchers need to truly understand what the rodents’ behavior is telling them during experiments. In a new study that examines learning from reward, MIT researchers deciphered some initially mystifying mouse behavior, yielding new ideas about how mice think and a mathematical tool to aid future research.

The task the mice were supposed to master is simple: Turn a wheel left or right to get a reward and then recognize when the reward direction switches. When neurotypical people play such “reversal learning” games they quickly infer the optimal approach: stick with the direction that works until it doesn’t and then switch right away. Notably, people with schizophrenia struggle with the task. In the new study in PLOS Computational Biology, mice surprised scientists by showing that while they were capable of learning the “win-stay, lose-shift” strategy, they nonetheless refused to fully adopt it.

“It is not that mice cannot form an inference-based model of this environment—they can,” said corresponding author Mriganka Sur, Newton Professor in The Picower Institute for Learning and Memory and MIT’s Department of Brain and Cognitive Sciences (BCS). “The surprising thing is that they don’t persist with it. Even in a single block of the game where you know the reward is 100 percent on one side, every so often they will try the other side.”

While the mouse motif of departing from the optimal strategy could be due to a failure to hold it in memory, said lead author and Sur Lab graduate student Nhat Le, another possibility is that mice don’t commit to the “win-stay, lose-shift” approach because they don’t trust that their circumstances will remain stable or predictable. Instead, they might deviate from the optimal regime to test whether the rules have changed. Natural settings, after all, are rarely stable or predictable.

“I’d like to think mice are smarter than we give them credit for,” Le said.

But regardless of which reason may cause the mice to mix strategies, added co-senior author Mehrdad Jazayeri, Associate Professor in BCS and the McGovern Institute for Brain Research, it is important for researchers to recognize that they do and to be able to tell when and how they are choosing one strategy or another.

“This study highlights the fact that, unlike the accepted wisdom, mice doing lab tasks do not necessarily adopt a stationary strategy and it offers a computationally rigorous approach to detect and quantify such non-stationarities,” he said. “This ability is important because when researchers record the neural activity, their interpretation of the underlying algorithms and mechanisms may be invalid when they do not take the animals’ shifting strategies into account.”

Tracking thinking

The research team, which also includes co-author Murat Yildirim, a former Sur lab postdoc who is now an assistant professor at the Cleveland Clinic Lerner Research Institute, initially expected that the mice might adopt one strategy or the other. They simulated the results they’d expect to see if the mice either adopted the optimal strategy of inferring a rule about the task, or more randomly surveying whether left or right turns were being rewarded. Mouse behavior on the task, even after days, varied widely but it never resembled the results simulated by just one strategy.

To differing, individual extents, mouse performance on the task reflected variance along three parameters: how quickly they switched directions after the rule switched, how long it took them to transition to the new direction, and how loyal they remained to the new direction. Across 21 mice, the raw data represented a surprising diversity of outcomes on a task that neurotypical humans uniformly optimize. But the mice clearly weren’t helpless. Their average performance significantly improved over time, even though it plateaued below the optimal level.

In the task, the rewarded side switched every 15-25 turns. The team realized the mice were using more than one strategy in each such “block” of the game, rather than just inferring the simple rule and optimizing based on that inference. To disentangle when the mice were employing that strategy or another, the team harnessed an analytical framework called a Hidden Markov Model (HMM), which can computationally tease out when one unseen state is producing a result vs. another unseen state. Le likens it to what a judge on a cooking show might do: inferring which chef contestant made which version of a dish based on patterns in each plate of food before them.

Before the team could use an HMM to decipher their mouse performance results, however, they had to adapt it. A typical HMM might apply to individual mouse choices, but here the team modified it to explain choice transitions over the course of whole blocks. They dubbed their modified model the blockHMM. Computational simulations of task performance using the blockHMM showed that the algorithm is able to infer the true hidden states of an artificial agent. The authors then used this technique to show the mice were persistently blending multiple strategies, achieving varied levels of performance.

“We verified that each animal executes a mixture of behavior from multiple regimes instead of a behavior in a single domain,” Le and his co-authors wrote. “Indeed 17/21 mice used a combination of low, medium and high-performance behavior modes.”

Further analysis revealed that the strategies afoot were indeed the “correct” rule inference strategy and a more exploratory strategy consistent with randomly testing options to get turn-by-turn feedback.

Now that the researchers have decoded the peculiar approach mice take to reversal learning, they are planning to look more deeply into the brain to understand which brain regions and circuits are involved. By watching brain cell activity during the task, they hope to discern what underlies the decisions the mice make to switch strategies.

By examining reversal learning circuits in detail, Sur said, it’s possible the team will gain insights that could help explain why people with schizophrenia show diminished performance on reversal learning tasks. Sur added that some people with autism spectrum disorders also persist with newly unrewarded behaviors longer than neurotypical people, so his lab will also have that phenomenon in mind as they investigate.

Yildirim, too, is interested in examining potential clinical connections.

“This reversal learning paradigm fascinates me since I want to use it in my lab with various preclinical models of neurological disorders,” he said. “The next step for us is to determine the brain mechanisms underlying these differences in behavioral strategies and whether we can manipulate these strategies.”

Funding for the study came from The National Institutes of Health, the Army Research Office, a Paul and Lilah Newton Brain Science Research Award, the Massachusetts Life Sciences Initiative, The Picower Institute for Learning and Memory and The JPB Foundation.

When computer vision works more like a brain, it sees more like people do

From cameras to self-driving cars, many of today’s technologies depend on artificial intelligence (AI) to extract meaning from visual information.  Today’s AI technology has artificial neural networks at its core, and most of the time we can trust these AI computer vision systems to see things the way we do — but sometimes they falter. According to MIT and IBM Research scientists, one way to improve computer vision is to instruct the artificial neural networks that they rely on to deliberately mimic the way the brain’s biological neural network processes visual images.

Researchers led by James DiCarlo, the director of MIT’s Quest for Intelligence and member of the MIT-IBM Watson AI Lab, have made a computer vision model more robust by training it to work like a part of the brain that humans and other primates rely on for object recognition. This May, at the International Conference on Learning Representations (ICLR), the team reported that when they trained an artificial neural network using neural activity patterns in the brain’s inferior temporal (IT) cortex, the artificial neural network was more robustly able to identify objects in images than a model that lacked that neural training. And the model’s interpretations of images more closely matched what humans saw, even when images included minor distortions that made the task more difficult.

Comparing neural circuits

Portrait of Professor DiCarlo
McGovern Investigator and Director of MIT Quest for Intelligence, James DiCarlo. Photo: Justin Knight

Many of the artificial neural networks used for computer vision already resemble the multi-layered brain circuits that process visual information in humans and other primates. Like the brain, they use neuron-like units that work together to process information. As they are trained for a particular task, these layered components collectively and progressively process the visual information to complete the task — determining for example, that an image depicts a bear or a car or a tree.

DiCarlo and others previously found that when such deep-learning computer vision systems establish efficient ways to solve visual problems, they end up with artificial circuits that work similarly to the neural circuits that process visual information in our own brains. That is, they turn out to be surprisingly good scientific models of the neural mechanisms underlying primate and human vision.

That resemblance is helping neuroscientists deepen their understanding of the brain. By demonstrating ways visual information can be processed to make sense of images, computational models suggest hypotheses about how the brain might accomplish the same task. As developers continue to refine computer vision models, neuroscientists have found new ideas to explore in their own work.

“As vision systems get better at performing in the real world, some of them turn out to be more human-like in their internal processing. That’s useful from an understanding biology point of view,” says DiCarlo, who is also a professor of brain and cognitive sciences and an investigator at the McGovern Institute.

Engineering more brain-like AI

While their potential is promising, computer vision systems are not yet perfect models of human vision. DiCarlo suspected one way to improve computer vision may be to incorporate specific brain-like features into these models.

To test this idea, he and his collaborators built a computer vision model using neural data previously collected from vision-processing neurons in the monkey IT cortex — a key part of the primate ventral visual pathway involved in the recognition of objects — while the animals viewed various images. More specifically, Joel Dapello, a Harvard graduate student and former MIT-IBM Watson AI Lab intern, and Kohitij Kar, Assistant Professor, Canada Research Chair (Visual Neuroscience) at York University and visiting scientist at MIT, in collaboration with David Cox, IBM Research’s VP for AI Models and IBM director of the MIT-IBM Watson AI Lab, and other researchers at IBM Research and MIT, asked an artificial neural network to emulate the behavior of these primate vision-processing neurons while the network learned to identify objects in a standard computer vision task.

“In effect, we said to the network, ‘please solve this standard computer vision task, but please also make the function of one of your inside simulated “neural” layers be as similar as possible to the function of the corresponding biological neural layer,’” DiCarlo explains. “We asked it to do both of those things as best it could.” This forced the artificial neural circuits to find a different way to process visual information than the standard, computer vision approach, he says.

After training the artificial model with biological data, DiCarlo’s team compared its activity to a similarly-sized neural network model trained without neural data, using the standard approach for computer vision. They found that the new, biologically-informed model IT layer was – as instructed — a better match for IT neural data.  That is, for every image tested, the population of artificial IT neurons in the model responded more similarly to the corresponding population of biological IT neurons.

“Everybody gets something out of the exciting virtuous cycle between natural/biological intelligence and artificial intelligence,” DiCarlo says.

The researchers also found that the model IT was also a better match to IT neural data collected from another monkey, even though the model had never seen data from that animal, and even when that comparison was evaluated on that monkey’s IT responses to new images. This indicated that the team’s new, “neurally-aligned” computer model may be an improved model of the neurobiological function of the primate IT cortex — an interesting finding, given that it was previously unknown whether the amount of neural data that can be currently collected from the primate visual system is capable of directly guiding model development.

With their new computer model in hand, the team asked whether the “IT neural alignment” procedure also leads to any changes in the overall behavioral performance of the model. Indeed, they found that the neurally-aligned model was more human-like in its behavior — it tended to succeed in correctly categorizing objects in images for which humans also succeed, and it tended to fail when humans also fail.

Adversarial attacks

The team also found that the neurally-aligned model was more resistant to “adversarial attacks” that developers use to test computer vision and AI systems.  In computer vision, adversarial attacks introduce small distortions into images that are meant to mislead an artificial neural network.

“Say that you have an image that the model identifies as a cat. Because you have the knowledge of the internal workings of the model, you can then design very small changes in the image so that the model suddenly thinks it’s no longer a cat,” DiCarlo explains.

These minor distortions don’t typically fool humans, but computer vision models struggle with these alterations. A person who looks at the subtly distorted cat still reliably and robustly reports that it’s a cat. But standard computer vision models are more likely to mistake the cat for a dog, or even a tree.

“There must be some internal differences in the way our brains process images that lead to our vision being more resistant to those kinds of attacks,” DiCarlo says. And indeed, the team found that when they made their model more neurally-aligned, it became more robust, correctly identifying more images in the face of adversarial attacks.  The model could still be fooled by stronger “attacks,” but so can people, DiCarlo says. His team is now exploring the limits of adversarial robustness in humans.

A few years ago, DiCarlo’s team found they could also improve a model’s resistance to adversarial attacks by designing the first layer of the artificial network to emulate the early visual processing layer in the brain. One key next step is to combine such approaches — making new models that are simultaneously neurally-aligned at multiple visual processing layers.

The new work is further evidence that an exchange of ideas between neuroscience and computer science can drive progress in both fields. “Everybody gets something out of the exciting virtuous cycle between natural/biological intelligence and artificial intelligence,” DiCarlo says. “In this case, computer vision and AI researchers get new ways to achieve robustness and neuroscientists and cognitive scientists get more accurate mechanistic models of human vision.”

This work was supported by the MIT-IBM Watson AI Lab, Semiconductor Research Corporation, DARPA, the Massachusetts Institute of Technology Shoemaker Fellowship, Office of Naval Research, the Simons Foundation, and Canada Research Chair Program.

Computational model mimics humans’ ability to predict emotions

When interacting with another person, you likely spend part of your time trying to anticipate how they will feel about what you’re saying or doing. This task requires a cognitive skill called theory of mind, which helps us to infer other people’s beliefs, desires, intentions, and emotions.

MIT neuroscientists have now designed a computational model that can predict other people’s emotions — including joy, gratitude, confusion, regret, and embarrassment — approximating human observers’ social intelligence. The model was designed to predict the emotions of people involved in a situation based on the prisoner’s dilemma, a classic game theory scenario in which two people must decide whether to cooperate with their partner or betray them.

To build the model, the researchers incorporated several factors that have been hypothesized to influence people’s emotional reactions, including that person’s desires, their expectations in a particular situation, and whether anyone was watching their actions.

“These are very common, basic intuitions, and what we said is, we can take that very basic grammar and make a model that will learn to predict emotions from those features,” says Rebecca Saxe, the John W. Jarve Professor of Brain and Cognitive Sciences, a member of MIT’s McGovern Institute for Brain Research, and the senior author of the study.

Sean Dae Houlihan PhD ’22, a postdoc at the Neukom Institute for Computational Science at Dartmouth College, is the lead author of the paper, which appears today in Philosophical Transactions A. Other authors include Max Kleiman-Weiner PhD ’18, a postdoc at MIT and Harvard University; Luke Hewitt PhD ’22, a visiting scholar at Stanford University; and Joshua Tenenbaum, a professor of computational cognitive science at MIT and a member of the Center for Brains, Minds, and Machines and MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL).

Predicting emotions

While a great deal of research has gone into training computer models to infer someone’s emotional state based on their facial expression, that is not the most important aspect of human emotional intelligence, Saxe says. Much more important is the ability to predict someone’s emotional response to events before they occur.

“The most important thing about what it is to understand other people’s emotions is to anticipate what other people will feel before the thing has happened,” she says. “If all of our emotional intelligence was reactive, that would be a catastrophe.”

To try to model how human observers make these predictions, the researchers used scenarios taken from a British game show called “Golden Balls.” On the show, contestants are paired up with a pot of $100,000 at stake. After negotiating with their partner, each contestant decides, secretly, whether to split the pool or try to steal it. If both decide to split, they each receive $50,000. If one splits and one steals, the stealer gets the entire pot. If both try to steal, no one gets anything.

Depending on the outcome, contestants may experience a range of emotions — joy and relief if both contestants split, surprise and fury if one’s opponent steals the pot, and perhaps guilt mingled with excitement if one successfully steals.

To create a computational model that can predict these emotions, the researchers designed three separate modules. The first module is trained to infer a person’s preferences and beliefs based on their action, through a process called inverse planning.

“This is an idea that says if you see just a little bit of somebody’s behavior, you can probabilistically infer things about what they wanted and expected in that situation,” Saxe says.

Using this approach, the first module can predict contestants’ motivations based on their actions in the game. For example, if someone decides to split in an attempt to share the pot, it can be inferred that they also expected the other person to split. If someone decides to steal, they may have expected the other person to steal, and didn’t want to be cheated. Or, they may have expected the other person to split and decided to try to take advantage of them.

The model can also integrate knowledge about specific players, such as the contestant’s occupation, to help it infer the players’ most likely motivation.

The second module compares the outcome of the game with what each player wanted and expected to happen. Then, a third module predicts what emotions the contestants may be feeling, based on the outcome and what was known about their expectations. This third module was trained to predict emotions based on predictions from human observers about how contestants would feel after a particular outcome. The authors emphasize that this is a model of human social intelligence, designed to mimic how observers causally reason about each other’s emotions, not a model of how people actually feel.

“From the data, the model learns that what it means, for example, to feel a lot of joy in this situation, is to get what you wanted, to do it by being fair, and to do it without taking advantage,” Saxe says.

Core intuitions

Once the three modules were up and running, the researchers used them on a new dataset from the game show to determine how the models’ emotion predictions compared with the predictions made by human observers. This model performed much better at that task than any previous model of emotion prediction.

The model’s success stems from its incorporation of key factors that the human brain also uses when predicting how someone else will react to a given situation, Saxe says. Those include computations of how a person will evaluate and emotionally react to a situation, based on their desires and expectations, which relate to not only material gain but also how they are viewed by others.

“Our model has those core intuitions, that the mental states underlying emotion are about what you wanted, what you expected, what happened, and who saw. And what people want is not just stuff. They don’t just want money; they want to be fair, but also not to be the sucker, not to be cheated,” she says.

“The researchers have helped build a deeper understanding of how emotions contribute to determining our actions; and then, by flipping their model around, they explain how we can use people’s actions to infer their underlying emotions. This line of work helps us see emotions not just as ‘feelings’ but as playing a crucial, and subtle, role in human social behavior,” says Nick Chater, a professor of behavioral science at the University of Warwick, who was not involved in the study.

In future work, the researchers hope to adapt the model so that it can perform more general predictions based on situations other than the game-show scenario used in this study. They are also working on creating models that can predict what happened in the game based solely on the expression on the faces of the contestants after the results were announced.

The research was funded by the McGovern Institute; the Paul E. and Lilah Newton Brain Science Award; the Center for Brains, Minds, and Machines; the MIT-IBM Watson AI Lab; and the Multidisciplinary University Research Initiative.

PhD student Wei-Chen Wang is moved to help people heal

This story originally appeared in the Spring 2023 issue of Spectrum.

___

When he turned his ankle five years ago as an undergraduate playing pickup basketball at the University of Illinois, Wei-Chen (Eric) Wang SM ’22 knew his life would change in certain ways. For one thing, Wang, then a computer science major, wouldn’t be playing basketball anytime soon. He also assumed, correctly, that he might require physical therapy (PT).

What he did not foresee was that this minor injury would influence his career trajectory. While lying on the PT bench, Wang began to wonder: “Can I replicate what the therapist is doing using a robot?” It was an idle thought at the time. Today, however, his research involves robots and movement, closely related to what had seemed a passing fancy.

Wang continued his focus on computer science as an MIT graduate student, receiving his master’s in 2022 before deciding to pursue work of a more applied nature. He met Nidhi Seethapathi, who had joined MIT’s faculty as an assistant professor in electrical engineering and computer science and brain and cognitive science a few months earlier, and was intrigued by the notion of creating robots that could illuminate the key principles of movement—knowledge that might someday help people regain the ability to move comfortably after suffering from injury, stroke, or disease.

As the first PhD student in Seethapathi’s group and a MathWorks Fellow, Wang is charged with building machine learning-based models that can accurately predict and reproduce human movements. He will then use computer-simulated environments to visualize and evaluate the performance of these models.

To begin, he needs to gather data about specific human movements. One potential data collection method involves the placement of sensors or markers on different parts of the body to pinpoint their precise positions at any given moment. He can then try to calculate those positions in the future, as dictated by the equations of motion in physics.

The other method relies on computer vision-powered software that can automatically convert video footage to motion data. Wang prefers the latter approach, which he considers more natural. “We just look at what humans are doing and try to learn from that directly,” he explains. That’s also where machine learning comes in. “We use machine-learning tools to extract data from the video, and those data become the input to our model,” he adds. The model, in this case, is just another term for the robot brain.

The near-term goal is not to make robots more natural, Wang notes. “We’re using [simulated] robots to understand how humans are moving and eventually to explain any kind of movement—or at least that’s the hope. That said, based on the general principles we’re able to abstract, we might someday build robots that can move more naturally.”

Wang is also collaborating on a project headed by postdoctoral fellow Antoine De Comité that focuses on robotic retrieval of objects—the movements required to remove books from a library shelf, for example, or to grab a drink from a refrigerator. While robots routinely excel at tasks such as grasping an object on a tabletop, performing naturalistic movements in three dimensions remains challenging.

Wang describes a video shown by a Stanford University scientist in which a robot destroyed a refrigerator while attempting to extract a beer. He and De Comité hope for better results with robots that have undergone reinforcement learning—an approach using deep learning in which desired motions are rewarded or reinforced whereas unwanted motions are discouraged.

If they succeed in designing a robot that can safely retrieve a beer, Wang says, then more important and delicate tasks could be within reach. Someday, a robot at PT might guide a patient through knee exercises or apply ultrasound to an arthritic elbow.

Modeling the marvelous journey from A to B

This story originally appeared in the Spring 2023 issue of Spectrum.

___

Nidhi Seethapathi was first drawn to using powerful yet simple models to understand elaborate patterns when she learned about Newton’s laws of motion as a high school student in India. She was fascinated by the idea that wonderfully complex behaviors can arise from a set of objects that follow a few elementary rules.

Now an assistant professor at MIT, Seethapathi seeks to capture the intricacies of movement in the real world, using computational modeling as well as input from theory and experimentation. “[Theoretical physicist and Nobel laureate] Richard Feynman ’39 once said, ‘What I cannot create, I do not understand,’” Seethapathi says. “In that same spirit, the way I try to understand movement is by building models that move the way we do.”

Models of locomotion in the real world

Seethapathi—who holds a shared faculty position between the Department of Brain and Cognitive Sciences and the Department of Electrical Engineering and Computer Science’s Faculty of Artificial Intelligence + Decision- Making, which is housed in the Schwarzman College of Computing and the School of Engineering—recalls a moment during her undergraduate years studying mechanical engineering in Mumbai when a professor asked students to pick an aspect of movement to examine in detail. While most of her peers chose to analyze machines, Seethapathi selected the human hand. She was astounded by its versatility, she says, and by the number of variables, referred to by scientists as “degrees of freedom,” that are needed to characterize routine manual tasks. The assignment made her realize that she wanted to explore the diverse ways in which the entire human body can move.

Also an investigator at the McGovern Institute for Brain Research, Seethapathi pursued graduate research at The Ohio State University Movement Lab, where her goal was to identify the key elements of human locomotion. At that time, most people in the field were analyzing simple movements, she says, “but I was interested in broadening the scope of my models to include real-world behavior. Given that movement is so ubiquitous, I wondered: What can this model say about everyday life?”

After earning her PhD from Ohio State in 2018, Seethapathi continued this line of research as a postdoctoral fellow at the University of Pennsylvania. New computer vision tools to track human movement from video footage had just entered the scene, and during her time at UPenn, Seethapathi sought to expand her skillset to include computer vision and applications to movement rehabilitation.

At MIT, Seethapathi continues to extend the range of her studies of human movement, looking at how locomotion can evolve as people grow and age, and how it can adapt to anatomical changes and even adjust to shifts in weather, which can alter ground conditions. Her investigations now encompass other species as part of an effort to determine how creatures with different morphologies and habitats regulate their movements.

The models Seethapathi and her team create make predictions about human movements that can later be verified or refuted by empirical tests. While relatively simple experiments can be carried out on treadmills, her group is developing measurement systems incorporating wearable sensors and video-based sensing to measure movement data that have traditionally been hard to obtain outside the laboratory.

Although Seethapathi says she is primarily driven to uncover the fundamental principles that govern movement behavior, she believes her work also has practical applications.

“When people are treated for a movement disorder, the goal is to impact their movements in the real world,” she says. “We can use our predictive models to see how a particular intervention will affect a person’s trajectory. The hope is that our models can help put the individual on the right track to recovery as early as possible.”

What powerful new bots like ChatGPT tell us about intelligence and the human brain

This story originally appeared in the Spring 2023 issue of BrainScan.

___

Artificial intelligence seems to have gotten a lot smarter recently. AI technologies are increasingly integrated into our lives — improving our weather forecasts, finding efficient routes through traffic, personalizing the ads we see and our experiences with social media.

Watercolor image of a robot with a human brain, created using the AI system DALL*E2.

But with the debut of powerful new chatbots like ChatGPT, millions of people have begun interacting with AI tools that seem convincingly human-like. Neuroscientists are taking note — and beginning to dig into what these tools tell us about intelligence and the human brain.

The essence of human intelligence is hard to pin down, let alone engineer. McGovern scientists say there are many kinds of intelligence, and as humans, we call on many different kinds of knowledge and ways of thinking. ChatGPT’s ability to carry on natural conversations with its users has led some to speculate the computer model is sentient, but McGovern neuroscientists insist that the AI technology cannot think for itself.

Still, they say, the field may have reached a turning point.

“I still don’t believe that we can make something that is indistinguishable from a human. I think we’re a long way from that. But for the first time in my life I think there is a small, nonzero chance that it may happen in the next year,” says McGovern founding member Tomaso Poggio, who has studied both human intelligence and machine learning for more than 40 years.

Different sort of intelligence

Developed by the company OpenAI, ChatGPT is an example of a deep neural network, a type of machine learning system that has made its way into virtually every aspect of science and technology. These models learn to perform various tasks by identifying patterns in large datasets. ChatGPT works by scouring texts and detecting and replicating the ways language is used. Drawing on language patterns it finds across the internet, ChatGPT can design you a meal plan, teach you about rocket science, or write a high school-level essay about Mark Twain. With all of the internet as a training tool, models like this have gotten so good at what they do, they can seem all-knowing.

“Engineers have been inventing some of these forms of intelligence since the beginning of the computers. ChatGPT is one. But it is very far from human intelligence.” – Tomaso Poggio

Nonetheless, language models have a restricted skill set. Play with ChatGPT long enough and it will surely give you some wrong information, even if its fluency makes its words deceptively convincing. “These models don’t know about the world, they don’t know about other people’s mental states, they don’t know how things are beyond whatever they can gather from how words go together,” says Postdoctoral Associate Anna Ivanova, who works with McGovern Investigators Evelina Fedorenko and Nancy Kanwisher as well as Jacob Andreas in MIT’s Computer Science and Artificial Intelligence Laboratory.

Such a model, the researchers say, cannot replicate the complex information processing that happens in the human brain. That doesn’t mean language models can’t be intelligent — but theirs is a different sort of intelligence than our own. “I think that there is an infinite number of different forms of intelligence,” says Poggio. “Engineers have been inventing some of these forms of intelligence since the beginning of the computers. ChatGPT is one. But it is very far from human intelligence.”

Under the hood

Just as there are many forms of intelligence, there are also many types of deep learning models — and McGovern researchers are studying the internals of these models to better understand the human brain.

A watercolor painting of a robot generated by DALL*E2.

“These AI models are, in a way, computational hypotheses for what the brain is doing,” Kanwisher says. “Up until a few years ago, we didn’t really have complete computational models of what might be going on in language processing or vision. Once you have a way of generating actual precise models and testing them against real data, you’re kind of off and running in a way that we weren’t ten years ago.”

Artificial neural networks echo the design of the brain in that they are made of densely interconnected networks of simple units that organize themselves — but Poggio says it’s not yet entirely clear how they work.

No one expects that brains and machines will work in exactly the same ways, though some types of deep learning models are more humanlike in their internals than others. For example, a computer vision model developed by McGovern Investigator James DiCarlo responds to images in ways that closely parallel the activity in the visual cortex of animals who are seeing the same thing. DiCarlo’s team can even use their model’s predictions to create an image that will activate specific neurons in an animal’s brain.

“We shouldn’t just automatically assume that if we trained a deep network on a task, that it’s going to look like the brain.” – Ila Fiete

Still, there is reason to be cautious in interpreting what artificial neural networks tell us about biology. “We shouldn’t just automatically assume that if we trained a deep network on a task, that it’s going to look like the brain,” says McGovern Associate Investigator Ila Fiete. Fiete acknowledges that it’s tempting to think of neural networks as models of the brain itself due to their architectural similarities — but she says so far, that idea remains largely untested.

McGovern Institute Associate Investigator Ila Fiete builds theoretical models of the brain. Photo: Caitlin Cunningham

She and her colleagues recently experimented with neural networks that estimate an object’s position in space by integrating information about its changing velocity.

In the brain, specialized neurons known as grid cells carry out this calculation, keeping us aware of where we are as we move through the world. Other researchers had reported that not only can neural networks do this successfully, those that do include components that behave remarkably like grid cells. They had argued that the need to do this kind of path integration must be the reason our brains have grid cells — but Fiete’s team found that artificial networks don’t need to mimic the brain to accomplish this brain-like task. They found that many neural networks can solve the same problem without grid cell-like elements.

One way investigators might generate deep learning models that do work like the brain is to give them a problem that is so complex that there is only one way of solving it, Fiete says.

Language, she acknowledges, might be that complex.

“This is clearly an example of a super-rich task,” she says. “I think on that front, there is a hope that they’re solving such an incredibly difficult task that maybe there is a sense in which they mirror the brain.”

Language parallels

In Fedorenko’s lab, where researchers are focused on identifying and understanding the brain’s language processing circuitry, they have found that some language models do, in fact, mimic certain aspects of human language processing. Many of the most effective models are trained to do a single task: make predictions about word use. That’s what your phone is doing when it suggests words for your text message as you type. Models that are good at this, it turns out, can apply this skill to carrying on conversations, composing essays, and using language in other useful ways. Neuroscientists have found evidence that humans, too, rely on word prediction as a part of language processing.

Fedorenko and her team compared the activity of language models to the brain activity of people as they read or listened to words, sentences, and stories, and found that some models were a better match to human neural responses than others. “The models that do better on this relatively unsophisticated task — just guess what comes next — also do better at capturing human neural responses,” Fedorenko says.

A watercolor painting of a language model, generated by DALL*E2.

It’s a compelling parallel, suggesting computational models and the human brain may have arrived at a similar solution to a problem, even in the face of the biological constraints that have shaped the latter. For Fedorenko and her team, it’s sparked new ideas that they will explore, in part, by modifying existing language models — possibly to more closely mimic the brain.

With so much still unknown about how both human and artificial neural networks learn, Fedorenko says it’s hard to predict what it will take to make language models work and behave more like the human brain. One possibility they are exploring is training a model in a way that more closely mirrors the way children learn language early in life.

Another question, she says, is whether language models might behave more like humans if they had a more limited recall of their own conversations. “All of the state-of-the-art language models keep track of really, really long linguistic contexts. Humans don’t do that,” she says.

Chatbots can retain long strings of dialogue, using those words to tailor their responses as a conversation progresses, she explains. Humans, on the other hand, must cope with a more limited memory. While we can keep track of information as it is conveyed, we only store a string of about eight words as we listen or read. “We get linguistic input, we crunch it up, we extract some kind of meaning representation, presumably in some more abstract format, and then we discard the exact linguistic stream because we don’t need it anymore,” Fedorenko explains.

Language models aren’t able to fill in gaps in conversation with their own knowledge and awareness in the same way a person can, Ivanova adds. “That’s why so far they have to keep track of every single input word,” she says. “If we want a model that models specifically the [human] language network, we don’t need to have this large context window. It would be very cool to train those models on those short windows of context and see if it’s more similar to the language network.”

Multimodal intelligence

Despite these parallels, Fedorenko’s lab has also shown that there are plenty of things language circuits do not do. The brain calls on other circuits to solve math problems, write computer code, and carry out myriad other cognitive processes. Their work makes it clear that in the brain, language and thought are not the same.

That’s borne out by what cognitive neuroscientists like Kanwisher have learned about the functional organization of the human brain, where circuit components are dedicated to surprisingly specific tasks, from language processing to face recognition.

“The upshot of cognitive neuroscience over the last 25 years is that the human brain really has quite a degree of modular organization,” Kanwisher says. “You can look at the brain and say, ‘what does it tell us about the nature of intelligence?’ Well, intelligence is made up of a whole bunch of things.”

In generating this image from the text prompt, “a watercolor painting of a woman looking in a mirror and seeing a robot,” DALL*E2 incorrectly placed the woman (not the robot) in the mirror, highlighting one of the weaknesses of current deep learning models.

In January, Fedorenko, Kanwisher, Ivanova, and colleagues shared an extensive analysis of the capabilities of large language models. After assessing models’ performance on various language-related tasks, they found that despite their mastery of linguistic rules and patterns, such models don’t do a good job using language in real-world situations. From a neuroscience perspective, that kind of functional competence is distinct from formal language competence, calling on not just language-processing circuits but also parts of the brain that store knowledge of the world, reason, and interpret social interactions.

Language is a powerful tool for understanding the world, they say, but it has limits.

“If you train on language prediction alone, you can learn to mimic certain aspects of thinking,” Ivanova says. “But it’s not enough. You need a multimodal system to carry out truly intelligent behavior.”

The team concluded that while AI language models do a very good job using language, they are incomplete models of human thought. For machines to truly think like humans, Ivanova says, they will need a combination of different neural nets all working together, in the same way different networks in the human brain work together to achieve complex cognitive tasks in the real world.

It remains to be seen whether such models would excel in the tech world, but they could prove valuable for revealing insights into human cognition — perhaps in ways that will inform engineers as they strive to build systems that better replicate human intelligence.