Study decodes surprising approach mice take in learning

Neurotypical humans readily optimize performance in ‘reversal learning’ games, but while mice learn the winning strategy, they refuse to commit to it, a new study shows.

Illustration of mouse looking at a question mark and light bulb. — Neuroscientists studying learning found that when mice are confronted with a task in which the optimal strategy is to infer a stable rule (symbolized by the light bulb), mice learned to do so but never completely abandoned the alternate strategy, which was to openly explore as if the rule wasn't reliable (symbolized by the question mark). Image: Picower Institute Illustration using Adobe Stock Images

Neuroscience discoveries ranging from the nature of memory to treatments for disease have depended on reading the minds of mice, so researchers need to truly understand what the rodents’ behavior is telling them during experiments. In a new study that examines learning from reward, MIT researchers deciphered some initially mystifying mouse behavior, yielding new ideas about how mice think and a mathematical tool to aid future research.

The task the mice were supposed to master is simple: Turn a wheel left or right to get a reward and then recognize when the reward direction switches. When neurotypical people play such “reversal learning” games they quickly infer the optimal approach: stick with the direction that works until it doesn’t and then switch right away. Notably, people with schizophrenia struggle with the task. In the new study in PLOS Computational Biology, mice surprised scientists by showing that while they were capable of learning the “win-stay, lose-shift” strategy, they nonetheless refused to fully adopt it.

“It is not that mice cannot form an inference-based model of this environment—they can,” said corresponding author Mriganka Sur, Newton Professor in The Picower Institute for Learning and Memory and MIT’s Department of Brain and Cognitive Sciences (BCS). “The surprising thing is that they don’t persist with it. Even in a single block of the game where you know the reward is 100 percent on one side, every so often they will try the other side.”

While the mouse motif of departing from the optimal strategy could be due to a failure to hold it in memory, said lead author and Sur Lab graduate student Nhat Le, another possibility is that mice don’t commit to the “win-stay, lose-shift” approach because they don’t trust that their circumstances will remain stable or predictable. Instead, they might deviate from the optimal regime to test whether the rules have changed. Natural settings, after all, are rarely stable or predictable.

“I’d like to think mice are smarter than we give them credit for,” Le said.

But regardless of which reason may cause the mice to mix strategies, added co-senior author Mehrdad Jazayeri, Associate Professor in BCS and the McGovern Institute for Brain Research, it is important for researchers to recognize that they do and to be able to tell when and how they are choosing one strategy or another.

“This study highlights the fact that, unlike the accepted wisdom, mice doing lab tasks do not necessarily adopt a stationary strategy and it offers a computationally rigorous approach to detect and quantify such non-stationarities,” he said. “This ability is important because when researchers record the neural activity, their interpretation of the underlying algorithms and mechanisms may be invalid when they do not take the animals’ shifting strategies into account.”

Tracking thinking

The research team, which also includes co-author Murat Yildirim, a former Sur lab postdoc who is now an assistant professor at the Cleveland Clinic Lerner Research Institute, initially expected that the mice might adopt one strategy or the other. They simulated the results they’d expect to see if the mice either adopted the optimal strategy of inferring a rule about the task, or more randomly surveying whether left or right turns were being rewarded. Mouse behavior on the task, even after days, varied widely but it never resembled the results simulated by just one strategy.

To differing, individual extents, mouse performance on the task reflected variance along three parameters: how quickly they switched directions after the rule switched, how long it took them to transition to the new direction, and how loyal they remained to the new direction. Across 21 mice, the raw data represented a surprising diversity of outcomes on a task that neurotypical humans uniformly optimize. But the mice clearly weren’t helpless. Their average performance significantly improved over time, even though it plateaued below the optimal level.

In the task, the rewarded side switched every 15-25 turns. The team realized the mice were using more than one strategy in each such “block” of the game, rather than just inferring the simple rule and optimizing based on that inference. To disentangle when the mice were employing that strategy or another, the team harnessed an analytical framework called a Hidden Markov Model (HMM), which can computationally tease out when one unseen state is producing a result vs. another unseen state. Le likens it to what a judge on a cooking show might do: inferring which chef contestant made which version of a dish based on patterns in each plate of food before them.

Before the team could use an HMM to decipher their mouse performance results, however, they had to adapt it. A typical HMM might apply to individual mouse choices, but here the team modified it to explain choice transitions over the course of whole blocks. They dubbed their modified model the blockHMM. Computational simulations of task performance using the blockHMM showed that the algorithm is able to infer the true hidden states of an artificial agent. The authors then used this technique to show the mice were persistently blending multiple strategies, achieving varied levels of performance.

“We verified that each animal executes a mixture of behavior from multiple regimes instead of a behavior in a single domain,” Le and his co-authors wrote. “Indeed 17/21 mice used a combination of low, medium and high-performance behavior modes.”

Further analysis revealed that the strategies afoot were indeed the “correct” rule inference strategy and a more exploratory strategy consistent with randomly testing options to get turn-by-turn feedback.

Now that the researchers have decoded the peculiar approach mice take to reversal learning, they are planning to look more deeply into the brain to understand which brain regions and circuits are involved. By watching brain cell activity during the task, they hope to discern what underlies the decisions the mice make to switch strategies.

By examining reversal learning circuits in detail, Sur said, it’s possible the team will gain insights that could help explain why people with schizophrenia show diminished performance on reversal learning tasks. Sur added that some people with autism spectrum disorders also persist with newly unrewarded behaviors longer than neurotypical people, so his lab will also have that phenomenon in mind as they investigate.

Yildirim, too, is interested in examining potential clinical connections.

“This reversal learning paradigm fascinates me since I want to use it in my lab with various preclinical models of neurological disorders,” he said. “The next step for us is to determine the brain mechanisms underlying these differences in behavioral strategies and whether we can manipulate these strategies.”

Funding for the study came from The National Institutes of Health, the Army Research Office, a Paul and Lilah Newton Brain Science Research Award, the Massachusetts Life Sciences Initiative, The Picower Institute for Learning and Memory and The JPB Foundation.

Paper: "Mixtures of strategies underlie rodent behavior during reversal learning"