Understanding reality through algorithms

Although Fernanda De La Torre still has several years left in her graduate studies, she’s already dreaming big when it comes to what the future has in store for her.

“I dream of opening up a school one day where I could bring this world of understanding of cognition and perception into places that would never have contact with this,” she says.

It’s that kind of ambitious thinking that’s gotten De La Torre, a doctoral student in MIT’s Department of Brain and Cognitive Sciences, to this point. A recent recipient of the prestigious Paul and Daisy Soros Fellowship for New Americans, De La Torre has found at MIT a supportive, creative research environment that’s allowed her to delve into the cutting-edge science of artificial intelligence. But she’s still driven by an innate curiosity about human imagination and a desire to bring that knowledge to the communities in which she grew up.

An unconventional path to neuroscience

De La Torre’s first exposure to neuroscience wasn’t in the classroom, but in her daily life. As a child, she watched her younger sister struggle with epilepsy. At 12, she crossed into the United States from Mexico illegally to reunite with her mother, a move that exposed her to a whole new language and culture. Once in the States, she had to grapple with her mother’s shifting personality in the midst of an abusive relationship. “All of these different things I was seeing around me drove me to want to better understand how psychology works,” De La Torre says, “to understand how the mind works, and how it is that we can all be in the same environment and feel very different things.”

But finding an outlet for that intellectual curiosity was challenging. As an undocumented immigrant, her access to financial aid was limited. Her high school was also underfunded and lacked elective options. Mentors along the way, though, encouraged the aspiring scientist, and through a program at her school, she was able to take community college courses to fulfill basic educational requirements.

It took an inspiring amount of dedication to her education, but De La Torre made it to Kansas State University for her undergraduate studies, where she majored in computer science and math. At Kansas State, she was able to get her first real taste of research. “I was just fascinated by the questions they were asking and this entire space I hadn’t encountered,” says De La Torre of her experience working in a visual cognition lab and discovering the field of computational neuroscience.

Although Kansas State didn’t have a dedicated neuroscience program, her research experience in cognition brought her to a machine learning lab led by William Hsu, a computer science professor. There, De La Torre became enamored of the possibilities of using computation to model the human brain. Hsu’s support also convinced her that a scientific career was a possibility. “He always made me feel like I was capable of tackling big questions,” she says fondly.

With the confidence instilled in her at Kansas State, De La Torre came to MIT in 2019 as a post-baccalaureate student in the lab of Tomaso Poggio, the Eugene McDermott Professor of Brain and Cognitive Sciences and an investigator at the McGovern Institute for Brain Research. With Poggio, also the director of the Center for Brains, Minds and Machines, De La Torre began working on deep-learning theory, an area of machine learning focused on how artificial neural networks modeled on the brain can learn to recognize patterns.

“It’s a very interesting question because we’re starting to use them everywhere,” says De La Torre of neural networks, listing off examples from self-driving cars to medicine. “But, at the same time, we don’t fully understand how these networks can go from knowing nothing and just being a bunch of numbers to outputting things that make sense.”

Her experience as a post-bac was De La Torre’s first real opportunity to apply the technical computer skills she developed as an undergraduate to neuroscience. It was also the first time she could fully focus on research. “That was the first time that I had access to health insurance and a stable salary. That was, in itself, sort of life-changing,” she says. “But on the research side, it was very intimidating at first. I was anxious, and I wasn’t sure that I belonged here.”

Fortunately, De La Torre says she was able to overcome those insecurities, both through a growing unabashed enthusiasm for the field and through the support of Poggio and her other colleagues in MIT’s Department of Brain and Cognitive Sciences. When the opportunity came to apply to the department’s PhD program, she jumped on it. “It was just knowing these kinds of mentors are here and that they cared about their students,” says De La Torre of her decision to stay on at MIT for graduate studies. “That was really meaningful.”

Expanding notions of reality and imagination

In her two years so far in the graduate program, De La Torre’s work has expanded the understanding of neural networks and their applications to the study of the human brain. Working with Guangyu Robert Yang, an associate investigator at the McGovern Institute and an assistant professor in the departments of Brain and Cognitive Sciences and Electrical Engineering and Computer Science, she’s engaged in what she describes as more philosophical questions about how one develops a sense of self as an independent being. She’s interested in how that self-consciousness develops and why it might be useful.

De La Torre’s primary advisor, though, is Professor Josh McDermott, who leads the Laboratory for Computational Audition. With McDermott, De La Torre is attempting to understand how the brain integrates vision and sound. While combining sensory inputs may seem like a basic process, there are many unanswered questions about how our brains combine multiple signals into a coherent impression, or percept, of the world. Many of the questions are raised by audiovisual illusions in which what we hear changes what we see. For example, if one sees a video of two discs passing each other, but the clip contains the sound of a collision, the brain will perceive that the discs are bouncing off, rather than passing through each other. Given an ambiguous image, that simple auditory cue is all it takes to create a different perception of reality.

There’s something interesting happening where our brains are receiving two signals telling us different things and, yet, we have to combine them somehow to make sense of the world.

De La Torre is using behavioral experiments to probe how the human brain makes sense of multisensory cues to construct a particular perception. To do so, she’s created various scenes of objects interacting in 3D space over different sounds, asking research participants to describe characteristics of the scene. For example, in one experiment, she combines visuals of a block moving across a surface at different speeds with various scraping sounds, asking participants to estimate how rough the surface is. Eventually she hopes to take the experiment into virtual reality, where participants will physically push blocks in response to how rough they perceive the surface to be, rather than just reporting on what they experience.

Once she’s collected data, she’ll move into the modeling phase of the research, evaluating whether multisensory neural networks perceive illusions the way humans do. “What we want to do is model exactly what’s happening,” says De La Torre. “How is it that we’re receiving these two signals, integrating them and, at the same time, using all of our prior knowledge and inferences of physics to really make sense of the world?”

Although her two strands of research with Yang and McDermott may seem distinct, she sees clear connections between the two. Both projects are about grasping what artificial neural networks are capable of and what they tell us about the brain. At a more fundamental level, she says that how the brain perceives the world from different sensory cues might be part of what gives people a sense of self. Sensory perception is about constructing a cohesive, unitary sense of the world from multiple sources of sensory data. Similarly, she argues, “the sense of self is really a combination of actions, plans, goals, emotions, all of these different things that are components of their own, but somehow create a unitary being.”

It’s a fitting sentiment for De La Torre, who has been working to make sense of and integrate different aspects of her own life. Working in the Computational Audition lab, for example, she’s started experimenting with combining electronic music with folk music from her native Mexico, connecting her “two worlds,” as she says. Having the space to undertake those kinds of intellectual explorations, and colleagues who encourage it, is one of De La Torre’s favorite parts of MIT.

“Beyond professors, there’s also a lot of students whose way of thinking just amazes me,” she says. “I see a lot of goodness and excitement for science and a little bit of — it’s not nerdiness, but a love for very niche things — and I just kind of love that.”

How do illusions trick the brain?

As part of our Ask the Brain series, Jarrod Hicks, a graduate student in Josh McDermott’s lab, and Dana Boebinger, a postdoctoral researcher at the University of Rochester (and former graduate student in Josh McDermott’s lab), answer the question, “How do illusions trick the brain?”

_____

Graduate student Jarrod Hicks studies how the brain processes sound. Photo: M.E. Megan Hicks

Imagine you’re a detective. Your job is to visit a crime scene, observe some evidence, and figure out what happened. However, there are often multiple stories that could have produced the evidence you observe. Thus, to solve the crime, you can’t just rely on the evidence in front of you – you have to use your knowledge about the world to make your best guess about the most likely sequence of events. For example, if you discover cat hair at the crime scene, your prior knowledge about the world tells you it’s unlikely that a cat is the culprit. Instead, a more likely explanation is that the culprit might have a pet cat.

Although it might not seem like it, this kind of detective work is what your brain is doing all the time. As your senses send information to your brain about the world around you, your brain plays the role of detective, piecing together each bit of information to figure out what is happening in the world. The information from your senses usually paints a pretty good picture of things, but sometimes when this information is incomplete or unclear, your brain is left to fill in the missing pieces with its best guess of what should be there. This means that what you experience isn’t actually what’s out there in the world, but rather what your brain thinks is out there. The consequence of this is that your perception of the world can depend on your experience and assumptions.
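The detective’s reasoning can be written down with Bayes’ rule, which weighs how well each story explains the evidence against how plausible that story was to begin with. The short Python sketch below is only an illustration: the two hypotheses come from the cat-hair example above, but every probability in it is a made-up number, and the function name posterior is hypothetical.

```python
def posterior(priors, likelihoods):
    """Combine prior beliefs with evidence using Bayes' rule.

    priors:      dict mapping hypothesis -> P(hypothesis)
    likelihoods: dict mapping hypothesis -> P(evidence | hypothesis)
    Returns a dict mapping hypothesis -> P(hypothesis | evidence).
    """
    unnormalized = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(unnormalized.values())
    return {h: p / total for h, p in unnormalized.items()}


# Illustrative numbers only (assumptions, not data from the article).
# Prior knowledge: cats almost never commit crimes.
priors = {"cat is the culprit": 0.001, "culprit owns a cat": 0.999}
# Evidence: cat hair at the scene is fairly likely under either story.
likelihoods = {"cat is the culprit": 0.9, "culprit owns a cat": 0.6}

beliefs = posterior(priors, likelihoods)
best_guess = max(beliefs, key=beliefs.get)
print(best_guess)
```

Even though the cat-hair evidence fits the “cat is the culprit” story slightly better, the prior is so lopsided that the best guess remains a human culprit with a pet cat.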

Optical illusions

Optical illusions are a great way of showing how our expectations and assumptions affect what we perceive. For example, look at the squares labeled “A” and “B” in the image below.

Checkershadow illusion. Image: Edward H. Adelson

Is one of them lighter than the other? Although most people would agree that the square labeled “B” is much lighter than the one labeled “A,” the two squares are actually the exact same color. You perceive the squares differently because your brain knows, from experience, that shadows tend to make things appear darker than they actually are. So, despite the squares being physically identical, your brain thinks “B” should be lighter.

Auditory illusions

Tricks of perception are not limited to optical illusions. There are also several dramatic examples of how our expectations influence what we hear. For example, listen to the mystery sound below. What do you hear?

Mystery sound

Because you’ve probably never heard a sound quite like this before, your brain has very little idea about what to expect. So, although you clearly hear something, it might be very difficult to make out exactly what that something is. This mystery sound is something called sine-wave speech, and what you’re hearing is essentially a very degraded sound of someone speaking.

Now listen to a “clean” version of this speech in the audio clip below:

Clean speech

You probably hear a person saying, “the floor was quite slippery.” Now listen to the mystery sound above again. After listening to the original audio, your brain has a strong expectation about what you should hear when you listen to the mystery sound again. Even though you’re hearing the exact same mystery sound as before, you experience it completely differently. (Audio clips courtesy of the University of Sussex.)

 

Dana Boebinger describes the science of illusions in this McGovern Minute.

Subjective perceptions

These illusions have been specifically designed by scientists to fool your brain and reveal principles of perception. However, there are plenty of real-life situations in which your perceptions strongly depend on expectations and assumptions. For example, imagine you’re watching TV when someone begins to speak to you from another room. Because the noise from the TV makes it difficult to hear the person speaking, your brain might have to fill in the gaps to understand what’s being said. In this case, different expectations about what is being said could cause you to hear completely different things.

Which phrase do you hear?

Listen to the clip below to hear a repeating loop of speech. As the sound plays, try to listen for one of the phrases listed in teal below.

Because the audio is somewhat ambiguous, the phrase you perceive depends on which phrase you listen for. So even though it’s the exact same audio each time, you can perceive something totally different! (Note: the original audio recording is from a football game in which the fans were chanting, “that is embarrassing!”)

Illusions like the ones above are great reminders of how subjective our perceptions can be. In order to make sense of the messy information coming in from our senses, our brains are constantly trying to fill in the blanks with their best guess of what’s out there. Because of this guesswork, our perceptions depend on our experiences, leading each of us to perceive and interact with the world in a way that’s uniquely ours.

Jarrod Hicks is a PhD candidate in the Department of Brain and Cognitive Sciences at MIT working with Josh McDermott in the Laboratory for Computational Audition. He studies sound segregation, a key aspect of real-world hearing in which a sound source of interest is estimated amid a mixture of competing sources. He is broadly interested in teaching/outreach, psychophysics, computational approaches to represent stimulus spaces, and neural coding of high-level sensory representations.

_____

Three from MIT awarded 2022 Paul and Daisy Soros Fellowships for New Americans

MIT graduate student Fernanda De La Torre, alumna Trang Luu ’18, SM ’20, and senior Syamantak Payra are recipients of the 2022 Paul and Daisy Soros Fellowships for New Americans.

De La Torre, Luu, and Payra are among 30 New Americans selected from a pool of over 1,800 applicants. The fellowship honors the contributions of immigrants and children of immigrants by providing $90,000 in funding for graduate school.

Students interested in applying to the P.D. Soros Fellowship for future years may contact Kim Benard, associate dean of distinguished fellowships in Career Advising and Professional Development.

Fernanda De La Torre

Fernanda De La Torre is a PhD student in the Department of Brain and Cognitive Sciences. With Professor Josh McDermott, she studies how we integrate vision and sound, and with Professor Robert Yang, she develops computational models of imagination.

De La Torre spent her early childhood with her younger sister and grandmother in Guadalajara, Mexico. At age 12, she crossed the Mexican border to reunite with her mother in Kansas City, Missouri. Shortly after, an abusive home environment forced De La Torre to leave her family and support herself throughout her early teens.

Despite her difficult circumstances, De La Torre excelled academically in high school. By winning various scholarships that would discreetly take applications from undocumented students, she was able to continue her studies in computer science and mathematics at Kansas State University. There, she became intrigued by the mysteries of the human mind. During college, De La Torre received invaluable mentorship from her former high school principal, Thomas Herrera, who helped her become documented through the Violence Against Women Act. Her college professor, William Hsu, supported her interests in artificial intelligence and encouraged her to pursue a scientific career.

After her undergraduate studies, De La Torre won a post-baccalaureate fellowship from the Department of Brain and Cognitive Sciences at MIT, where she worked with Professor Tomaso Poggio on the theory of deep learning. She then transitioned into the department’s PhD program. Beyond contributing to scientific knowledge, De La Torre plans to use science to create spaces where all people, including those from backgrounds like her own, can innovate and thrive.

She says: “Immigrants face many obstacles, but overcoming them gives us a unique strength: We learn to become resilient, while relying on friends and mentors. These experiences foster both the desire and the ability to pay it forward to our community.”

Trang Luu

Trang Luu graduated from MIT with a BS in mechanical engineering in 2018, and a master of engineering degree in 2020. Her Soros award will support her graduate studies at Harvard University in the MBA/MS engineering sciences program.

Born in Saigon, Vietnam, Luu was 3 when her family immigrated to Houston, Texas. Watching her parents’ efforts to make a living in a land where they did not understand the culture or speak the language well, Luu wanted to alleviate hardship for her family. She took full responsibility for her education and found mentors to help her navigate the American education system. At home, she assisted her family in making and repairing household items, which fueled her excitement for engineering.

As an MIT undergraduate, Luu focused on assistive technology projects, applying her engineering background to solve problems impeding daily living. These projects included a new adaptive socket liner for below-the-knee amputees in Kenya, Ethiopia, and Thailand; a walking stick adapter for wheelchairs; a computer head pointer for patients with limited arm mobility; a safer makeshift cook stove design for street vendors in South Africa; and a quicker method to test new drip irrigation designs. As a graduate student in MIT D-Lab under the direction of Professor Daniel Frey, Luu was awarded a National Science Foundation Graduate Research Fellowship. In her graduate studies, Luu researched methods to improve evaporative cooling devices for off-grid farmers to reduce rapid fruit and vegetable deterioration.

These projects strengthened Luu’s commitment to innovating new technology and devices for people struggling with basic daily tasks. During her senior year, Luu collaborated on developing a working prototype of a wearable device that noninvasively reduces hand tremors associated with Parkinson’s disease or essential tremor. Observing patients’ joy after their tremors stopped compelled Luu and three co-founders to continue developing the device after college. Four years later, Encora Therapeutics has accomplished major milestones, including Breakthrough Device designation by the U.S. Food and Drug Administration.

Syamantak Payra

Hailing from Houston, Texas, Syamantak Payra is a senior majoring in electrical engineering and computer science, with minors in public policy and entrepreneurship and innovation. He will be pursuing a PhD in engineering at Stanford University, with the goal of creating new biomedical devices that can help improve daily life for patients worldwide and enhance health care outcomes for decades to come.

Payra’s parents had emigrated from India, and he grew up immersed in his grandparents’ rich Bengali culture. As a high school student, he conducted projects with NASA engineers at Johnson Space Center, experimented at home with his scientist parents, and competed in spelling bees and science fairs across the United States. Through these avenues and activities, Payra not only gained perspectives on bridging gaps between people, but also found passions for language, scientific discovery, and teaching others.

After watching his grandmother struggle with asthma and chronic obstructive pulmonary disease and losing his baby brother to brain cancer, Payra devoted himself to trying to use technology to solve health-care challenges. Payra’s proudest accomplishments include building a robotic leg brace for his paralyzed teacher and conducting free literacy workshops and STEM outreach programs that reached nearly a thousand underprivileged students across the Greater Houston Area.

At MIT, Payra has worked in Professor Yoel Fink’s research laboratory, creating digital sensor fibers that have been woven into intelligent garments that can assist in diagnosing illnesses, and in Professor Joseph Paradiso’s research laboratory, where he contributed to next-generation spacesuit prototypes that better protect astronauts on spacewalks. Payra’s research has been published by multiple scientific journals, and he was inducted into the National Gallery of America’s Young Inventors.

Where did that sound come from?

The human brain is finely tuned not only to recognize particular sounds, but also to determine which direction they came from. By comparing differences in sounds that reach the right and left ear, the brain can estimate the location of a barking dog, wailing fire engine, or approaching car.

MIT neuroscientists have now developed a computer model that can also perform that complex task. The model, which consists of several convolutional neural networks, not only performs the task as well as humans do, it also struggles in the same ways that humans do.

“We now have a model that can actually localize sounds in the real world,” says Josh McDermott, an associate professor of brain and cognitive sciences and a member of MIT’s McGovern Institute for Brain Research. “And when we treated the model like a human experimental participant and simulated this large set of experiments that people had tested humans on in the past, what we found over and over again is the model recapitulates the results that you see in humans.”

Findings from the new study also suggest that humans’ ability to perceive location is adapted to the specific challenges of our environment, says McDermott, who is also a member of MIT’s Center for Brains, Minds, and Machines.

McDermott is the senior author of the paper, which appears today in Nature Human Behaviour. The paper’s lead author is MIT graduate student Andrew Francl.

Modeling localization

When we hear a sound such as a train whistle, the sound waves reach our right and left ears at slightly different times and intensities, depending on what direction the sound is coming from. Parts of the midbrain are specialized to compare these slight differences to help estimate what direction the sound came from, a task also known as localization.
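To make the timing cue concrete, here is a minimal Python sketch of one classic way to measure the arrival-time difference between two ear signals: slide one signal against the other and keep the shift where they line up best (cross-correlation). This is a textbook technique swapped in for illustration, not the neural-network model described in this article, and the function name estimate_delay, the sample rate, and the synthetic burst are all assumptions.

```python
import math


def estimate_delay(left, right, max_lag):
    """Return the lag (in samples) at which `right` best matches `left`.

    A positive result means the sound reached the right ear later
    than the left ear, so the source is toward the listener's left.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for n in range(len(left)):
            m = n + lag
            if 0 <= m < len(right):
                score += left[n] * right[m]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# Synthetic example (assumed values): a short decaying 500 Hz burst
# that arrives 13 samples later at the right ear than at the left.
sample_rate = 44100
true_delay = 13
burst = [
    math.exp(-n / 50.0) * math.sin(2 * math.pi * 500 * n / sample_rate)
    for n in range(400)
]
left = burst + [0.0] * true_delay
right = [0.0] * true_delay + burst

lag_samples = estimate_delay(left, right, max_lag=30)
itd_seconds = lag_samples / sample_rate
print(lag_samples, itd_seconds)
```

In this toy case the recovered delay is a fraction of a millisecond, which gives a sense of how fine the timing comparison has to be.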

This task becomes markedly more difficult under real-world conditions — where the environment produces echoes and many sounds are heard at once.

Scientists have long sought to build computer models that can perform the same kind of calculations that the brain uses to localize sounds. These models sometimes work well in idealized settings with no background noise, but never in real-world environments, with their noises and echoes.

To develop a more sophisticated model of localization, the MIT team turned to convolutional neural networks. This kind of computer modeling has been used extensively to model the human visual system, and more recently, McDermott and other scientists have begun applying it to audition as well.

Convolutional neural networks can be designed with many different architectures, so to help them find the ones that would work best for localization, the MIT team used a supercomputer that allowed them to train and test about 1,500 different models. That search identified 10 that seemed the best-suited for localization, which the researchers further trained and used for all of their subsequent studies.

To train the models, the researchers created a virtual world in which they could control the size of the room and the reflection properties of the walls of the room. All of the sounds fed to the models originated from somewhere in one of these virtual rooms. The set of more than 400 training sounds included human voices, animal sounds, machine sounds such as car engines, and natural sounds such as thunder.

The researchers also ensured the model started with the same information provided by human ears. The outer ear, or pinna, has many folds that reflect sound, altering the frequencies that enter the ear, and these reflections vary depending on where the sound comes from. The researchers simulated this effect by running each sound through a specialized mathematical function before it went into the computer model.

“This allows us to give the model the same kind of information that a person would have,” Francl says.

After training the models, the researchers tested them in a real-world environment. They placed a mannequin with microphones in its ears in an actual room and played sounds from different directions, then fed those recordings into the models. The models performed very similarly to humans when asked to localize these sounds.

“Although the model was trained in a virtual world, when we evaluated it, it could localize sounds in the real world,” Francl says.

Similar patterns

The researchers then subjected the models to a series of tests that scientists have used in the past to study humans’ localization abilities.

In addition to analyzing the difference in arrival time at the right and left ears, the human brain also bases its location judgments on differences in the intensity of sound that reaches each ear. Previous studies have shown that the success of both of these strategies varies depending on the frequency of the incoming sound. In the new study, the MIT team found that the models showed this same pattern of sensitivity to frequency.

“The model seems to use timing and level differences between the two ears in the same way that people do, in a way that’s frequency-dependent,” McDermott says.
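The second cue, the level difference between the ears, is usually expressed in decibels. The Python sketch below shows that calculation on a synthetic tone; it is an illustration under assumed values (a 4,000 Hz tone that is half as strong at the right ear), not part of the study’s model, and the helper names rms and level_difference_db are hypothetical.

```python
import math


def rms(signal):
    """Root-mean-square amplitude of a list of samples."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))


def level_difference_db(left, right):
    """Level difference between the ears in decibels.

    Positive values mean the sound is louder at the left ear.
    """
    return 20.0 * math.log10(rms(left) / rms(right))


# Assumed example: the same tone, attenuated to half amplitude
# at the right ear (as if the head were shadowing the sound).
sample_rate = 44100
tone = [math.sin(2 * math.pi * 4000 * n / sample_rate) for n in range(4410)]
left = tone
right = [0.5 * x for x in tone]

ild_db = level_difference_db(left, right)
print(round(ild_db, 2))
```

Halving the amplitude at one ear corresponds to a difference of about 6 decibels.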

The researchers also showed that when they made localization tasks more difficult, by adding multiple sound sources played at the same time, the computer models’ performance declined in a way that closely mimicked human failure patterns under the same circumstances.

“As you add more and more sources, you get a specific pattern of decline in humans’ ability to accurately judge the number of sources present, and their ability to localize those sources,” Francl says. “Humans seem to be limited to localizing about three sources at once, and when we ran the same test on the model, we saw a really similar pattern of behavior.”

Because the researchers used a virtual world to train their models, they were also able to explore what happens when their model learned to localize in different types of unnatural conditions. The researchers trained one set of models in a virtual world with no echoes, and another in a world where there was never more than one sound heard at a time. In a third, the models were only exposed to sounds with narrow frequency ranges, instead of naturally occurring sounds.

When the models trained in these unnatural worlds were evaluated on the same battery of behavioral tests, the models deviated from human behavior, and the ways in which they failed varied depending on the type of environment they had been trained in. These results support the idea that the localization abilities of the human brain are adapted to the environments in which humans evolved, the researchers say.

The researchers are now applying this type of modeling to other aspects of audition, such as pitch perception and speech recognition, and believe it could also be used to understand other cognitive phenomena, such as the limits on what a person can pay attention to or remember, McDermott says.

The research was funded by the National Science Foundation and the National Institute on Deafness and Other Communication Disorders.

Perfecting pitch perception

New research from MIT neuroscientists suggests that natural soundscapes have shaped our sense of hearing, optimizing it for the kinds of sounds we most often encounter.

Mark Saddler, graduate fellow of the K. Lisa Yang Integrative Computational Neuroscience Center. Photo: Caitlin Cunningham

In a study reported December 14 in the journal Nature Communications, researchers led by McGovern Institute Associate Investigator Josh McDermott used computational modeling to explore factors that influence how humans hear pitch. Their model’s pitch perception closely resembled that of humans—but only when it was trained using music, voices, or other naturalistic sounds.

Humans’ ability to recognize pitch—essentially, the rate at which a sound repeats—gives melody to music and nuance to spoken language. Although this is arguably the best-studied aspect of human hearing, researchers are still debating which factors determine the properties of pitch perception, and why it is more acute for some types of sounds than others. McDermott, who is also an associate professor in MIT’s Department of Brain and Cognitive Sciences and an investigator with the Center for Brains, Minds, and Machines (CBMM), is particularly interested in understanding how our nervous system perceives pitch because cochlear implants, which send electrical signals about sound to the brain in people with profound deafness, don’t replicate this aspect of human hearing very well.

“Cochlear implants can do a pretty good job of helping people understand speech, especially if they’re in a quiet environment. But they really don’t reproduce the percept of pitch very well,” says Mark Saddler, a CBMM graduate student who co-led the project and an inaugural graduate fellow of the K. Lisa Yang Integrative Computational Neuroscience Center. “One of the reasons it’s important to understand the detailed basis of pitch perception in people with normal hearing is to try to get better insights into how we would reproduce that artificially in a prosthesis.”

Artificial hearing

Pitch perception begins in the cochlea, the snail-shaped structure in the inner ear where vibrations from sounds are transformed into electrical signals and relayed to the brain via the auditory nerve. The cochlea’s structure and function help determine how and what we hear. And although it hasn’t been possible to test this idea experimentally, McDermott’s team suspected our “auditory diet” might shape our hearing as well.

To explore how both our ears and our environment influence pitch perception, McDermott, Saddler and research assistant Ray Gonzalez built a computer model called a deep neural network. Neural networks are a type of machine learning model widely used in automatic speech recognition and other artificial intelligence applications. Although the structure of an artificial neural network coarsely resembles the connectivity of neurons in the brain, the models used in engineering applications don’t actually hear the same way humans do—so the team developed a new model to reproduce human pitch perception. Their approach combined an artificial neural network with an existing model of the mammalian ear, uniting the power of machine learning with insights from biology. “These new machine learning models are really the first that can be trained to do complex auditory tasks and actually do them well, at human levels of performance,” Saddler explains.

The researchers trained the neural network to estimate pitch by asking it to identify the repetition rate of sounds in a training set. This gave them the flexibility to change the parameters under which pitch perception developed. They could manipulate the types of sound they presented to the model, as well as the properties of the ear that processed those sounds before passing them on to the neural network.
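To make "repetition rate" concrete, here is a minimal sketch of a classical autocorrelation pitch estimator. This is not the team's neural network or ear model; it is a textbook stand-in, and the function names, the 8 kHz sample rate, and the 80 to 400 Hz search range are all assumptions chosen for illustration.

```python
import math


def harmonic_tone(f0_hz, sample_rate, n_samples, n_harmonics=3):
    """Synthesize a tone whose waveform repeats f0_hz times per second."""
    return [
        sum(math.sin(2 * math.pi * h * f0_hz * n / sample_rate)
            for h in range(1, n_harmonics + 1))
        for n in range(n_samples)
    ]


def estimate_repetition_rate(signal, sample_rate, min_hz=80.0, max_hz=400.0):
    """Return the repetition rate (Hz) whose lag maximizes the autocorrelation."""
    min_lag = int(sample_rate / max_hz)  # shortest period considered
    max_lag = int(sample_rate / min_hz)  # longest period considered
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Compare the signal with a copy of itself delayed by `lag` samples.
        score = sum(signal[n] * signal[n + lag]
                    for n in range(len(signal) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


# Hypothetical test sound: a 200 Hz harmonic tone, 0.1 s long at 8 kHz.
tone = harmonic_tone(200.0, 8000, 800)
estimate = estimate_repetition_rate(tone, 8000)
```

A fixed rule like this has no training set to vary, which is exactly the flexibility the researchers gained by letting a network learn the task from sounds.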

When the model was trained using sounds that are important to humans, like speech and music, it learned to estimate pitch much as humans do. “We very nicely replicated many characteristics of human perception…suggesting that it’s using similar cues from the sounds and the cochlear representation to do the task,” Saddler says.

But when the model was trained using more artificial sounds or in the absence of any background noise, its behavior was very different. For example, Saddler says, “If you optimize for this idealized world where there’s never any competing sources of noise, you can learn a pitch strategy that seems to be very different from that of humans, which suggests that perhaps the human pitch system was really optimized to deal with cases where sometimes noise is obscuring parts of the sound.”

The team also found the timing of nerve signals initiated in the cochlea to be critical to pitch perception. In a healthy cochlea, McDermott explains, nerve cells fire precisely in time with the sound vibrations that reach the inner ear. When the researchers skewed this relationship in their model, so that the timing of nerve signals was less tightly correlated to vibrations produced by incoming sounds, pitch perception deviated from normal human hearing. 

McDermott says it will be important to take this into account as researchers work to develop better cochlear implants. “It does very much suggest that for cochlear implants to produce normal pitch perception, there needs to be a way to reproduce the fine-grained timing information in the auditory nerve,” he says. “Right now, they don’t do that, and there are technical challenges to making that happen—but the modeling results really pretty clearly suggest that’s what you’ve got to do.”

Data transformed

With the tools of modern neuroscience, data accumulates quickly. Recording devices listen in on the electrical conversations between neurons, picking up the voices of hundreds of cells at a time. Microscopes zoom in to illuminate the brain’s circuitry, capturing thousands of images of cells’ elaborately branched paths. Functional MRIs detect changes in blood flow to map activity within a person’s brain, generating a complete picture by compiling hundreds of scans.

“When I entered neuroscience about 20 years ago, data were extremely precious, and ideas, as the expression went, were cheap. That’s no longer true,” says McGovern Associate Investigator Ila Fiete. “We have an embarrassment of wealth in the data but lack sufficient conceptual and mathematical scaffolds to understand it.”

Fiete will lead the McGovern Institute’s new K. Lisa Yang Integrative Computational Neuroscience (ICoN) Center, whose scientists will create mathematical models and other computational tools to confront the current deluge of data and advance our understanding of the brain and mental health. The center, funded by a $24 million donation from philanthropist Lisa Yang, will take a uniquely collaborative approach to computational neuroscience, integrating data from MIT labs to explain brain function at every level, from the molecular to the behavioral.

“Driven by technologies that generate massive amounts of data, we are entering a new era of translational neuroscience research,” says Yang, whose philanthropic investment in MIT research now exceeds $130 million. “I am confident that the multidisciplinary expertise convened by this center will revolutionize how we synthesize this data and ultimately understand the brain in health and disease.”

Data integration

Fiete says computation is particularly crucial to neuroscience because the brain is so staggeringly complex. Its billions of neurons, which are themselves complicated and diverse, interact with one another through trillions of connections.

“Conceptually, it’s clear that all these interactions are going to lead to pretty complex things. And these are not going to be things that we can explain in stories that we tell,” Fiete says. “We really will need mathematical models. They will allow us to ask about what changes when we perturb one or several components — greatly accelerating the rate of discovery relative to doing those experiments in real brains.”

By representing the interactions between the components of a neural circuit, a model gives researchers the power to explore those interactions, manipulate them, and predict the circuit’s behavior under different conditions.

“You can observe these neurons in the same way that you would observe real neurons. But you can do even more, because you have access to all the neurons and you have access to all the connections and everything in the network,” explains computational neuroscientist and McGovern Associate Investigator Guangyu Robert Yang (no relation to Lisa Yang), who joined MIT as a junior faculty member in July 2021.

Many neuroscience models represent specific functions or parts of the brain. But with advances in computation and machine learning, along with the widespread availability of experimental data with which to test and refine models, “there’s no reason that we should be limited to that,” he says.

Robert Yang’s team at the McGovern Institute is working to develop models that integrate multiple brain areas and functions. “The brain is not just about vision, just about cognition, just about motor control,” he says. “It’s about all of these things. And all these areas, they talk to one another.” Likewise, he notes, it’s impossible to separate the molecules in the brain from their effects on behavior – although those aspects of neuroscience have traditionally been studied independently, by researchers with vastly different expertise.

The ICoN Center will eliminate the divides, bringing together neuroscientists and software engineers to deal with all types of data about the brain. To foster interdisciplinary collaboration, every postdoctoral fellow and engineer at the center will work with multiple faculty mentors. Working in three closely interacting scientific cores, fellows will develop computational technologies for analyzing molecular data, neural circuits, and behavior, such as tools to identify patterns in neural recordings or automate the analysis of human behavior to aid psychiatric diagnoses. These technologies will also help researchers model neural circuits, ultimately transforming data into knowledge and understanding.

“Lisa is focused on helping the scientific community realize its goals in translational research,” says Nergis Mavalvala, dean of the School of Science and the Curtis and Kathleen Marble Professor of Astrophysics. “With her generous support, we can accelerate the pace of research by connecting the data to the delivery of tangible results.”

Computational modeling

In its first five years, the ICoN Center will prioritize four areas of investigation: episodic memory and exploration, including functions like navigation and spatial memory; complex or stereotypical behavior, such as the perseverative behaviors associated with autism and obsessive-compulsive disorder; cognition and attention; and sleep. The goal, Fiete says, is to model the neuronal interactions that underlie these functions so that researchers can predict what will happen when something changes — when certain neurons become more active or when a genetic mutation is introduced, for example. When paired with experimental data from MIT labs, the center’s models will help explain not just how these circuits work, but also how they are altered by genes, the environment, aging, and disease.

These focus areas encompass circuits and behaviors often affected by psychiatric disorders and neurodegeneration, and models will give researchers new opportunities to explore their origins and potential treatment strategies. “I really think that the future of treating disorders of the mind is going to run through computational modeling,” says McGovern Associate Investigator Josh McDermott.

In McDermott’s lab, researchers are modeling the brain’s auditory circuits. “If we had a perfect model of the auditory system, we would be able to understand why when somebody loses their hearing, auditory abilities degrade in the very particular ways in which they degrade,” he says. Then, he says, that model could be used to optimize hearing aids by predicting how the brain would interpret sound altered in various ways by the device.

Similar opportunities will arise as researchers model other brain systems, McDermott says, noting that computational models help researchers grapple with a dauntingly vast realm of possibilities. “There’s lots of different ways the brain can be set up, and lots of different potential treatments, but there is a limit to the number of neuroscience or behavioral experiments you can run,” he says. “Doing experiments on a computational system is cheap, so you can explore the dynamics of the system in a very thorough way.”

The ICoN Center will speed the development of the computational tools that neuroscientists need, both for basic understanding of the brain and clinical advances. But Fiete hopes for a culture shift within neuroscience, as well. “There are a lot of brilliant students and postdocs who have skills that are mathematics and computational and modeling based,” she says. “I think once they know that there are these possibilities to collaborate to solve problems related to psychiatric disorders and how we think, they will see that this is an exciting place to apply their skills, and we can bring them in.”

Josh McDermott seeks to replicate the human auditory system

The human auditory system is a marvel of biology. It can follow a conversation in a noisy restaurant, learn to recognize words from languages we’ve never heard before, and identify a familiar colleague by their footsteps as they walk by our office.

So far, even the most sophisticated computational models cannot perform such tasks as well as the human auditory system, but MIT neuroscientist Josh McDermott hopes to change that. Achieving this goal would be a major step toward developing new ways to help people with hearing loss, says McDermott, who recently earned tenure in MIT’s Department of Brain and Cognitive Sciences.

“Our long-term goal is to build good predictive models of the auditory system,” McDermott says.

“If we were successful in that goal, then it would really transform our ability to make people hear better, because we could design a computer program to figure out what to do to incoming sound to make it easier to recognize what somebody said or where a sound is coming from.”

McDermott’s lab also explores how exposure to different types of music affects people’s music preferences and even how they perceive music. Such studies can help to reveal elements of sound perception that are “hardwired” into our brains, and other elements that are influenced by exposure to different kinds of sounds.

“We have found that there is cross-cultural variation in things that people had widely supposed were universal and possibly even innate,” McDermott says.

Sound perception

As an undergraduate at Harvard University, McDermott originally planned to study math and physics, but “I was very quickly seduced by the brain,” he says. At the time, Harvard did not offer a major in neuroscience, so McDermott created his own, with a focus on vision.

After earning a master’s degree from University College London, he came to MIT to do a PhD in brain and cognitive sciences. His focus was still on vision, which he studied with Ted Adelson, the John and Dorothy Wilson Professor of Vision Science, but he found himself increasingly interested in audition. He had always loved music, and around this time, he started working as a radio and club DJ. “I was spending a lot of time thinking about sound and why things sound the way they do,” he recalls.

To pursue his new interest, he served as a postdoc at the University of Minnesota, where he worked in a lab devoted to psychoacoustics — the study of how humans perceive sound. There, he studied auditory phenomena such as the “cocktail party effect,” or the ability to focus on a particular person’s voice while tuning out background noise. During another postdoc at New York University, he started working on computational models of the auditory system. That interest in computation is part of what drew him back to MIT as a faculty member, in 2013.

“The culture here surrounding brain and cognitive science really prioritizes and values computation, and that was a perspective that was important to me,” says McDermott, who is also a member of MIT’s McGovern Institute for Brain Research and the Center for Brains, Minds and Machines. “I knew that was the kind of work I really wanted to do in my lab, so it just felt like a natural environment for doing that work.”

One aspect of audition that McDermott’s lab focuses on is “auditory scene analysis,” which includes tasks such as inferring what events in the environment caused a particular sound, and determining where a particular sound came from. This requires the ability to disentangle sounds produced by different events or objects, and the ability to tease out the effects of the environment. For instance, a basketball bouncing on a hardwood floor in a gym makes a different sound than a basketball bouncing on an outdoor paved court.

“Sounds in the world have very particular properties, due to physics and the way that the world works,” McDermott says. “We believe that the brain internalizes those regularities, and you have models in your head of the way that sound is generated. When you hear something, you are performing an inference in that model to figure out what is likely to have happened that caused the sound.”

A better understanding of how the brain does this may eventually lead to new strategies to enhance human hearing, McDermott says.

“Hearing impairment is the most common sensory disorder. It affects almost everybody as they get older, and the treatments are OK, but they’re not great,” he says. “We’re eventually going to all have personalized hearing aids that we walk around with, and we just need to develop the right algorithms in order to tell them what to do. That’s something we’re actively working on.”

Music in the brain

About 10 years ago, when McDermott was a postdoc, he started working on cross-cultural studies of how the human brain perceives music. Richard Godoy, an anthropologist at Brandeis University, asked McDermott to join him for some studies of the Tsimane’ people, who live in the Amazon rainforest. Since then, McDermott and some of his students have gone to Bolivia most summers to study sound perception among the Tsimane’. The Tsimane’ have had very little exposure to Western music, making them ideal subjects to study how listening to certain kinds of music influences human sound perception.

These studies have revealed both differences and similarities between Westerners and the Tsimane’ people. McDermott, who counts soul, disco, and jazz-funk among his favorite types of music, has found that Westerners and the Tsimane’ differ in their perceptions of dissonance. To Western ears, for example, the chord of C and F# sounds very unpleasant, but not to the Tsimane’.

He has also shown that people in Western society perceive sounds that are separated by an octave to be similar, but the Tsimane’ do not. However, there are also some similarities between the two groups. For example, the upper limit of frequencies that can be perceived appears to be the same regardless of music exposure.

“We’re finding both striking variation in some perceptual traits that many people presumed were common across cultures and listeners, and striking similarities in others,” McDermott says. “The similarities and differences across cultures dissociate aspects of perception that are tightly coupled in Westerners, helping us to parcellate perceptual systems into their underlying components.”

Nine MIT School of Science professors receive tenure for 2020

Beginning July 1, nine faculty members in the MIT School of Science have been granted tenure by MIT. They are appointed in the departments of Brain and Cognitive Sciences, Chemistry, Mathematics, and Physics.

Physicist Ibrahim Cisse investigates living cells to reveal and study collective behaviors and biomolecular phase transitions at the resolution of single molecules. The results of his work help determine how disruptions in genes can cause diseases like cancer. Cisse joined the Department of Physics in 2014 and now holds a joint appointment with the Department of Biology. His education includes a bachelor’s degree in physics from North Carolina Central University, concluded in 2004, and a doctoral degree in physics from the University of Illinois at Urbana-Champaign, achieved in 2009. He followed his PhD with a postdoc at the École Normale Supérieure of Paris and a research specialist appointment at the Howard Hughes Medical Institute’s Janelia Research Campus.

Jörn Dunkel is a physical applied mathematician. His research focuses on the mathematical description of complex nonlinear phenomena in a variety of fields, especially biophysics. The models he develops help predict dynamical behaviors and structure formation processes in developmental biology, fluid dynamics, and even knot strengths for sailing, rock climbing and construction. He joined the Department of Mathematics in 2013 after completing postdoctoral appointments at Oxford University and Cambridge University. He received diplomas in physics and mathematics from Humboldt University of Berlin in 2004 and 2005, respectively. The University of Augsburg awarded Dunkel a PhD in statistical physics in 2008.

A cognitive neuroscientist, Mehrdad Jazayeri studies the neurobiological underpinnings of mental functions such as planning, inference, and learning by analyzing brain signals in the lab and using theoretical and computational models, including artificial neural networks. He joined the Department of Brain and Cognitive Sciences in 2013. He achieved a BS in electrical engineering from the Sharif University of Technology in 1994, an MS in physiology at the University of Toronto in 2001, and a PhD in neuroscience from New York University in 2007. Prior to joining MIT, he was a postdoc at the University of Washington. Jazayeri is also an investigator at the McGovern Institute for Brain Research.

Yen-Jie Lee is an experimental particle physicist in the field of proton-proton and heavy-ion physics. Utilizing the Large Hadron Collider, Lee explores matter in extreme conditions, providing new insight into strong interactions and what might have existed and occurred at the beginning of the universe and in distant star cores. His work on jets and heavy flavor particle production in nuclei collisions improves understanding of the quark-gluon plasma, predicted by quantum chromodynamics (QCD) calculations, and the structure of heavy nuclei. He also pioneered studies of high-density QCD with electron-positron annihilation data. Lee joined the Department of Physics in 2013 after a fellowship at CERN and postdoc research at the Laboratory for Nuclear Science at MIT. His bachelor’s and master’s degrees were awarded by the National Taiwan University in 2002 and 2004, respectively, and his doctoral degree by MIT in 2011. Lee is a member of the Laboratory for Nuclear Science.

Josh McDermott investigates the sense of hearing. His research addresses both human and machine audition using tools from experimental psychology, engineering, and neuroscience. McDermott hopes to better understand the neural computation underlying human hearing, to improve devices that assist the hearing impaired, and to enhance machine interpretation of sounds. Prior to joining MIT’s Department of Brain and Cognitive Sciences, he was awarded a BA in 1998 in brain and cognitive sciences by Harvard University, a master’s degree in computational neuroscience in 2000 by University College London, and a PhD in brain and cognitive sciences in 2006 by MIT. Between his doctoral time at MIT and returning as a faculty member, he was a postdoc at the University of Minnesota and New York University, and a visiting scientist at Oxford University. McDermott is also an associate investigator at the McGovern Institute for Brain Research and an investigator in the Center for Brains, Minds and Machines.

Solving environmental challenges by studying and manipulating chemical reactions is the focus of Yogesh Surendranath’s research. Using chemistry, he works at the molecular level to understand how to efficiently interconvert chemical and electrical energy. His fundamental studies aim to improve energy storage technologies, such as batteries, fuel cells, and electrolyzers, that can be used to meet future energy demand with reduced carbon emissions. Surendranath joined the Department of Chemistry in 2013 after a postdoc at the University of California at Berkeley. His PhD was completed in 2011 at MIT, and his BS in 2006 at the University of Virginia. Surendranath is also a collaborator in the MIT Energy Initiative.

A theoretical astrophysicist, Mark Vogelsberger is interested in large-scale structures of the universe, such as galaxy formation. He combines observational data, theoretical models, and simulations that require high-performance supercomputers to improve and develop detailed models that simulate galaxy diversity, clustering, and their properties, including a plethora of physical effects like magnetic fields, cosmic dust, and thermal conduction. Vogelsberger also uses simulations to generate scenarios involving alternative forms of dark matter. He joined the Department of Physics in 2014 after a postdoc at the Harvard-Smithsonian Center for Astrophysics. Vogelsberger is a 2006 graduate of the University of Mainz undergraduate program in physics, and a 2010 doctoral graduate of the University of Munich and the Max Planck Institute for Astrophysics. He is also a principal investigator in the MIT Kavli Institute for Astrophysics and Space Research.

Adam Willard is a theoretical chemist with research interests that fall across molecular biology, renewable energy, and material science. He uses theory, modeling, and molecular simulation to study the disorder that is inherent to systems over nanometer-length scales. His recent work has highlighted the fundamental and unexpected role that such disorder plays in phenomena such as microscopic energy transport in semiconducting plastics, ion transport in batteries, and protein hydration. Joining the Department of Chemistry in 2013, Willard was formerly a postdoc at Lawrence Berkeley National Laboratory and then the University of Texas at Austin. He holds a PhD in chemistry from the University of California at Berkeley, achieved in 2009, and a BS in chemistry and mathematics from the University of Puget Sound, granted in 2003.

Lindley Winslow seeks to understand how the fundamental particles shaped the evolution of our universe. As an experimental particle and nuclear physicist, she develops novel detection technology to search for axion dark matter and a proposed nuclear decay that makes more matter than antimatter. She started her faculty position in the Department of Physics in 2015 following a postdoc at MIT and a subsequent faculty position at the University of California at Los Angeles. Winslow achieved her BA in physics and astronomy in 2001 and PhD in physics in 2008, both at the University of California at Berkeley. She is also a member of the Laboratory for Nuclear Science.

Universal musical harmony

Many forms of Western music make use of harmony, or the sound created by certain pairs of notes. A longstanding question is why some combinations of notes are perceived as pleasant while others sound jarring to the ear. Are the combinations we favor a universal phenomenon? Or are they specific to Western culture?

Through intrepid research trips to the remote Bolivian rainforest, the McDermott lab at the McGovern Institute has found that aspects of the perception of note combinations may be universal, even though the aesthetic evaluation of note combinations as pleasant or unpleasant is culture-specific.

“Our work has suggested some universal features of perception that may shape musical behavior around the world,” says McGovern Associate Investigator Josh McDermott, senior author of the Nature Communications study. “But it also indicates the rich interplay with cultural influences that give rise to the experience of music.”

Remote learning

Questions about the universality of musical perception are difficult to answer, in part because of the challenge in finding people with little exposure to Western music. McDermott, who is also an associate professor in MIT’s Department of Brain and Cognitive Sciences and an investigator in the Center for Brains, Minds and Machines, has found a way to address this problem. His lab has performed a series of studies with the participation of an indigenous population, the Tsimane’, who live in relative isolation from Western culture and have had little exposure to Western music. Accessing the Tsimane’ villages is challenging, as they are scattered throughout the rainforest and only reachable during the dry part of the year.

Left to right: Josh McDermott (in vehicle), Alex Durango, Sophie Dolan and Malinda McPherson experiencing a travel delay en route to a Tsimane’ village after a heavy rainfall. Photo: Malinda McPherson

“When we enter a village there is always a crowd of curious children to greet us,” says Malinda McPherson, a graduate student in the lab and lead author of the study. “Tsimane’ are friendly and welcoming, and we have visited some villages several times, so now many people recognize us.”

In a study published in 2019, McDermott’s team found evidence that the brain’s ability to detect musical octaves is not universal, but is gained through cultural experience. And in 2016 they published findings suggesting that the preference for consonance over dissonance is culture-specific. In their new study, the team decided to explore whether aspects of the perception of consonance and dissonance might nonetheless be universally present across cultures.

Music lessons

In Western music, harmony is the sound of two or more notes heard simultaneously. Think of the Leonard Cohen song, Hallelujah, where he sings about harmony (“the fourth, the fifth, the minor fall and the major lift”). A combination of two notes is called an interval, and intervals that are perceived to be the most pleasant (or consonant, like the fourth and the fifth, for example) to the Western ear are generally represented by smaller integer ratios.

Intervals that are related by low integer ratios have fascinated scientists for centuries.

“Such intervals are central to Western music, but are also believed to be a common feature of many musical systems around the world,” McPherson explains. “So intervals are a natural target for cross-cultural research, which can help identify aspects of perception that are and aren’t independent of cultural experience.”

Scientists have been drawn to low integer ratios in music in part because they relate to the frequencies in voices and many instruments, known as ‘overtones’. Overtones from sounds like voices form a particular pattern known as the harmonic series. As it happens, the combination of two concurrent notes related by a low integer ratio partially reproduces this pattern. Because the brain presumably evolved to represent natural sounds, such as voices, it has seemed plausible that intervals with low integer ratios might have special perceptual status across cultures.
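The claim that a low-integer-ratio interval "partially reproduces" the harmonic series can be checked with a little arithmetic. The sketch below is illustrative only: the function name, the six-overtone ceiling, and the just-intonation ratios (including 45:32 for the tritone, which is one convention among several) are assumptions, not values from the study.

```python
from fractions import Fraction


def harmonic_series_coverage(ratio, partials_per_note=6):
    """Fraction of an implied harmonic series filled in by a two-note interval.

    For a ratio such as 3/2, the lower note sits at 2 units and the upper at
    3 units of a shared implied fundamental (1 unit). We count how many
    harmonics of that fundamental, up to a common frequency ceiling, coincide
    with an overtone of either note.
    """
    upper, lower = ratio.numerator, ratio.denominator
    ceiling = lower * partials_per_note  # top overtone of the lower note
    present = {
        k for k in range(1, ceiling + 1)
        if k % lower == 0 or k % upper == 0
    }
    return Fraction(len(present), ceiling)


fifth = harmonic_series_coverage(Fraction(3, 2))      # perfect fifth
fourth = harmonic_series_coverage(Fraction(4, 3))     # perfect fourth
tritone = harmonic_series_coverage(Fraction(45, 32))  # assumed tritone ratio
```

Under these assumptions the fifth fills in two-thirds of the implied series and the fourth one-half, while the tritone fills in only about 5 percent, so the consonant intervals look far more like a single harmonic sound.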

Since the Tsimane’ do not generally sing or play music together, meaning they have not been trained to hear or sing in harmony, McPherson and her colleagues were presented with a unique opportunity to explore whether there is anything universal about the perception of musical intervals.

Taking notes

In order to probe the perception of musical intervals, McDermott and colleagues took advantage of the fact that ears accustomed to Western musical harmony often have difficulty picking apart two “consonant” notes when they are played at the same time. This auditory confusion is known as “fusion” in the field. By contrast, two “dissonant” notes are easier to hear as separate.

The tendency of “consonant” notes to be heard by Westerners as fused could reflect their common occurrence in Western music. But it could also be driven by the resemblance of low-integer-ratio note combinations to the harmonic series. This similarity of consonant intervals to the acoustic structure of typical natural sounds raises the possibility that the human brain is biologically tuned to “fuse” consonant notes.

Graduate student and lead author, Malinda McPherson, works with a participant and translator in the field. Photo: Malinda McPherson

To explore this question, the team ran identical sets of experiments on two participant groups: US non-musicians residing in the Boston metropolitan area and Tsimane’ residing in villages in the Amazon rainforest. Listeners heard two concurrent notes separated by a particular musical interval (consonant or dissonant), and were asked to judge whether they heard one or two sounds. The experiment was performed with both synthetic and natural sounds.

They found that like the Boston cohort, the Tsimane’ were more likely to mistake two notes as a single sound if they were consonant than if they were dissonant.

“I was surprised by how similar some of the results in Tsimane’ participants were to those in US participants,” says McPherson, “particularly given the striking differences that we consistently see in preferences for musical intervals.”

When it came to whether consonant intervals were more pleasant than dissonant intervals, the results told a very different story. While the US study participants found consonant intervals more pleasant than dissonant intervals, the Tsimane’ showed no preference, implying that our sense of what is pleasant is shaped by culture.

“The fusion results provide an example of a perceptual effect that could influence musical systems, for instance by creating a natural perceptual contrast to exploit,” explains McDermott. “Hopefully our work helps to show how one can conduct rigorous perceptual experiments in the field and learn things that would be hidden if we didn’t consider populations in other parts of the world.”

Differences between deep neural networks and human perception

When your mother calls your name, you know it’s her voice — no matter the volume, even over a poor cell phone connection. And when you see her face, you know it’s hers — if she is far away, if the lighting is poor, or if you are on a bad FaceTime call. This robustness to variation is a hallmark of human perception. On the other hand, we are susceptible to illusions: We might fail to distinguish between sounds or images that are, in fact, different. Scientists have explained many of these illusions, but we lack a full understanding of the invariances in our auditory and visual systems.

Deep neural networks have also performed speech recognition and image classification tasks with impressive robustness to variations in the auditory or visual stimuli. But are the invariances learned by these models similar to the invariances learned by human perceptual systems? A group of MIT researchers has discovered that they are different. They presented their findings yesterday at the 2019 Conference on Neural Information Processing Systems.

The researchers made a novel generalization of a classical concept: “metamers” — physically distinct stimuli that generate the same perceptual effect. The most famous examples of metamer stimuli arise because most people have three different types of cones in their retinae, which are responsible for color vision. The perceived color of any single wavelength of light can be matched exactly by a particular combination of three lights of different colors — for example, red, green, and blue lights. Nineteenth-century scientists inferred from this observation that humans have three different types of bright-light detectors in our eyes. This is the basis for electronic color displays on all of the screens we stare at every day. Another example in the visual system is that when we fix our gaze on an object, we may perceive surrounding visual scenes that differ at the periphery as identical. In the auditory domain, something analogous can be observed. For example, the “textural” sound of two swarms of insects might be indistinguishable, despite differing in the acoustic details that compose them, because they have similar aggregate statistical properties. In each case, the metamers provide insight into the mechanisms of perception, and constrain models of the human visual or auditory systems.
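The color-matching idea can be made concrete with a little arithmetic. The sketch below is only an illustration: the three "cone" sensitivity rows and the four wavelength bands are hypothetical numbers chosen to keep the sums easy, not measured properties of the human eye. It shows two physically different lights that produce identical responses in all three cone types, which is what makes them metamers.

```python
# Hypothetical sensitivities of three cone types (rows) to four
# wavelength bands (columns). Illustrative values, not real data.
CONES = [
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 1.0, 1.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
]


def cone_responses(spectrum):
    """Return each cone's response: sensitivity-weighted sum of the light."""
    return [sum(s * p for s, p in zip(row, spectrum)) for row in CONES]


# Two lights with different power in every wavelength band.
light_a = [2.0, 2.0, 2.0, 2.0]
light_b = [2.5, 1.5, 2.5, 1.5]

responses_a = cone_responses(light_a)
responses_b = cone_responses(light_b)

print(responses_a)
print(responses_b)
print("metamers:", light_a != light_b and responses_a == responses_b)
```

Because there are more wavelength bands than cone types, some changes to the light leave every cone response untouched, and an observer with only these three detectors cannot tell the two lights apart.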

In the current work, the researchers randomly chose natural images and sound clips of spoken words from standard databases, and then synthesized sounds and images so that deep neural networks would sort them into the same classes as their natural counterparts. That is, they generated physically distinct stimuli that are classified identically by models, rather than by humans. This is a new way to think about metamers, one that generalizes the concept by putting computer models in the role of human perceivers. They therefore called these synthesized stimuli “model metamers” of the paired natural stimuli. The researchers then tested whether humans could identify the words and images.
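One way to picture the synthesis step is as an optimization: start from an unrelated signal and adjust it until the model responds to it exactly as it does to the natural stimulus. The sketch below is a toy under stated assumptions, not the researchers' code. The "model" is a hypothetical stand-in that pools energy over pairs of input values, and the names `model_response` and `synthesize_metamer`, the fixed starting point, and the step size are all invented for illustration.

```python
def model_response(x):
    """Toy stand-in for a network layer: energy pooled over adjacent pairs."""
    return [x[i] ** 2 + x[i + 1] ** 2 for i in range(0, len(x), 2)]


def synthesize_metamer(reference, start, lr=0.01, steps=500):
    """Gradient descent on the input so its response matches the reference's.

    Minimizes sum_k (r_k(x) - t_k)^2, where r is the model response and
    t is the response to the reference stimulus.
    """
    target = model_response(reference)
    x = list(start)
    for _ in range(steps):
        response = model_response(x)
        for i in range(len(x)):
            err = response[i // 2] - target[i // 2]
            # d/dx_i of (r_k - t_k)^2 is 4 * (r_k - t_k) * x_i
            x[i] -= lr * 4.0 * err * x[i]
    return x


reference = [1.0, 3.0, 2.0, 2.0]   # stands in for a natural stimulus
start = [1.0, 1.0, 1.0, 0.5]       # arbitrary starting signal (assumption)
metamer = synthesize_metamer(reference, start)

print("reference response:", model_response(reference))
print("metamer response:  ", [round(r, 6) for r in model_response(metamer)])
print("metamer stimulus:  ", [round(v, 3) for v in metamer])
```

The resulting signal differs from the reference in every entry, yet the toy model cannot distinguish the two. Whether a human perceiver would also treat such a pair as the same is the question the study put to its participants.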

“Participants heard a short segment of speech and had to identify from a list of words which word was in the middle of the clip. For the natural audio this task is easy, but for many of the model metamers humans had a hard time recognizing the sound,” explains first author Jenelle Feather, a graduate student in the MIT Department of Brain and Cognitive Sciences (BCS) and a member of the Center for Brains, Minds, and Machines (CBMM). That is, humans would not put the synthetic stimuli in the same class as the spoken word “bird” or the image of a bird. In fact, model metamers generated to match the responses of the deepest layers of the model were generally unrecognizable as words or images by human subjects.

Josh McDermott, associate professor in BCS and investigator in CBMM, makes the following case: “The basic logic is that if we have a good model of human perception, say of speech recognition, then if we pick two sounds that the model says are the same and present these two sounds to a human listener, that human should also say that the two sounds are the same. If the human listener instead perceives the stimuli to be different, this is a clear indication that the representations in our model do not match those of human perception.”

Joining Feather and McDermott on the paper are Alex Durango, a post-baccalaureate student, and Ray Gonzalez, a research assistant, both in BCS.

There is another type of failure of deep networks that has received a lot of attention in the media: adversarial examples (see, for example, “Why did my classifier just mistake a turtle for a rifle?”). These are stimuli that appear similar to humans but are misclassified by a model network (by design: they are constructed to be misclassified). They are complementary to the stimuli generated by Feather’s group, which sound or appear different to humans but are designed to be co-classified by the model network. The vulnerabilities of model networks to adversarial attacks are well known: face-recognition software might mistake identities, and automated vehicles might not recognize pedestrians.

The importance of this work lies in improving models of perception beyond deep networks. Although the standard adversarial examples indicate differences between deep networks and human perceptual systems, the new stimuli generated by the McDermott group arguably represent a more fundamental model failure — they show that generic examples of stimuli classified as the same by a deep network produce wildly different percepts for humans.

The team also figured out ways to modify the model networks to yield metamers that were more plausible sounds and images to humans. As McDermott says, “This gives us hope that we may be able to eventually develop models that pass the metamer test and better capture human invariances.”

“Model metamers demonstrate a significant failure of present-day neural networks to match the invariances in the human visual and auditory systems,” says Feather. “We hope that this work will provide a useful behavioral measuring stick to improve model representations and create better models of human sensory systems.”