Josh McDermott seeks to replicate the human auditory system

The human auditory system is a marvel of biology. It can follow a conversation in a noisy restaurant, learn to recognize words from languages we’ve never heard before, and identify a familiar colleague by their footsteps as they walk by our office.

So far, even the most sophisticated computational models cannot perform such tasks as well as the human auditory system, but MIT neuroscientist Josh McDermott hopes to change that. Achieving this goal would be a major step toward developing new ways to help people with hearing loss, says McDermott, who recently earned tenure in MIT’s Department of Brain and Cognitive Sciences.

“Our long-term goal is to build good predictive models of the auditory system,” McDermott says.

“If we were successful in that goal, then it would really transform our ability to make people hear better, because we could design a computer program to figure out what to do to incoming sound to make it easier to recognize what somebody said or where a sound is coming from.”

McDermott’s lab also explores how exposure to different types of music affects people’s music preferences and even how they perceive music. Such studies can help to reveal elements of sound perception that are “hardwired” into our brains, and other elements that are influenced by exposure to different kinds of sounds.

“We have found that there is cross-cultural variation in things that people had widely supposed were universal and possibly even innate,” McDermott says.

Sound perception

As an undergraduate at Harvard University, McDermott originally planned to study math and physics, but “I was very quickly seduced by the brain,” he says. At the time, Harvard did not offer a major in neuroscience, so McDermott created his own, with a focus on vision.

After earning a master’s degree from University College London, he came to MIT to do a PhD in brain and cognitive sciences. His focus was still on vision, which he studied with Ted Adelson, the John and Dorothy Wilson Professor of Vision Science, but he found himself increasingly interested in audition. He had always loved music, and around this time, he started working as a radio and club DJ. “I was spending a lot of time thinking about sound and why things sound the way they do,” he recalls.

To pursue his new interest, he served as a postdoc at the University of Minnesota, where he worked in a lab devoted to psychoacoustics — the study of how humans perceive sound. There, he studied auditory phenomena such as the “cocktail party effect,” or the ability to focus on a particular person’s voice while tuning out background noise. During another postdoc at New York University, he started working on computational models of the auditory system. That interest in computation is part of what drew him back to MIT as a faculty member, in 2013.

“The culture here surrounding brain and cognitive science really prioritizes and values computation, and that was a perspective that was important to me,” says McDermott, who is also a member of MIT’s McGovern Institute for Brain Research and the Center for Brains, Minds and Machines. “I knew that was the kind of work I really wanted to do in my lab, so it just felt like a natural environment for doing that work.”

One aspect of audition that McDermott’s lab focuses on is “auditory scene analysis,” which includes tasks such as inferring what events in the environment caused a particular sound, and determining where a particular sound came from. This requires the ability to disentangle sounds produced by different events or objects, and the ability to tease out the effects of the environment. For instance, a basketball bouncing on a hardwood floor in a gym makes a different sound than a basketball bouncing on an outdoor paved court.

“Sounds in the world have very particular properties, due to physics and the way that the world works,” McDermott says. “We believe that the brain internalizes those regularities, and you have models in your head of the way that sound is generated. When you hear something, you are performing an inference in that model to figure out what is likely to have happened that caused the sound.”

A better understanding of how the brain does this may eventually lead to new strategies to enhance human hearing, McDermott says.

“Hearing impairment is the most common sensory disorder. It affects almost everybody as they get older, and the treatments are OK, but they’re not great,” he says. “We’re eventually going to all have personalized hearing aids that we walk around with, and we just need to develop the right algorithms in order to tell them what to do. That’s something we’re actively working on.”

Music in the brain

About 10 years ago, when McDermott was a postdoc, he started working on cross-cultural studies of how the human brain perceives music. Richard Godoy, an anthropologist at Brandeis University, asked McDermott to join him for some studies of the Tsimane’ people, who live in the Amazon rainforest. Since then, McDermott and some of his students have gone to Bolivia most summers to study sound perception among the Tsimane’. The Tsimane’ have had very little exposure to Western music, making them ideal subjects to study how listening to certain kinds of music influences human sound perception.

These studies have revealed both differences and similarities between Westerners and the Tsimane’ people. McDermott, who counts soul, disco, and jazz-funk among his favorite types of music, has found that Westerners and the Tsimane’ differ in their perceptions of dissonance. To Western ears, for example, the chord of C and F# sounds very unpleasant, but not to the Tsimane’.

He has also shown that people in Western society perceive sounds that are separated by an octave to be similar, but the Tsimane’ do not. However, there are also some similarities between the two groups. For example, the upper limit of frequencies that can be perceived appears to be the same regardless of music exposure.

“We’re finding both striking variation in some perceptual traits that many people presumed were common across cultures and listeners, and striking similarities in others,” McDermott says. “The similarities and differences across cultures dissociate aspects of perception that are tightly coupled in Westerners, helping us to parcellate perceptual systems into their underlying components.”

Nine MIT students awarded 2021 Paul and Daisy Soros Fellowships for New Americans

An MIT senior and eight MIT graduate students are among the 30 recipients of this year’s P.D. Soros Fellowships for New Americans. In addition to senior Fiona Chen, MIT’s newest Soros winners include graduate students Aziza Almanakly, Alaleh Azhir, Brian Y. Chang PhD ’18, James Diao, Charlie ChangWon Lee, Archana Podury, Ashwin Sah ’20, and Enrique Toloza. Six of the recipients are enrolled at the Harvard-MIT Program in Health Sciences and Technology.

P.D. Soros Fellows receive up to $90,000 to fund their graduate studies and join a lifelong community of new Americans from different backgrounds and fields. The 2021 class was selected from a pool of 2,445 applicants, marking the most competitive year in the fellowship’s history.

The Paul & Daisy Soros Fellowships for New Americans program honors the contributions of immigrants and children of immigrants to the United States. As Fiona Chen says, “Being a new American has required consistent confrontation with the struggles that immigrants and racial minorities face in the U.S. today. It has meant frequent difficulties with finding security and comfort in new contexts. But it has also meant continual growth in learning to love the parts of myself — the way I look; the things that my family and I value — that have marked me as different, or as an outsider.”

Students interested in applying to the P.D. Soros fellowship should contact Kim Benard, assistant dean of distinguished fellowships in Career Advising and Professional Development.

Aziza Almanakly

Aziza Almanakly, a PhD student in electrical engineering and computer science, researches microwave quantum optics with superconducting qubits for quantum communication under Professor William Oliver in the Department of Physics. Almanakly’s career goal is to engineer multi-qubit systems that push boundaries in quantum technology.

Born and raised in northern New Jersey, Almanakly is the daughter of Syrian immigrants who came to the United States in the early 1990s in pursuit of academic opportunities. As the civil war in Syria grew dire, more of her relatives sought asylum in the U.S. Almanakly grew up around extended family who built a new version of their Syrian home in New Jersey.

Following in the footsteps of her mathematically minded father, Almanakly studied electrical engineering at The Cooper Union for the Advancement of Science and Art. She also pursued research opportunities in experimental quantum computing at Princeton University, the City University of New York, New York University, and Caltech.

Almanakly recognizes the importance of strong mentorship in diversifying engineering. She uses her unique experience as a New American and female engineer to encourage students from underrepresented backgrounds to enter STEM fields.

Alaleh Azhir

Alaleh Azhir grew up in Iran, where she pursued her passion for mathematics. She immigrated with her mother to the United States at age 14. Determined to overcome strict gender roles she had witnessed for women, Azhir is dedicated to improving health care for them.

Azhir graduated from Johns Hopkins University in 2019 with a perfect GPA as a triple major in biomedical engineering, computer science, and applied mathematics and statistics. A Rhodes and Barry Goldwater Scholar, she has developed many novel tools for visualization and analysis of genomics data at Johns Hopkins University, Harvard University, MIT, the National Institutes of Health, and laboratories in Switzerland.

After completing a master’s in statistical science at Oxford University, Azhir began her MD studies in the Harvard-MIT Program in Health Sciences and Technology. Her thesis focuses on the role of X and Y sex chromosomes in disease manifestations. Through medical training, she aims to build further computational tools specifically for preventive care for women. She also founded and directs the nonprofit organization Frappa, aimed at mentoring women living in Iran and helping them to immigrate abroad through the graduate school application process.

Brian Y. Chang PhD ’18

Born in Johnson City, New York, Brian Y. Chang PhD ’18 is the son of immigrants from the Shanghai municipality and Shandong Province in China. He pursued undergraduate and master’s degrees in mechanical engineering at Carnegie Mellon University, graduating in a combined four years with honors.

In 2018, Chang completed a PhD in medical engineering at MIT. Under the mentorship of Professor Elazer Edelman, Chang developed methods that make advanced cardiac technologies more accessible. The resulting approaches are used in hospitals around the world. Chang has published extensively and holds five patents.

With the goal of harnessing the power of engineering to improve patient care, Chang co-founded X-COR Therapeutics, a seed-funded medical device startup developing a more accessible treatment for lung failure with the potential to support patients with severe Covid-19 and chronic obstructive pulmonary disease.

After spending time in the hospital connecting with patients and teaching cardiovascular pathophysiology to medical students, Chang decided to attend medical school. He is currently a medical student in the Harvard-MIT Program in Health Sciences and Technology. Chang hopes to advance health care through medical device innovation and education as a future physician-scientist, entrepreneur, and educator.

Fiona Chen

MIT senior Fiona Chen was born in Cedar Park, Texas, the daughter of immigrants from China. Witnessing how her own and many other immigrant families faced significant difficulties finding work and financial stability sparked her interest in learning about poverty and economic inequality.

At MIT, Chen has pursued degrees in economics and mathematics. Her economics research projects have examined important policy issues — social isolation among students, global development and poverty, universal health-care systems, and the role of technology in shaping the labor market.

An active member of the MIT community, Chen has served as the officer on governance and officer on policy of the Undergraduate Association, MIT’s student government; the opinion editor of The Tech student newspaper; the undergraduate representative of several Institute-wide committees, including MIT’s Corporation Joint Advisory Committee; and one of the founding members of MIT Students Against War. In each of these roles, she has worked to advocate for policies to support underrepresented groups at MIT.

As a Soros fellow, Chen will pursue a PhD in economics to deepen her understanding of economic policy. Her ultimate goal is to become a professor who researches poverty and economic inequality, and applies her findings to craft policy solutions.

James Diao

James Diao graduated from Yale University with degrees in statistics and biochemistry and is currently a medical student at the Harvard-MIT Program in Health Sciences and Technology. He aspires to give voice to patient perspectives in the development and evaluation of health-care technology.

Diao grew up in Houston’s Chinatown, and spent summers with his extended family in Jiangxian. Diao’s family later moved to Fort Bend, Texas, where he found a pediatric oncologist mentor who introduced him to the wonders of modern molecular biology.

Diao’s interests include the responsible development of technology. At Apple, he led projects to validate wearable health features in diverse populations; at PathAI, he built deep learning models to broaden access to pathologist services; at Yale, he worked on standardizing analyses of exRNA biomarkers; and at Harvard, he studied the impacts of clinical guidelines on marginalized groups.

Diao’s lead-author research in the New England Journal of Medicine and JAMA systematically compared race-based and race-free equations for kidney function, and demonstrated that up to 1 million Black Americans may receive unequal kidney care due to their race. He has also published articles on machine learning and precision medicine.

Charlie ChangWon Lee

Born in Seoul, South Korea, Charlie ChangWon Lee was 10 when his family immigrated to the United States and settled in Palisades Park, New Jersey. The stress of his parents’ lack of health coverage ignited Lee’s determination to study the reasons for the high cost of health care in the U.S. and learn how to care for uninsured families like his own.

Lee graduated summa cum laude in integrative biology from Harvard College, winning the Hoopes Prize for his thesis on the therapeutic potential of human gut microbes. Lee’s research on novel therapies led him to question how newly approved, and expensive, medications could reach more patients.

At the Program on Regulation, Therapeutics, and Law (PORTAL) at Brigham and Women’s Hospital, Lee studied policy issues involving pharmaceutical drug pricing, drug development, and medication use and safety. His articles have appeared in JAMA, Health Affairs, and Mayo Clinic Proceedings.

As a first-year medical student at the Harvard-MIT Health Sciences and Technology program, Lee is investigating policies to incentivize vaccine and biosimilar drug development. He hopes to find avenues to bridge science and policy and translate medical innovations into accessible, affordable therapies.

Archana Podury

The daughter of Indian immigrants, Archana Podury was born in Mountain View, California. As an undergraduate at Cornell University, she studied the neural circuits underlying motor learning. Her growing interest in whole-brain dynamics led her to the Princeton Neuroscience Institute and Neuralink, where she discovered how brain-machine interfaces could be used to understand diffuse networks in the brain.

While studying neural circuits, Podury worked at a syringe exchange in Ithaca, New York, where she witnessed firsthand the mechanics of court-based drug rehabilitation. Now, as an MD student in the Harvard-MIT Health Sciences and Technology program, Podury is interested in combining computational and social approaches to neuropsychiatric disease.

In the Boyden Lab at the MIT McGovern Institute for Brain Research, Podury is developing human brain organoid models to better characterize circuit dysfunction in neurodevelopmental disorders. Concurrently, her work in the Dhand Lab at Brigham and Women’s Hospital applies network science tools to understand how patients’ social environments influence their health outcomes following acute neurological injury.

Podury hopes that focusing on both neural and social networks can lead toward a more comprehensive, and compassionate, approach to health and disease.

Ashwin Sah ’20

Ashwin Sah ’20 was born and raised in Portland, Oregon, the son of Indian immigrants. He developed a passion for mathematics research as an undergraduate at MIT, where he conducted research under Professor Yufei Zhao, as well as at the Duluth and Emory REU (Research Experience for Undergraduates) programs.

Sah has given talks on his work at multiple professional venues. His undergraduate research in varied areas of combinatorics and discrete mathematics culminated in the Barry Goldwater Scholarship and the Frank and Brennie Morgan Prize for Outstanding Research in Mathematics by an Undergraduate Student. Additionally, his work on diagonal Ramsey numbers was recently featured in Quanta Magazine.

Beyond research, Sah has pursued opportunities to give back to the math community, helping to organize or grade competitions such as the Harvard-MIT Mathematics Tournament and the USA Mathematical Olympiad. He has also been a grader at the Mathematical Olympiad Program, a camp for talented high-school students in the United States, and an instructor for the Monsoon Math Camp, a virtual program aimed at teaching higher mathematics to high school students in India.

Sah is currently a PhD student in mathematics at MIT, where he continues to work with Zhao.

Enrique Toloza

Enrique Toloza was born in Los Angeles, California, the child of two immigrants: one from Colombia who came to the United States for a PhD and the other from the Philippines who grew up in California and went on to medical school. Their literal marriage of science and medicine inspired Toloza to become a physician-scientist.

Toloza majored in physics and Spanish literature at the University of North Carolina at Chapel Hill. He eventually settled on an interest in theoretical neuroscience after completing a summer research internship at MIT and an honors thesis on noninvasive brain stimulation.

After college, Toloza joined Professor Mark Harnett’s laboratory at MIT for a year. He went on to enroll in the Harvard-MIT MD/PhD program, studying within the Health Sciences and Technology MD curriculum at Harvard and the PhD program at MIT. For his PhD, Toloza rejoined Harnett to conduct research on the biophysics of dendritic integration and the contribution of dendrites to cortical computations in the brain.

Toloza is passionate about expanding health care access to immigrant populations. In college, he led the interpreting team at the University of North Carolina at Chapel Hill’s student-run health clinic; at Harvard Medical School, he has worked with Spanish-speaking patients as a student clinician.

James DiCarlo named director of the MIT Quest for Intelligence

James DiCarlo, the Peter de Florez Professor of Neuroscience, has been appointed to the role of director of the MIT Quest for Intelligence. MIT Quest was launched in 2018 to discover the basis of natural intelligence, create new foundations for machine intelligence, and deliver new tools and technologies for humanity.

As director, DiCarlo will forge new collaborations with researchers within MIT and beyond to accelerate progress in understanding intelligence and developing the next generation of intelligence tools.

“We have discovered and developed surprising new connections between natural and artificial intelligence,” says DiCarlo, currently head of the Department of Brain and Cognitive Sciences (BCS). “The scientific understanding of natural intelligence, and advances in building artificial intelligence with positive real-world impact, are interlocked aspects of a unified, collaborative grand challenge, and MIT must continue to lead the way.”

Aude Oliva, senior research scientist at the Computer Science and Artificial Intelligence Laboratory (CSAIL) and the MIT director of the MIT-IBM Watson AI Lab, will lead industry engagements as director of MIT Quest Corporate. Nicholas Roy, professor of aeronautics and astronautics and a member of CSAIL, will lead the development of systems to deliver on the mission as director of MIT Quest Systems Engineering. Daniel Huttenlocher, dean of the MIT Schwarzman College of Computing, will serve as chair of MIT Quest.

“The MIT Quest’s leadership team has positioned this initiative to spearhead our understanding of natural and artificial intelligence, and I am delighted that Jim is taking on this role,” says Huttenlocher, the Henry Ellis Warren (1894) Professor of Electrical Engineering and Computer Science.

DiCarlo will step down from his current role as head of BCS, a position he has held for nearly nine years, and will continue as faculty in BCS and as an investigator in the McGovern Institute for Brain Research.

“Jim has been a highly productive leader for his department, the School of Science, and the Institute at large. I’m excited to see the impact he will make in this new role,” says Nergis Mavalvala, dean of the School of Science and the Curtis and Kathleen Marble Professor of Astrophysics.

As department head, DiCarlo oversaw significant progress in the department’s scientific and educational endeavors. Roughly a quarter of current BCS faculty were hired on his watch, strengthening the department’s foundations in cognitive, systems, and cellular and molecular brain science. In addition, DiCarlo developed a new departmental emphasis in computation, deepening BCS’s ties with the MIT Schwarzman College of Computing and other MIT units such as the Center for Brains, Minds and Machines. He also developed and leads an NIH-funded graduate training program in computationally-enabled integrative neuroscience. As a result, BCS is one of the few departments in the world that is attempting to decipher, in engineering terms, how the human mind emerges from the biological components of the brain.

To prepare students for this future, DiCarlo collaborated with BCS Associate Department Head Michale Fee to design and execute a total overhaul of the Course 9 curriculum. In addition, partnering with the Department of Electrical Engineering and Computer Science, BCS developed a new major, Course 6-9 (Computation and Cognition), to fill the rapidly growing interest in this interdisciplinary topic. In only its second year, Course 6-9 already has more than 100 undergraduate majors.

DiCarlo has also worked tirelessly to build a more open, connected, and supportive culture across the entire BCS community in Building 46. In this work, as in everything, DiCarlo sought to bring people together to address challenges collaboratively. He attributes progress to strong partnerships with Li-Huei Tsai, the Picower Professor of Neuroscience in BCS and director of the Picower Institute for Learning and Memory; Robert Desimone, the Doris and Don Berkey Professor in BCS and director of the McGovern Institute for Brain Research; and to the work of dozens of faculty and staff. For example, in collaboration with associate department head Professor Rebecca Saxe, the department has focused on faculty mentorship of graduate students, and, in collaboration with postdoc officer Professor Mark Bear, the department developed postdoc salary and benefit standards. Both initiatives have become models for the Institute. In recent months, DiCarlo partnered with new associate department head Professor Laura Schulz to constructively focus renewed energy and resources on initiatives to address systemic racism and promote diversity, equity, inclusion, and social justice.

“Looking ahead, I share Jim’s vision for the research and educational programs of the department, and for enhancing its cohesiveness as a community, especially with regard to issues of diversity, equity, inclusion, and justice,” says Mavalvala. “I am deeply committed to supporting his successor in furthering these goals while maintaining the great intellectual strength of BCS.”

In his own research, DiCarlo uses a combination of large-scale neurophysiology, brain imaging, optogenetic methods, and high-throughput computational simulations to understand the neuronal mechanisms and cortical computations that underlie human visual intelligence. Working in animal models, he and his research collaborators have established precise connections between the internal workings of the visual system and the internal workings of particular computer vision systems. And they have demonstrated that these science-to-engineering connections lead to new ways to modulate neurons deep in the brain as well as to improved machine vision systems. His lab’s goals are to help develop more human-like machine vision, new neural prosthetics to restore or augment lost senses, new learning strategies, and an understanding of how visual cognition is impaired in agnosia, autism, and dyslexia.

DiCarlo earned both a PhD in biomedical engineering and an MD from The Johns Hopkins University in 1998, and completed his postdoc training in primate visual neurophysiology at Baylor College of Medicine. He joined the MIT faculty in 2002.

A search committee will convene early this year to recommend candidates for the next department head of BCS. DiCarlo will continue to lead the department until that new head is selected.

To the brain, reading computer code is not the same as reading language

In some ways, learning to program a computer is similar to learning a new language. It requires learning new symbols and terms, which must be organized correctly to instruct the computer what to do. The computer code must also be clear enough that other programmers can read and understand it.

In spite of those similarities, MIT neuroscientists have found that reading computer code does not activate the regions of the brain that are involved in language processing. Instead, it activates a distributed network called the multiple demand network, which is also recruited for complex cognitive tasks such as solving math problems or crossword puzzles.

However, although reading computer code activates the multiple demand network, it appears to rely more on different parts of the network than math or logic problems do, suggesting that coding does not precisely replicate the cognitive demands of mathematics either.

“Understanding computer code seems to be its own thing. It’s not the same as language, and it’s not the same as math and logic,” says Anna Ivanova, an MIT graduate student and the lead author of the study.

Evelina Fedorenko, the Frederick A. and Carole J. Middleton Career Development Associate Professor of Neuroscience and a member of the McGovern Institute for Brain Research, is the senior author of the paper, which appears today in eLife. Researchers from MIT’s Computer Science and Artificial Intelligence Laboratory and Tufts University were also involved in the study.

Language and cognition

McGovern Investigator Ev Fedorenko in the Martinos Imaging Center at MIT. Photo: Caitlin Cunningham

A major focus of Fedorenko’s research is the relationship between language and other cognitive functions. In particular, she has been studying the question of whether other functions rely on the brain’s language network, which includes Broca’s area and other regions in the left hemisphere of the brain. In previous work, her lab has shown that music and math do not appear to activate this language network.

“Here, we were interested in exploring the relationship between language and computer programming, partially because computer programming is such a new invention that we know that there couldn’t be any hardwired mechanisms that make us good programmers,” Ivanova says.

There are two schools of thought regarding how the brain learns to code, she says. One holds that in order to be good at programming, you must be good at math. The other suggests that because of the parallels between coding and language, language skills might be more relevant. To shed light on this issue, the researchers set out to study whether brain activity patterns while reading computer code would overlap with language-related brain activity.

The two programming languages that the researchers focused on in this study are known for their readability: Python and ScratchJr, a visual programming language designed for children age 5 and older. The subjects in the study were all young adults proficient in the language they were being tested on. While the programmers lay in a functional magnetic resonance imaging (fMRI) scanner, the researchers showed them snippets of code and asked them to predict what action the code would produce.

The researchers saw little to no response to code in the language regions of the brain. Instead, they found that the coding task mainly activated the so-called multiple demand network. This network, whose activity is spread throughout the frontal and parietal lobes of the brain, is typically recruited for tasks that require holding many pieces of information in mind at once, and is responsible for our ability to perform a wide variety of mental tasks.

“It does pretty much anything that’s cognitively challenging, that makes you think hard,” says Ivanova, who was also named one of the McGovern Institute’s rising stars in neuroscience.

Previous studies have shown that math and logic problems seem to rely mainly on the multiple demand regions in the left hemisphere, while tasks that involve spatial navigation activate the right hemisphere more than the left. The MIT team found that reading computer code appears to activate both the left and right sides of the multiple demand network, and ScratchJr activated the right side slightly more than the left. This finding goes against the hypothesis that math and coding rely on the same brain mechanisms.

Effects of experience

The researchers say that while they didn’t identify any regions that appear to be exclusively devoted to programming, such specialized brain activity might develop in people who have much more coding experience.

“It’s possible that if you take people who are professional programmers, who have spent 30 or 40 years coding in a particular language, you may start seeing some specialization, or some crystallization of parts of the multiple demand system,” Fedorenko says. “In people who are familiar with coding and can efficiently do these tasks, but have had relatively limited experience, it just doesn’t seem like you see any specialization yet.”

In a companion paper appearing in the same issue of eLife, a team of researchers from Johns Hopkins University also reported that solving code problems activates the multiple demand network rather than the language regions.

The findings suggest there isn’t a definitive answer to whether coding should be taught as a math-based skill or a language-based skill. In part, that’s because learning to program may draw on both language and multiple demand systems, even if — once learned — programming doesn’t rely on the language regions, the researchers say.

“There have been claims from both camps — it has to be together with math, it has to be together with language,” Ivanova says. “But it looks like computer science educators will have to develop their own approaches for teaching code most effectively.”

The research was funded by the National Science Foundation, the Department of Brain and Cognitive Sciences at MIT, and the McGovern Institute for Brain Research.

Neuroscientists find a way to improve object-recognition models

Computer vision models known as convolutional neural networks can be trained to recognize objects nearly as accurately as humans do. However, these models have one significant flaw: Very small changes to an image, which would be nearly imperceptible to a human viewer, can trick them into making egregious errors such as classifying a cat as a tree.

A team of neuroscientists from MIT, Harvard University, and IBM has developed a way to alleviate this vulnerability by adding to these models a new layer that is designed to mimic the earliest stage of the brain’s visual processing system. In a new study, they showed that this layer greatly improved the models’ robustness against this type of mistake.

A grid showing the visualization of many common image corruption types. First row, original image, followed by the noise corruptions; second row, blur corruptions; third row, weather corruptions; fourth row, digital corruptions.
Credits: Courtesy of the researchers.

“Just by making the models more similar to the brain’s primary visual cortex, in this single stage of processing, we see quite significant improvements in robustness across many different types of perturbations and corruptions,” says Tiago Marques, an MIT postdoc and one of the lead authors of the study.

Convolutional neural networks are often used in artificial intelligence applications such as self-driving cars, automated assembly lines, and medical diagnostics. Harvard graduate student Joel Dapello, who is also a lead author of the study, adds that “implementing our new approach could potentially make these systems less prone to error and more aligned with human vision.”

“Good scientific hypotheses of how the brain’s visual system works should, by definition, match the brain in both its internal neural patterns and its remarkable robustness. This study shows that achieving those scientific gains directly leads to engineering and application gains,” says James DiCarlo, the head of MIT’s Department of Brain and Cognitive Sciences, an investigator in the Center for Brains, Minds, and Machines and the McGovern Institute for Brain Research, and the senior author of the study.

The study, which is being presented at the NeurIPS conference this month, is also co-authored by MIT graduate student Martin Schrimpf, MIT visiting student Franziska Geiger, and MIT-IBM Watson AI Lab Director David Cox.

Mimicking the brain

Recognizing objects is one of the visual system’s primary functions. In just a small fraction of a second, visual information flows through the ventral visual stream to the brain’s inferior temporal cortex, where neurons contain information needed to classify objects. At each stage in the ventral stream, the brain performs different types of processing. The very first stage in the ventral stream, V1, is one of the most well-characterized parts of the brain and contains neurons that respond to simple visual features such as edges.

“It’s thought that V1 detects local edges or contours of objects, and textures, and does some type of segmentation of the images at a very small scale. Then that information is later used to identify the shape and texture of objects downstream,” Marques says. “The visual system is built in this hierarchical way, where in early stages neurons respond to local features such as small, elongated edges.”

For many years, researchers have been trying to build computer models that can identify objects as well as the human visual system. Today’s leading computer vision systems are already loosely guided by our current knowledge of the brain’s visual processing. However, neuroscientists still don’t know enough about how the entire ventral visual stream is connected to build a model that precisely mimics it, so they borrow techniques from the field of machine learning to train convolutional neural networks on a specific set of tasks. Using this process, a model can learn to identify objects after being trained on millions of images.

Many of these convolutional networks perform very well, but in most cases, researchers don’t know exactly how the network is solving the object-recognition task. In 2013, researchers from DiCarlo’s lab showed that some of these neural networks could not only accurately identify objects, but they could also predict how neurons in the primate brain would respond to the same objects much better than existing alternative models. However, these neural networks are still not able to perfectly predict responses along the ventral visual stream, particularly at the earliest stages of object recognition, such as V1.

These models are also vulnerable to so-called “adversarial attacks.” This means that small changes to an image, such as changing the colors of a few pixels, can lead the model to completely confuse an object for something different — a type of mistake that a human viewer would not make.
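To make the idea concrete, here is a minimal sketch of one well-known attack, the fast gradient sign method, applied to a toy linear classifier. The weights, the four-pixel "image," and the step size `eps` are all invented for illustration; the article does not say which attacks the researchers used.

```python
# Toy illustration of an adversarial perturbation (fast gradient sign method).
# The linear "model", its weights, and eps are hypothetical, not from the study.

def score(weights, pixels):
    """Linear classifier: a positive score means 'cat', otherwise 'tree'."""
    return sum(w * p for w, p in zip(weights, pixels))


def sign(x):
    return (x > 0) - (x < 0)


def label(s):
    return "cat" if s > 0 else "tree"


def fgsm(weights, pixels, eps):
    """Nudge every pixel by at most eps in the direction that lowers the score.

    For a linear model, the gradient of the score with respect to the input
    is the weight vector itself, so the attack subtracts eps * sign(weight).
    """
    return [p - eps * sign(w) for w, p in zip(weights, pixels)]


weights = [0.5, -0.25, 0.75, -0.5]  # hypothetical trained weights
image = [0.5, 0.5, 0.5, 0.5]        # hypothetical 4-pixel image
adversarial = fgsm(weights, image, eps=0.25)

original_label = label(score(weights, image))           # "cat"
adversarial_label = label(score(weights, adversarial))  # "tree"
```

With only four pixels, the step has to be fairly large to flip the label. In a real image with many thousands of pixels, far smaller per-pixel changes can add up to the same shift in the score, which is why such perturbations can be nearly invisible.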

A comparison of adversarial images with different perturbation strengths.
Credits: Courtesy of the researchers.

As a first step in their study, the researchers analyzed the performance of 30 of these models and found that models whose internal responses better matched the brain’s V1 responses were also less vulnerable to adversarial attacks. That is, having a more brain-like V1 seemed to make the model more robust. To further test and take advantage of that idea, the researchers decided to create their own model of V1, based on existing neuroscientific models, and place it at the front of convolutional neural networks that had already been developed to perform object recognition.

When the researchers added their V1 layer, which is also implemented as a convolutional neural network, to three of these models, they found that these models became about four times more resistant to making mistakes on images perturbed by adversarial attacks. The models were also less vulnerable to misidentifying objects that were blurred or distorted due to other corruptions.
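As a rough illustration of what a V1-like front end computes, the sketch below builds a Gabor filter, a standard textbook description of an orientation-tuned V1 neuron, and applies it to a small patch containing a vertical edge. This is an assumption made for illustration: the article does not specify which neuroscientific models the researchers' layer is built on, and the filter parameters here are arbitrary.

```python
import math


def gabor_kernel(size, wavelength, theta, sigma, phase=math.pi / 2):
    """Return a size x size Gabor filter tuned to orientation theta (radians).

    theta = 0 makes the filter vary along x, so it responds to vertical edges.
    The default phase gives an odd-symmetric (edge-detecting) filter.
    """
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate coordinates into the filter's preferred orientation.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr ** 2 + yr ** 2) / (2 * sigma ** 2))
            carrier = math.cos(2 * math.pi * xr / wavelength + phase)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel


def response(kernel, patch):
    """Dot product of a filter with an image patch of the same size."""
    return sum(k * p for krow, prow in zip(kernel, patch) for k, p in zip(krow, prow))


# A 7x7 patch that is dark on the left and bright on the right: a vertical edge.
patch = [[1.0 if x > 0 else 0.0 for x in range(-3, 4)] for _ in range(7)]

vertical_tuned = gabor_kernel(7, wavelength=4.0, theta=0.0, sigma=2.0)
horizontal_tuned = gabor_kernel(7, wavelength=4.0, theta=math.pi / 2, sigma=2.0)

r_vertical = abs(response(vertical_tuned, patch))
r_horizontal = abs(response(horizontal_tuned, patch))
```

The filter tuned to vertical edges responds strongly to the patch, while the one tuned to horizontal edges gives a response that is essentially zero. A bank of such filters at many orientations and scales is one plausible way to place fixed, brain-inspired processing in front of a trained network.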

“Adversarial attacks are a big, open problem for the practical deployment of deep neural networks. The fact that adding neuroscience-inspired elements can improve robustness substantially suggests that there is still a lot that AI can learn from neuroscience, and vice versa,” Cox says.

Better defense

Currently, the best defense against adversarial attacks is a computationally expensive process of training models to recognize the altered images. One advantage of the new V1-based model is that it doesn’t require any additional training. It is also better able to handle a wide range of distortions, beyond adversarial attacks.

The researchers are now trying to identify the key features of their V1 model that allow it to do a better job resisting adversarial attacks, which could help them to make future models even more robust. It could also help them learn more about how the human brain is able to recognize objects.

“One big advantage of the model is that we can map components of the model to particular neuronal populations in the brain,” Dapello says. “We can use this as a tool for novel neuroscientific discoveries, and also continue developing this model to improve its performance under this challenging task.”

The research was funded by the PhRMA Foundation Postdoctoral Fellowship in Informatics, the Semiconductor Research Corporation, DARPA, the MIT Shoemaker Fellowship, the U.S. Office of Naval Research, the Simons Foundation, and the MIT-IBM Watson AI Lab.

How humans use objects in novel ways to solve problems

Human beings are naturally creative tool users. When we need to drive in a nail but don’t have a hammer, we easily realize that we can use a heavy, flat object like a rock in its place. When our table is shaky, we quickly find that we can put a stack of paper under the table leg to stabilize it. But while these actions seem so natural to us, they are believed to be a hallmark of great intelligence — only a few other species use objects in novel ways to solve their problems, and none can do so as flexibly as people. What provides us with these powerful capabilities for using objects in this way?

In a new paper published in the Proceedings of the National Academy of Sciences describing work conducted at MIT’s Center for Brains, Minds and Machines, researchers Kelsey Allen, Kevin Smith, and Joshua Tenenbaum study the cognitive components that underlie this sort of improvised tool use. They designed a novel task, the Virtual Tools game, that taps into tool-use abilities: People must select one object from a set of “tools” that they can place in a two-dimensional, computerized scene to accomplish a goal, such as getting a ball into a certain container. Solving the puzzles in this game requires reasoning about a number of physical principles, including launching, blocking, or supporting objects.

The team hypothesized that there are three capabilities that people rely on to solve these puzzles: a prior belief that guides people’s actions toward those that will make a difference in the scene, the ability to imagine the effect of their actions, and a mechanism to quickly update their beliefs about what actions are likely to provide a solution. They built a model that instantiated these principles, called the “Sample, Simulate, Update,” or “SSUP,” model, and had it play the same game as people. They found that SSUP solved each puzzle at similar rates and in similar ways as people did. On the other hand, a popular deep learning model that could play Atari games well but did not have the same object and physical structures was unable to generalize its knowledge to puzzles it was not directly trained on.
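The three ingredients can be sketched as a simple loop. The toy below is not the researchers' model: the one-dimensional "puzzle," the noise level, and the update rule are all hypothetical stand-ins chosen to show how sampling from a prior, imagining outcomes, and updating beliefs fit together.

```python
import random

TARGET = 0.8      # hypothetical goal location, unknown to the agent
TOLERANCE = 0.05  # how close a placement must land to count as a success


def true_outcome(x):
    """The real world: how far a tool placed at x leaves the ball from the goal."""
    return abs(x - TARGET)


def imagined_outcome(x, rng, noise=0.05):
    """A noisy mental simulation of the same physics."""
    return true_outcome(x) + rng.gauss(0.0, noise)


def ssup_sketch(max_attempts=20, seed=0):
    rng = random.Random(seed)
    mean, spread = 0.5, 0.3  # prior belief about where useful placements lie
    attempts = []
    for _ in range(max_attempts):
        # Sample: draw candidate placements from the current belief.
        candidates = [min(1.0, max(0.0, rng.gauss(mean, spread))) for _ in range(10)]
        # Simulate: imagine each candidate and keep the most promising one.
        best = min(candidates, key=lambda x: imagined_outcome(x, rng))
        # Act in the real world and record what happened.
        error = true_outcome(best)
        attempts.append((best, error))
        if error < TOLERANCE:
            break
        # Update: shift the belief toward the tried action and narrow the search.
        mean += 0.5 * (best - mean)
        spread = max(0.05, 0.7 * spread)
    return attempts


attempts = ssup_sketch()
```

Each entry in `attempts` is a real action and its error, so the length of the list is the number of tries the agent needed. The point of the design is that most of the search happens in imagination, which keeps the number of real attempts small.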

This research provides a new framework for studying and formalizing the cognition that supports human tool use. The team hopes to extend this framework to not just study tool use, but also how people can create innovative new tools for new problems, and how humans transmit this information to build from simple physical tools to complex objects like computers or airplanes that are now part of our daily lives.

Kelsey Allen, a PhD student in the Computational Cognitive Science Lab at MIT, is excited about how the Virtual Tools game might support other cognitive scientists interested in tool use: “There is just so much more to explore in this domain. We have already started collaborating with researchers across multiple different institutions on projects ranging from studying what it means for games to be fun, to studying how embodiment affects disembodied physical reasoning. I hope that others in the cognitive science community will use the game as a tool to better understand how physical models interact with decision-making and planning.”

Joshua Tenenbaum, professor of computational cognitive science at MIT, sees this work as a step toward understanding not only an important aspect of human cognition and culture, but also how to build more human-like forms of intelligence in machines. “Artificial Intelligence researchers have been very excited about the potential for reinforcement learning (RL) algorithms to learn from trial-and-error experience, as humans do, but the real trial-and-error learning that humans benefit from unfolds over just a handful of trials — not millions or billions of experiences, as in today’s RL systems,” Tenenbaum says. “The Virtual Tools game allows us to study this very rapid and much more natural form of trial-and-error learning in humans, and the fact that the SSUP model is able to capture the fast learning dynamics we see in humans suggests it may also point the way towards new AI approaches to RL that can learn from their successes, their failures, and their near misses as quickly and as flexibly as people do.”

Using machine learning to track the pandemic’s impact on mental health

Dealing with a global pandemic has taken a toll on the mental health of millions of people. A team of MIT and Harvard University researchers has shown that they can measure those effects by analyzing the language that people use to express their anxiety online.

Using machine learning to analyze the text of more than 800,000 Reddit posts, the researchers were able to identify changes in the tone and content of language that people used as the first wave of the Covid-19 pandemic progressed, from January to April of 2020. Their analysis revealed several key changes in conversations about mental health, including an overall increase in discussion about anxiety and suicide.

“We found that there were these natural clusters that emerged related to suicidality and loneliness, and the amount of posts in these clusters more than doubled during the pandemic as compared to the same months of the preceding year, which is a grave concern,” says Daniel Low, a graduate student in the Program in Speech and Hearing Bioscience and Technology at Harvard and MIT and the lead author of the study.

The analysis also revealed varying impacts on people who already suffer from different types of mental illness. The findings could help psychiatrists, or potentially moderators of the Reddit forums that were studied, to better identify and help people whose mental health is suffering, the researchers say.

“When the mental health needs of so many in our society are inadequately met, even at baseline, we wanted to bring attention to the ways that many people are suffering during this time, in order to amplify and inform the allocation of resources to support them,” says Laurie Rumker, a graduate student in the Bioinformatics and Integrative Genomics PhD Program at Harvard and one of the authors of the study.

Satrajit Ghosh, a principal research scientist at MIT’s McGovern Institute for Brain Research, is the senior author of the study, which appears in the Journal of Medical Internet Research. Other authors of the paper include Tanya Talkar, a graduate student in the Program in Speech and Hearing Bioscience and Technology at Harvard and MIT; John Torous, director of the digital psychiatry division at Beth Israel Deaconess Medical Center; and Guillermo Cecchi, a principal research staff member at the IBM Thomas J. Watson Research Center.

A wave of anxiety

The new study grew out of the MIT class 6.897/HST.956 (Machine Learning for Healthcare), in MIT’s Department of Electrical Engineering and Computer Science. Low, Rumker, and Talkar, who were all taking the course last spring, had done some previous research on using machine learning to detect mental health disorders based on how people speak and what they say. After the Covid-19 pandemic began, they decided to focus their class project on analyzing Reddit forums devoted to different types of mental illness.

“When Covid hit, we were all curious whether it was affecting certain communities more than others,” Low says. “Reddit gives us the opportunity to look at all these subreddits that are specialized support groups. It’s a really unique opportunity to see how these different communities were affected differently as the wave was happening, in real-time.”

The researchers analyzed posts from 15 subreddit groups devoted to a variety of mental illnesses, including schizophrenia, depression, and bipolar disorder. They also included a handful of groups devoted to topics not specifically related to mental health, such as personal finance, fitness, and parenting.

Using several types of natural language processing algorithms, the researchers measured the frequency of words associated with topics such as anxiety, death, isolation, and substance abuse, and grouped posts together based on similarities in the language used. These approaches allowed the researchers to identify similarities between each group’s posts after the onset of the pandemic, as well as distinctive differences between groups.
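A minimal sketch of this kind of analysis might look like the following. The word lists and example posts are invented for illustration, and cosine similarity stands in for whatever grouping methods the researchers used, which the article does not name.

```python
import math
import re
from collections import Counter

# Hypothetical mini-lexicons; the study's actual word lists are not given here.
LEXICONS = {
    "anxiety": {"anxious", "worried", "panic"},
    "isolation": {"alone", "lonely", "isolated"},
}


def topic_rates(post):
    """Fraction of a post's words that fall in each topic lexicon."""
    words = re.findall(r"[a-z']+", post.lower())
    counts = Counter(words)
    total = len(words) or 1  # avoid dividing by zero on empty posts
    return [sum(counts[w] for w in vocab) / total for vocab in LEXICONS.values()]


def cosine(a, b):
    """Cosine similarity between two topic-rate vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


post_a = topic_rates("I feel so alone and lonely")
post_b = topic_rates("Isolated at home, alone again")
post_c = topic_rates("Worried and anxious, close to panic")

similar = cosine(post_a, post_b)     # both posts are about isolation
dissimilar = cosine(post_a, post_c)  # isolation versus anxiety
```

Posts that lean on the same vocabulary end up with a similarity near 1, while posts about different topics score near 0. Averaging such vectors over all posts in a forum for each month would give a simple way to track how a community's language shifts over time.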

The researchers found that while people in most of the support groups began posting about Covid-19 in March, the group devoted to health anxiety started much earlier, in January. However, as the pandemic progressed, the other mental health groups began to closely resemble the health anxiety group, in terms of the language that was most often used. At the same time, the group devoted to personal finance showed the most negative semantic change from January to April 2020, and significantly increased the use of words related to economic stress and negative sentiment.

They also discovered that the mental health groups affected the most negatively early in the pandemic were those related to ADHD and eating disorders. The researchers hypothesize that without their usual social support systems in place, due to lockdowns, people suffering from those disorders found it much more difficult to manage their conditions. In those groups, the researchers found posts about hyperfocusing on the news and relapsing back into anorexia-type behaviors since meals were not being monitored by others due to quarantine.

Using another algorithm, the researchers grouped posts into clusters such as loneliness or substance use, and then tracked how those groups changed as the pandemic progressed. Posts related to suicide more than doubled from pre-pandemic levels, and the groups that became significantly associated with the suicidality cluster during the pandemic were the support groups for borderline personality disorder and post-traumatic stress disorder.

The researchers also found the introduction of new topics specifically seeking mental health help or social interaction. “The topics within these subreddit support groups were shifting a bit, as people were trying to adapt to a new life and focus on how they can go about getting more help if needed,” Talkar says.

While the authors emphasize that they cannot implicate the pandemic as the sole cause of the observed linguistic changes, they note that there was much more significant change during the period from January to April in 2020 than in the same months in 2019 and 2018, indicating the changes cannot be explained by normal annual trends.

Mental health resources

This type of analysis could help mental health care providers identify segments of the population that are most vulnerable to declines in mental health caused by not only the Covid-19 pandemic but other mental health stressors such as controversial elections or natural disasters, the researchers say.

Additionally, if applied to Reddit or other social media posts in real-time, this analysis could be used to offer users additional resources, such as guidance to a different support group, information on how to find mental health treatment, or the number for a suicide hotline.

“Reddit is a very valuable source of support for a lot of people who are suffering from mental health challenges, many of whom may not have formal access to other kinds of mental health support, so there are implications of this work for ways that support within Reddit could be provided,” Rumker says.

The researchers now plan to apply this approach to study whether posts on Reddit and other social media sites can be used to detect mental health disorders. One current project involves screening posts in a social media site for veterans for suicide risk and post-traumatic stress disorder.

The research was funded by the National Institutes of Health and the McGovern Institute.

Researchers ID crucial brain pathway involved in object recognition

MIT researchers have identified a brain pathway critical in enabling primates to effortlessly identify objects in their field of vision. The findings enrich existing models of the neural circuitry involved in visual perception and help to further unravel the computational code for solving object recognition in the primate brain.

Led by Kohitij Kar, a postdoctoral associate at the McGovern Institute for Brain Research and Department of Brain and Cognitive Sciences, the study looked at an area called the ventrolateral prefrontal cortex (vlPFC), which sends feedback signals to the inferior temporal (IT) cortex via a network of neurons. The main goal of this study was to test whether the back-and-forth information processing of this circuitry (that is, of this recurrent neural network) is essential to rapid object identification in primates.

The current study, published in Neuron and available today via open access, is a follow-up to prior work published by Kar and James DiCarlo, Peter de Florez Professor of Neuroscience, the head of MIT’s Department of Brain and Cognitive Sciences, and an investigator in the McGovern Institute for Brain Research and the Center for Brains, Minds, and Machines.

Monkey versus machine

In 2019, Kar, DiCarlo, and colleagues identified that primates must use some recurrent circuits during rapid object recognition. Monkey subjects in that study were able to identify objects more accurately than engineered “feedforward” computational models, called deep convolutional neural networks, that lacked recurrent circuitry.

Interestingly, the specific images on which models performed poorly compared to monkeys also took longer to be solved in the monkeys’ brains, suggesting that the additional time might be due to recurrent processing in the brain. The 2019 study left unclear, though, exactly which recurrent circuits were responsible for the delayed information boost in the IT cortex. That’s where the current study picks up.

“In this new study, we wanted to find out: Where are these recurrent signals in IT coming from?” Kar said. “Which areas reciprocally connected to IT, are functionally the most critical part of this recurrent circuit?”

To determine this, researchers used a pharmacological agent to temporarily block the activity in parts of the vlPFC in macaques while they engaged in an object discrimination task. During these tasks, monkeys viewed images that contained an object, such as an apple, a car, or a dog; then, researchers used eye tracking to determine if the monkeys could correctly indicate what object they had previously viewed when given two object choices.

“We observed that if you use pharmacological agents to partially inactivate the vlPFC, then both the monkeys’ behavior and IT cortex activity deteriorates but more so for certain specific images. These images were the same ones we identified in the previous study — ones that were poorly solved by ‘feedforward’ models and took longer to be solved in the monkey’s IT cortex,” said Kar.

MIT researchers used an object recognition task (e.g., recognizing that there is a “bird” and not an “elephant” in the shown image) in studying the role of feedback from primate ventrolateral prefrontal cortex (vlPFC) to the inferior temporal (IT) cortex via a network of neurons. In primate brains, temporarily blocking the vlPFC (green shaded area) disrupts the recurrent neural network comprising vlPFC and IT, inducing specific deficits and implicating its role in rapid object identification. Image: Kohitij Kar, brain image adapted from SciDraw

“These results provide evidence that this recurrently connected network is critical for rapid object recognition, the behavior we’re studying. Now, we have a better understanding of how the full circuit is laid out, and what are the key underlying neural components of this behavior.”

The full study, entitled “Fast recurrent processing via ventrolateral prefrontal cortex is needed by the primate ventral stream for robust core visual object recognition,” will run in print January 6, 2021.

“This study demonstrates the importance of pre-frontal cortical circuits in automatically boosting object recognition performance in a very particular way,” DiCarlo said. “These results were obtained in nonhuman primates and thus are highly likely to also be relevant to human vision.”

The present study makes clear the integral role of the recurrent connections between the vlPFC and the primate ventral visual cortex during rapid object recognition. The results will be helpful to researchers designing future studies that aim to develop accurate models of the brain, and to researchers who seek to develop more human-like artificial intelligence.

National Science Foundation announces MIT-led Institute for Artificial Intelligence and Fundamental Interactions

The U.S. National Science Foundation (NSF) announced today an investment of more than $100 million to establish five artificial intelligence (AI) institutes, each receiving roughly $20 million over five years. One of these, the NSF AI Institute for Artificial Intelligence and Fundamental Interactions (IAIFI), will be led by MIT’s Laboratory for Nuclear Science (LNS) and become the intellectual home of more than 25 physics and AI senior researchers at MIT and Harvard, Northeastern, and Tufts universities.

By merging research in physics and AI, the IAIFI seeks to tackle some of the most challenging problems in physics, including precision calculations of the structure of matter, gravitational-wave detection of merging black holes, and the extraction of new physical laws from noisy data.

“The goal of the IAIFI is to develop the next generation of AI technologies, based on the transformative idea that artificial intelligence can directly incorporate physics intelligence,” says Jesse Thaler, an associate professor of physics at MIT, LNS researcher, and IAIFI director. “By fusing the ‘deep learning’ revolution with the time-tested strategies of ‘deep thinking’ in physics, we aim to gain a deeper understanding of our universe and of the principles underlying intelligence.”

IAIFI researchers say their approach will enable making groundbreaking physics discoveries, and advance AI more generally, through the development of novel AI approaches that incorporate first principles from fundamental physics.

“Invoking the simple principle of translational symmetry — which in nature gives rise to conservation of momentum — led to dramatic improvements in image recognition,” says Mike Williams, an associate professor of physics at MIT, LNS researcher, and IAIFI deputy director. “We believe incorporating more complex physics principles will revolutionize how AI is used to study fundamental interactions, while simultaneously advancing the foundations of AI.”
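The image-recognition example Williams alludes to is presumably the convolutional layer, which builds translational symmetry directly into a network. The sketch below checks the key property on a one-dimensional signal: shifting the input and then filtering gives the same result as filtering and then shifting. The signal and kernel values are arbitrary, and circular (wrap-around) convolution is assumed so that the property holds exactly at the edges.

```python
def circular_conv(signal, kernel):
    """Slide a kernel over a signal, wrapping around at the ends."""
    n = len(signal)
    return [
        sum(kernel[j] * signal[(i + j) % n] for j in range(len(kernel)))
        for i in range(n)
    ]


def shift(signal, k):
    """Cyclically shift a signal k positions to the right."""
    k %= len(signal)
    return signal[-k:] + signal[:-k]


signal = [0, 0, 1, 3, 1, 0, 0, 0]  # arbitrary example input
kernel = [1, -1]                   # a simple difference (edge) filter

shift_then_filter = circular_conv(shift(signal, 3), kernel)
filter_then_shift = shift(circular_conv(signal, kernel), 3)
```

The two results are identical, which is why a convolutional network does not need to relearn a feature separately for every position in an image.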

In addition, a core element of the IAIFI mission is to transfer their technologies to the broader AI community.

“Recognizing the critical role of AI, NSF is investing in collaborative research and education hubs, such as the NSF IAIFI anchored at MIT, which will bring together academia, industry, and government to unearth profound discoveries and develop new capabilities,” says NSF Director Sethuraman Panchanathan. “Just as prior NSF investments enabled the breakthroughs that have given rise to today’s AI revolution, the awards being announced today will drive discovery and innovation that will sustain American leadership and competitiveness in AI for decades to come.”

Research in AI and fundamental interactions

Fundamental interactions are described by two pillars of modern physics: at short distances by the Standard Model of particle physics, and at long distances by the Lambda Cold Dark Matter model of Big Bang cosmology. Both models are based on physical first principles such as causality and space-time symmetries. An abundance of experimental evidence supports these theories, but also exposes where they are incomplete, most pressingly that the Standard Model does not explain the nature of dark matter, which plays an essential role in cosmology.

AI has the potential to help answer these questions and others in physics.

For many physics problems, the governing equations that encode the fundamental physical laws are known. However, undertaking key calculations within these frameworks, as is essential to test our understanding of the universe and guide physics discovery, can be computationally demanding or even intractable. IAIFI researchers are developing AI for such first-principles theory studies, which naturally require AI approaches that rigorously encode physics knowledge.

“My group is developing new provably exact algorithms for theoretical nuclear physics,” says Phiala Shanahan, an assistant professor of physics and LNS researcher at MIT. “Our first-principles approach turns out to have applications in other areas of science and even in robotics, leading to exciting collaborations with industry partners.”

Incorporating physics principles into AI could also have a major impact on many experimental applications, such as designing AI methods that are more easily verifiable. IAIFI researchers are working to enhance the scientific potential of various facilities, including the Large Hadron Collider (LHC) and the Laser Interferometer Gravitational-Wave Observatory (LIGO).

“Gravitational-wave detectors are among the most sensitive instruments on Earth, but the computational systems used to operate them are mostly based on technology from the previous century,” says Principal Research Scientist Lisa Barsotti of the MIT Kavli Institute for Astrophysics and Space Research. “We have only begun to scratch the surface of what can be done with AI; just enough to see that the IAIFI will be a game-changer.”

The unique features of these physics applications also offer compelling research opportunities in AI more broadly. For example, physics-informed architectures and hardware development could lead to advances in the speed of AI algorithms, and work in statistical physics is providing a theoretical foundation for understanding AI dynamics.

“Physics has inspired many time-tested ideas in machine learning: maximizing entropy, Boltzmann machines, and variational inference, to name a few,” says Pulkit Agrawal, an assistant professor of electrical engineering and computer science at MIT, and researcher in the Computer Science and Artificial Intelligence Laboratory (CSAIL). “We believe that close interaction between physics and AI researchers will be the catalyst that leads to the next generation of machine learning algorithms.”

Cultivating early-career talent

AI technologies are advancing rapidly, making it both important and challenging to train junior researchers at the intersection of physics and AI. The IAIFI aims to recruit and train a talented and diverse group of early-career researchers, including at the postdoc level through its IAIFI Fellows Program.

“By offering our fellows their choice of research problems, and the chance to focus on cutting-edge challenges in physics and AI, we will prepare many talented young scientists to become future leaders in both academia and industry,” says MIT professor of physics Marin Soljacic of the Research Laboratory of Electronics (RLE).

IAIFI researchers hope these fellows will spark interdisciplinary and multi-investigator collaborations, generate new ideas and approaches, translate physics challenges beyond their native domains, and help develop a common language across disciplines. Applications for the inaugural IAIFI fellows are due in mid-October.

Another related effort spearheaded by Thaler, Williams, and Alexander Rakhlin, an associate professor of brain and cognitive sciences at MIT and researcher in the Institute for Data, Systems, and Society (IDSS), is the development of a new interdisciplinary PhD program in physics, statistics, and data science, a collaborative effort between the Department of Physics and the Statistics and Data Science Center.

“Statistics and data science are among the foundational pillars of AI. Physics joining the interdisciplinary doctoral program will bring forth new ideas and areas of exploration, while fostering a new generation of leaders at the intersection of physics, statistics, and AI,” says Rakhlin.

Education, outreach, and partnerships

The IAIFI aims to cultivate “human intelligence” by promoting education and outreach. For example, IAIFI members will contribute to establishing a MicroMasters degree program at MIT for students from non-traditional backgrounds.

“We will increase the number of students in both physics and AI from underrepresented groups by providing fellowships for the MicroMasters program,” says Isaac Chuang, professor of physics and electrical engineering, senior associate dean for digital learning, and RLE researcher at MIT. “We also plan on working with undergraduate MIT Summer Research Program students, to introduce them to the tools of physics and AI research that they might not have access to at their home institutions.”

The IAIFI plans to expand its impact via numerous outreach efforts, including a K-12 program in which students are given data from the LHC and LIGO and tasked with rediscovering the Higgs boson and gravitational waves.

“After confirming these recent Nobel Prizes, we can ask the students to find tiny artificial signals embedded in the data using AI and fundamental physics principles,” says assistant professor of physics Phil Harris, an LNS researcher at MIT. “With projects like this, we hope to disseminate knowledge about — and enthusiasm for — physics, AI, and their intersection.”

In addition, the IAIFI will collaborate with industry and government to advance the frontiers of both AI and physics, as well as societal sectors that stand to benefit from AI innovation. IAIFI members already have many active collaborations with industry partners, including DeepMind, Microsoft Research, and Amazon.

“We will tackle two of the greatest mysteries of science: how our universe works and how intelligence works,” says MIT professor of physics Max Tegmark, an MIT Kavli Institute researcher. “Our key strategy is to link them, using physics to improve AI and AI to improve physics. We’re delighted that the NSF is investing the vital seed funding needed to launch this exciting effort.”

Building new connections at MIT and beyond

Leveraging MIT’s culture of collaboration, the IAIFI aims to generate new connections and to strengthen existing ones across MIT and beyond.

Of the 27 current IAIFI senior investigators, 16 are at MIT and members of the LNS, RLE, MIT Kavli Institute, CSAIL, and IDSS. In addition, IAIFI investigators are members of related NSF-supported efforts at MIT, such as the Center for Brains, Minds, and Machines within the McGovern Institute for Brain Research and the MIT-Harvard Center for Ultracold Atoms.

“We expect a lot of creative synergies as we bring physics and computer science together to study AI,” says Bill Freeman, the Thomas and Gerd Perkins Professor of Electrical Engineering and Computer Science and researcher in CSAIL. “I’m excited to work with my physics colleagues on topics that bridge these fields.”

More broadly, the IAIFI aims to make Cambridge, Massachusetts, and the surrounding Boston area a hub for collaborative efforts to advance both physics and AI.

“As we teach in 8.01 and 8.02, part of what makes physics so powerful is that it provides a universal language that can be applied to a wide range of scientific problems,” says Thaler. “Through the IAIFI, we will create a common language that transcends the intellectual borders between physics and AI to facilitate groundbreaking discoveries.”

Key brain region was “recycled” as humans developed the ability to read

Humans began to develop systems of reading and writing only within the past few thousand years. Our reading abilities set us apart from other animal species, but a few thousand years is much too short a timeframe for our brains to have evolved new areas specifically devoted to reading.

To account for the development of this skill, some scientists have hypothesized that parts of the brain that originally evolved for other purposes have been “recycled” for reading. As one example, they suggest that a part of the visual system that is specialized to perform object recognition has been repurposed for a key component of reading called orthographic processing — the ability to recognize written letters and words.

A new study from MIT neuroscientists offers evidence for this hypothesis. The findings suggest that even in nonhuman primates, who do not know how to read, a part of the brain called the inferotemporal (IT) cortex is capable of performing tasks such as distinguishing words from nonsense words, or picking out specific letters from a word.

“This work has opened up a potential linkage between our rapidly developing understanding of the neural mechanisms of visual processing and an important primate behavior — human reading,” says James DiCarlo, the head of MIT’s Department of Brain and Cognitive Sciences, an investigator in the McGovern Institute for Brain Research and the Center for Brains, Minds, and Machines, and the senior author of the study.

Rishi Rajalingham, an MIT postdoc, is the lead author of the study, which appears in Nature Communications. Other MIT authors are postdoc Kohitij Kar and technical associate Sachi Sanghavi. The research team also includes Stanislas Dehaene, a professor of experimental cognitive psychology at the Collège de France.

Word recognition

Reading is a complex process that requires recognizing words, assigning meaning to those words, and associating words with their corresponding sound. These functions are believed to be spread out over different parts of the human brain.

Functional magnetic resonance imaging (fMRI) studies have identified a region called the visual word form area (VWFA) that lights up when the brain processes a written word. This region is involved in the orthographic stage: It discriminates words from jumbled strings of letters or words from unknown alphabets. The VWFA is located in the IT cortex, a part of the visual cortex that is also responsible for identifying objects.

DiCarlo and Dehaene became interested in studying the neural mechanisms behind word recognition after cognitive psychologists in France reported that baboons could learn to discriminate words from nonwords, in a study that appeared in Science in 2012.

Using fMRI, Dehaene’s lab has previously found that parts of the IT cortex that respond to objects and faces become highly specialized for recognizing written words once people learn to read.

“However, given the limitations of human imaging methods, it has been challenging to characterize these representations at the resolution of individual neurons, and to quantitatively test if and how these representations might be reused to support orthographic processing,” Dehaene says. “These findings inspired us to ask if nonhuman primates could provide a unique opportunity to investigate the neuronal mechanisms underlying orthographic processing.”

The researchers hypothesized that if parts of the primate brain are predisposed to process text, they might be able to find patterns reflecting that in the neural activity of nonhuman primates as they simply look at words.

To test that idea, the researchers recorded neural activity from about 500 neural sites across the IT cortex of macaques as they looked at about 2,000 strings of letters, some of which were English words and some of which were nonsensical strings of letters.

“The efficiency of this methodology is that you don’t need to train animals to do anything,” Rajalingham says. “What you do is just record these patterns of neural activity as you flash an image in front of the animal.”

The researchers then fed that neural data into a simple computer model called a linear classifier. This model learns to combine the inputs from each of the 500 neural sites to predict whether the string of letters that provoked that activity pattern was a word or not. While the animal itself is not performing this task, the model acts as a “stand-in” that uses the neural data to generate a behavior, Rajalingham says.
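As a rough illustration of what such a linear classifier does, the following Python sketch trains one on synthetic data. It is not the study's code or data: the number of simulated sites, the "tuning" values, the noise level, and the logistic-regression training rule are all assumptions chosen to keep the example small and self-contained.

```python
import math
import random


def make_synthetic_data(n_stimuli=400, n_sites=30, seed=0):
    """Simulate responses of neural sites to letter strings (hypothetical data)."""
    rng = random.Random(seed)
    # Assumed per-site tuning: how far a site's mean response shifts for words.
    tuning = [rng.gauss(0.0, 0.5) for _ in range(n_sites)]
    X, y = [], []
    for _ in range(n_stimuli):
        label = rng.randint(0, 1)  # 1 = word, 0 = nonword
        X.append([label * t + rng.gauss(0.0, 1.0) for t in tuning])
        y.append(label)
    return X, y


def train_linear_classifier(X, y, epochs=50, lr=0.05):
    """Learn one weight per site plus a bias, using logistic-regression SGD."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            z = max(-30.0, min(30.0, z))  # clamp to keep exp() well behaved
            p = 1.0 / (1.0 + math.exp(-z))  # predicted probability of "word"
            err = p - yi
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b


def predict(w, b, x):
    """Weighted sum of site responses, thresholded at zero."""
    return 1 if b + sum(wj * xj for wj, xj in zip(w, x)) > 0 else 0


def accuracy(w, b, X, y):
    correct = sum(predict(w, b, xi) == yi for xi, yi in zip(X, y))
    return correct / len(y)


X, y = make_synthetic_data()
split = 300  # train on the first 300 stimuli, test on the remaining 100
w, b = train_linear_classifier(X[:split], y[:split])
held_out_accuracy = accuracy(w, b, X[split:], y[split:])
print(f"Held-out word/nonword accuracy: {held_out_accuracy:.2f}")
```

The key point is that the "behavior" comes from a weighted sum of the recorded responses: the classifier adds no processing of its own beyond choosing the weights, so its accuracy reflects how much word-versus-nonword information is already present in the activity pattern.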

Using that neural data, the model was able to generate accurate predictions for many orthographic tasks, including distinguishing words from nonwords and determining if a particular letter is present in a string of words. The model was about 70 percent accurate at distinguishing words from nonwords, which is very similar to the rate reported in the 2012 Science study with baboons. Furthermore, the patterns of errors made by the model were similar to those made by the animals.
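One simple way to quantify whether two sets of error patterns resemble each other is to correlate per-stimulus accuracies. The Python sketch below does this with a Pearson correlation. The article does not say which measure the researchers used, so the metric is an assumption, and the accuracy values are invented purely for illustration.

```python
import math


def pearson(a, b):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Hypothetical fraction-correct values for six letter strings (made-up numbers).
model_accuracy_per_item = [0.90, 0.60, 0.75, 0.40, 0.85, 0.55]
animal_accuracy_per_item = [0.85, 0.65, 0.70, 0.50, 0.90, 0.50]

similarity = pearson(model_accuracy_per_item, animal_accuracy_per_item)
print(f"Error-pattern similarity (Pearson r): {similarity:.2f}")
```

A value near 1 would mean the model and the animals tend to struggle with the same strings, which is a stronger match than simply reaching a similar overall accuracy.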

Neuronal recycling

The researchers also recorded neural activity from V4, a part of the visual cortex that feeds into the IT cortex. When they fed V4 activity patterns into the linear classifier model, its predictions of human or baboon performance on the orthographic processing tasks were poor compared with those based on IT activity.

The findings suggest that the IT cortex is particularly well-suited to be repurposed for skills that are needed for reading, and they support the hypothesis that some of the mechanisms of reading are built upon highly evolved mechanisms for object recognition, the researchers say.

The researchers now plan to train animals to perform orthographic tasks and measure how their neural activity changes as they learn the tasks.

The research was funded by the Simons Foundation and the U.S. Office of Naval Research.
