What words can convey

From search engines to voice assistants, computers are getting better at understanding what we mean. That’s thanks to language processing programs that make sense of a staggering number of words, without ever being told explicitly what those words mean. Such programs infer meaning instead through statistics—and a new study reveals that this computational approach can assign many kinds of information to a single word, just like the human brain.

The study, published April 14, 2022, in the journal Nature Human Behaviour, was co-led by Gabriel Grand, a graduate student at MIT’s Computer Science and Artificial Intelligence Laboratory, and Idan Blank, an assistant professor at the University of California, Los Angeles, and supervised by McGovern Investigator Ev Fedorenko, a cognitive neuroscientist who studies how the human brain uses and understands language, and Francisco Pereira at the National Institute of Mental Health. Fedorenko says the rich knowledge her team was able to find within computational language models demonstrates just how much can be learned about the world through language alone.

Early language models

The research team began its analysis of statistics-based language processing models in 2015, when the approach was new. Such models derive meaning by analyzing how often pairs of words co-occur in texts and using those relationships to assess the similarities of words’ meanings. For example, such a program might conclude that “bread” and “apple” are more similar to one another than they are to “notebook,” because “bread” and “apple” are often found in proximity to words like “eat” or “snack,” whereas “notebook” is not.
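
As a toy illustration of the idea, the snippet below counts which words appear near which others in a few sentences and then compares words by the similarity of their co-occurrence profiles (the study itself used the pre-trained GloVe embedding, learned from far larger text corpora):

```python
# Toy illustration of the co-occurrence idea (not the study's actual model):
# count how often words appear near each other, then compare words by the
# similarity of their co-occurrence profiles.
import numpy as np

corpus = [
    "i eat bread for a snack",
    "i eat an apple for a snack",
    "i write notes in my notebook",
]
window = 2
vocab = sorted({w for line in corpus for w in line.split()})
index = {w: i for i, w in enumerate(vocab)}
counts = np.zeros((len(vocab), len(vocab)))

for line in corpus:
    words = line.split()
    for i, w in enumerate(words):
        for j in range(max(0, i - window), min(len(words), i + window + 1)):
            if i != j:
                counts[index[w], index[words[j]]] += 1

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

print(cosine(counts[index["bread"]], counts[index["apple"]]))     # relatively high
print(cosine(counts[index["bread"]], counts[index["notebook"]]))  # relatively low
```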

The models were clearly good at measuring words’ overall similarity to one another. But most words carry many kinds of information, and their similarities depend on which qualities are being evaluated. “Humans can come up with all these different mental scales to help organize their understanding of words,” explains Grand, a former undergraduate researcher in the Fedorenko lab. For example, he says, “dolphins and alligators might be similar in size, but one is much more dangerous than the other.”

Grand and Idan Blank, who was then a graduate student at the McGovern Institute, wanted to know whether the models captured that same nuance. And if they did, how was the information organized?

To learn how the information in such a model stacked up to humans’ understanding of words, the team first asked human volunteers to score words along many different scales: Were the concepts those words conveyed big or small, safe or dangerous, wet or dry? Then, having mapped where people position different words along these scales, they looked to see whether language processing models did the same.

Grand explains that distributional semantic models use co-occurrence statistics to organize words into a huge, multidimensional matrix. The more similar words are to one another, the closer they are within that space. The dimensions of the space are vast, and there is no inherent meaning built into its structure. “In these word embeddings, there are hundreds of dimensions, and we have no idea what any dimension means,” he says. “We’re really trying to peer into this black box and say, ‘is there structure in here?’”

Word vectors in the category ‘animals’ (blue circles) are orthogonally projected (light-blue lines) onto the feature subspace for ‘size’ (red line), defined as the vector difference between the word vectors for ‘large’ and ‘small’ (red circles). The three dimensions in this figure are arbitrary and were chosen via principal component analysis to enhance visualization (the original GloVe word embedding has 300 dimensions, and projection happens in that space). Image: Fedorenko lab

Specifically, they asked whether the semantic scales they had asked their volunteers to use were represented in the model. So they looked to see where words in the space lined up along vectors defined by the extremes of those scales. Where did dolphins and tigers fall on a line from “big” to “small,” for example? And were they closer together along that line than they were on a line representing danger (“safe” to “dangerous”)?
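
In code, that projection looks roughly like the sketch below, assuming pre-trained GloVe vectors loaded through gensim; the study averaged several anchor words per pole and used its own category and scale lists.

```python
# Minimal sketch of the projection described above: score words along a
# semantic axis defined by two anchor words, using pre-trained GloVe vectors
# loaded through gensim. Illustrative only.
import numpy as np
import gensim.downloader as api

glove = api.load("glove-wiki-gigaword-300")  # 300-dimensional GloVe vectors

def scale_score(word, small_anchor="small", large_anchor="big"):
    """Project a word vector onto the axis running from small_anchor to large_anchor."""
    axis = glove[large_anchor] - glove[small_anchor]
    axis = axis / np.linalg.norm(axis)
    return float(np.dot(glove[word], axis))

for w in ["dolphin", "tiger", "mouse"]:
    print(w, round(scale_score(w), 3))  # larger values fall toward the "big" end
```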

Across more than 50 sets of word categories and semantic scales, they found that the model had organized words very much like the human volunteers. Dolphins and tigers were judged to be similar in terms of size, but far apart on scales measuring danger or wetness. The model had organized the words in a way that represented many kinds of meaning—and it had done so based entirely on the words’ co-occurrences.

That, Fedorenko says, tells us something about the power of language. “The fact that we can recover so much of this rich semantic information from just these simple word co-occurrence statistics suggests that this is one very powerful source of learning about things that you may not even have direct perceptual experience with.”

Three from MIT awarded 2022 Paul and Daisy Soros Fellowships for New Americans

MIT graduate student Fernanda De La Torre, alumna Trang Luu ’18, SM ’20, and senior Syamantak Payra are recipients of the 2022 Paul and Daisy Soros Fellowships for New Americans.

De La Torre, Luu, and Payra are among 30 New Americans selected from a pool of over 1,800 applicants. The fellowship honors the contributions of immigrants and children of immigrants by providing $90,000 in funding for graduate school.

Students interested in applying to the P.D. Soros Fellowship for future years may contact Kim Benard, associate dean of distinguished fellowships in Career Advising and Professional Development.

Fernanda De La Torre

Fernanda De La Torre is a PhD student in the Department of Brain and Cognitive Sciences. With Professor Josh McDermott, she studies how we integrate vision and sound, and with Professor Robert Yang, she develops computational models of imagination.

De La Torre spent her early childhood with her younger sister and grandmother in Guadalajara, Mexico. At age 12, she crossed the Mexican border to reunite with her mother in Kansas City, Missouri. Shortly after, an abusive home environment forced De La Torre to leave her family and support herself throughout her early teens.

Despite her difficult circumstances, De La Torre excelled academically in high school. By winning various scholarships that would discreetly take applications from undocumented students, she was able to continue her studies in computer science and mathematics at Kansas State University. There, she became intrigued by the mysteries of the human mind. During college, De La Torre received invaluable mentorship from her former high school principal, Thomas Herrera, who helped her become documented through the Violence Against Women Act. Her college professor, William Hsu, supported her interests in artificial intelligence and encouraged her to pursue a scientific career.

After her undergraduate studies, De La Torre won a post-baccalaureate fellowship from the Department of Brain and Cognitive Sciences at MIT, where she worked with Professor Tomaso Poggio on the theory of deep learning. She then transitioned into the department’s PhD program. Beyond contributing to scientific knowledge, De La Torre plans to use science to create spaces where all people, including those from backgrounds like her own, can innovate and thrive.

She says: “Immigrants face many obstacles, but overcoming them gives us a unique strength: We learn to become resilient, while relying on friends and mentors. These experiences foster both the desire and the ability to pay it forward to our community.”

Trang Luu

Trang Luu graduated from MIT with a BS in mechanical engineering in 2018, and a master of engineering degree in 2020. Her Soros award will support her graduate studies at Harvard University in the MBA/MS engineering sciences program.

Born in Saigon, Vietnam, Luu was 3 when her family immigrated to Houston, Texas. Watching her parents’ efforts to make a living in a land where they did not understand the culture or speak the language well, Luu wanted to alleviate hardship for her family. She took full responsibility for her education and found mentors to help her navigate the American education system. At home, she assisted her family in making and repairing household items, which fueled her excitement for engineering.

As an MIT undergraduate, Luu focused on assistive technology projects, applying her engineering background to solve problems impeding daily living. These projects included a new adaptive socket liner for below-the-knee amputees in Kenya, Ethiopia, and Thailand; a walking stick adapter for wheelchairs; a computer head pointer for patients with limited arm mobility; a safer makeshift cook stove design for street vendors in South Africa; and a quicker method to test new drip irrigation designs. As a graduate student in MIT D-Lab under the direction of Professor Daniel Frey, Luu was awarded a National Science Foundation Graduate Research Fellowship. In her graduate studies, Luu researched methods to improve evaporative cooling devices for off-grid farmers to reduce rapid fruit and vegetable deterioration.

These projects strengthened Luu’s commitment to innovating new technology and devices for people struggling with basic daily tasks. During her senior year, Luu collaborated on developing a working prototype of a wearable device that noninvasively reduces hand tremors associated with Parkinson’s disease or essential tremor. Observing patients’ joy after their tremors stopped compelled Luu and three co-founders to continue developing the device after college. Four years later, Encora Therapeutics has accomplished major milestones, including Breakthrough Device designation by the U.S. Food and Drug Administration.

Syamantak Payra

Hailing from Houston, Texas, Syamantak Payra is a senior majoring in electrical engineering and computer science, with minors in public policy and entrepreneurship and innovation. He will be pursuing a PhD in engineering at Stanford University, with the goal of creating new biomedical devices that can help improve daily life for patients worldwide and enhance health care outcomes for decades to come.

Payra’s parents had emigrated from India, and he grew up immersed in his grandparents’ rich Bengali culture. As a high school student, he conducted projects with NASA engineers at Johnson Space Center, experimented at home with his scientist parents, and competed in spelling bees and science fairs across the United States. Through these avenues and activities, Syamantak not only gained perspectives on bridging gaps between people, but also found passions for language, scientific discovery, and teaching others.

After watching his grandmother struggle with asthma and chronic obstructive pulmonary disease and losing his baby brother to brain cancer, Payra devoted himself to trying to use technology to solve health-care challenges. Payra’s proudest accomplishments include building a robotic leg brace for his paralyzed teacher and conducting free literacy workshops and STEM outreach programs that reached nearly a thousand underprivileged students across the Greater Houston Area.

At MIT, Payra has worked in Professor Yoel Fink’s research laboratory, creating digital sensor fibers that have been woven into intelligent garments that can assist in diagnosing illnesses, and in Professor Joseph Paradiso’s research laboratory, where he contributed to next-generation spacesuit prototypes that better protect astronauts on spacewalks. Payra’s research has been published by multiple scientific journals, and he was inducted into the National Gallery for America’s Young Inventors.

An optimized solution for face recognition

The human brain seems to care a lot about faces. It’s dedicated a specific area to identifying them, and the neurons there are so good at their job that most of us can readily recognize thousands of individuals. With artificial intelligence, computers can now recognize faces with a similar efficiency—and neuroscientists at MIT’s McGovern Institute have found that a computational network trained to identify faces and other objects discovers a surprisingly brain-like strategy to sort them all out.

The finding, reported March 16, 2022, in Science Advances, suggests that the millions of years of evolution that have shaped circuits in the human brain have optimized our system for facial recognition.

“The human brain’s solution is to segregate the processing of faces from the processing of objects,” explains Katharina Dobs, who led the study as a postdoctoral researcher in McGovern investigator Nancy Kanwisher’s lab. The artificial network that she trained did the same. “And that’s the same solution that we hypothesize any system that’s trained to recognize faces and to categorize objects would find,” she adds.

“These two completely different systems have figured out what a—if not the—good solution is. And that feels very profound,” says Kanwisher.

Functionally specific brain regions

More than twenty years ago, Kanwisher’s team discovered a small spot in the brain’s temporal lobe that responds specifically to faces. This region, which they named the fusiform face area, is one of many brain regions Kanwisher and others have found that are dedicated to specific tasks, such as the detection of written words, the perception of vocal songs, and understanding language.

Kanwisher says that as she has explored how the human brain is organized, she has always been curious about the reasons for that organization. Does the brain really need special machinery for facial recognition and other functions? “‘Why questions’ are very difficult in science,” she says. But with a sophisticated type of machine learning called a deep neural network, her team could at least find out how a different system would handle a similar task.

Dobs, who is now a research group leader at Justus Liebig University Giessen in Germany, assembled hundreds of thousands of images with which to train a deep neural network in face and object recognition. The collection included the faces of more than 1,700 different people and hundreds of different kinds of objects, from chairs to cheeseburgers. All of these were presented to the network, with no clues about which was which. “We never told the system that some of those are faces, and some of those are objects. So it’s basically just one big task,” Dobs says. “It needs to recognize a face identity, as well as a bike or a pen.”
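
In outline, that “one big task” can be sketched as a single classifier whose output units span both face identities and object categories. The sketch below is a rough PyTorch illustration; the class counts, VGG-style architecture, and training settings are placeholders rather than the study’s exact configuration.

```python
# Rough sketch of the joint training objective described above: a single network
# with one output unit per face identity and per object category, trained with an
# ordinary classification loss and no explicit "face" label. Placeholder settings.
import torch
import torch.nn as nn
from torchvision import models

NUM_FACE_IDENTITIES = 1700   # "more than 1,700 different people"
NUM_OBJECT_CLASSES = 400     # placeholder for "hundreds of kinds of objects"

model = models.vgg16(weights=None)
model.classifier[6] = nn.Linear(4096, NUM_FACE_IDENTITIES + NUM_OBJECT_CLASSES)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

def train_step(images, labels):
    """labels index into the combined face-identity + object-category space."""
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```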

Visualization of the preferred stimulus for example face-ranked filters. While filters in early layers (e.g., Conv5) were maximally activated by simple features, filters in mid-level layers (e.g., Conv9) responded to features that appear somewhat like face parts (e.g., nose and eyes), and filters in late convolutional layers appear to represent faces in a more holistic manner. Image: Kanwisher lab

As the program learned to identify the objects and faces, it organized itself into an information-processing network that included units specifically dedicated to face recognition. As in the brain, this specialization occurred during the later stages of image processing. In both the brain and the artificial network, early steps in facial recognition involve more general vision processing machinery, and final stages rely on face-dedicated components.

It’s not known how face-processing machinery arises in a developing brain, but based on their findings, Kanwisher and Dobs say networks don’t necessarily require an innate face-processing mechanism to acquire that specialization. “We didn’t build anything face-ish into our network,” Kanwisher says. “The networks managed to segregate themselves without being given a face-specific nudge.”

Kanwisher says it was thrilling to see the deep neural network segregate itself into separate parts for face and object recognition. “That’s what we’ve been looking at in the brain for twenty-some years,” she says. “Why do we have a separate system for face recognition in the brain? This tells me it is because that is what an optimized solution looks like.”

Now, she is eager to use deep neural nets to ask similar questions about why other brain functions are organized the way they are. “We have a new way to ask why the brain is organized the way it is,” she says. “How much of the structure we see in human brains will arise spontaneously by training networks to do comparable tasks?”

School of Engineering welcomes new faculty

The School of Engineering is welcoming 17 new faculty members to its departments, institutes, labs, and centers. With research and teaching activities ranging from the development of robotics and machine learning technologies to modeling the impact of elevated carbon dioxide levels on vegetation, they are poised to make significant contributions in new directions across the school and to a wide range of research efforts around the Institute.

“I am delighted to welcome our wonderful new faculty,” says Anantha Chandrakasan, dean of the MIT School of Engineering and Vannevar Bush Professor of Electrical Engineering and Computer Science. “Their impact as talented educators, researchers, collaborators, and mentors will be felt across the School of Engineering and beyond as they strengthen our engineering community.”

Among the new faculty members are four from the Department of Electrical Engineering and Computer Science (EECS), which reports jointly to the School of Engineering and the MIT Stephen A. Schwarzman College of Computing.

Iwnetim “Tim” Abate will join the Department of Materials Science and Engineering in July 2023. He is currently both a Miller and Presidential Postdoctoral Fellow at the University of California at Berkeley. He received his MS and PhD in materials science and engineering from Stanford University and BS in physics from Minnesota State University at Moorhead. He also has research experience in industry (IBM) and at national labs (Los Alamos and SLAC National Accelerator Laboratories). Utilizing computational and experimental approaches in tandem, his research program at MIT will focus on the intersection of material chemistry, electrochemistry, and condensed matter physics to develop solutions for climate change and smart agriculture, including next-generation battery and sensor devices. Abate is also a co-founder and president of a nonprofit organization, SciFro Inc., working on empowering the African youth and underrepresented minorities in the United States to solve local problems through scientific research and innovation. He will continue working on expanding the vision and impact of SciFro with the MIT community. Abate received the Dan Cubicciotti Award of the Electrochemical Society, the EDGE and DARE graduate fellowships, the United Technologies Research Center fellowship, the John Stevens Jr. Memorial Award and the Justice, Equity, Diversity and Inclusion Graduation Award from Stanford University. He will hold the Toyota Career Development Professorship at MIT.

Kaitlyn Becker will join the Department of Mechanical Engineering as an assistant professor in August 2022. Becker received her PhD in materials science and mechanical engineering from Harvard University in 2021 and previously worked in industry as a manufacturing engineer at Cameron Health and a senior engineer for Nano Terra, Inc. She is a postdoc at the Harvard University School of Engineering and Applied Sciences and is also currently a senior glassblowing instructor in the Department of Materials Science and Engineering at MIT. Becker works on adaptive soft robots for grasping and manipulation of delicate structures from the desktop to the deep sea. Her research focuses on novel soft robotic platforms, adding functionality through innovations at the intersection of design and fabrication. She has developed novel fabrication methodologies and mechanical programming methods for large integrated arrays of soft actuators capable of collective manipulation and locomotion, and demonstrated integration of microfluidic circuits to control arrays of multichannel, two-degrees-of-freedom soft actuators. Becker received the National Science Foundation Graduate Research Fellowship in 2015, the Microsoft Graduate Women’s Scholarship in 2015, the Winston Chen Graduate Fellowship in 2015, and the Courtlandt S. Gross Memorial Scholarship in 2014.

Brandon J. DeKosky joined the Department of Chemical Engineering as an assistant professor in a newly introduced joint faculty position between the department and the Ragon Institute of MGH, MIT, and Harvard in September 2021. He received his BS in chemical engineering from the University of Kansas and his PhD in chemical engineering from the University of Texas at Austin. He then did postdoctoral research at the Vaccine Research Center of the National Institute of Allergy and Infectious Diseases. In 2017, Brandon launched his independent academic career as an assistant professor at the University of Kansas in a joint position with the Department of Chemical Engineering and the Department of Pharmaceutical Chemistry. He was also a member of the bioengineering graduate program. His research program focuses on developing and applying a suite of new high-throughput experimental and computational platforms for molecular analysis of adaptive immune responses, to accelerate precision drug discovery. He has received several notable recognitions, including the NIH K99 Path to Independence and NIH DP5 Early Independence awards, the Cellular and Molecular Bioengineering Rising Star Award from the Biomedical Engineering Society, and the Career Development Award from the Congressionally Directed Medical Research Program’s Peer Reviewed Cancer Research Program.

Mohsen Ghaffari will join the Department of Electrical Engineering and Computer Science in April 2022. He received his BS from the Sharif University of Technology, and his MS and PhD in EECS from MIT. His research focuses on distributed and parallel algorithms for large graphs. Ghaffari received the ACM Doctoral Dissertation Honorable Mention Award, the ACM-EATCS Principles of Distributed Computing Doctoral Dissertation Award, and the George M. Sprowls Award for Best Computer Science PhD thesis at MIT. Before coming to MIT, he was on the faculty at ETH Zurich, where he received a prestigious European Research Council Starting Grant.

Aristide Gumyusenge joined the Department of Materials Science and Engineering in January. He was most recently a postdoc at Stanford University, working with Professor Zhenan Bao and Professor Alberto Salleo. He received a BS in chemistry from Wofford College in 2015 and a PhD in chemistry from Purdue University in 2019. His research background and interests are in semiconducting polymers, their processing and characterization, and their unique role in the future of electronics. In particular, he has tackled longstanding challenges in the operational stability of semiconducting polymers under extreme heat and has pioneered high-temperature plastic electronics. He has been selected as a PMSE Future Faculty Scholar (2021), the GLAM Postdoctoral Fellow (2020-22), and the MRS Arthur Nowick and Graduate Student Gold Awardee (2019), among other recognitions. At MIT, he will lead the Laboratory of Organic Materials for Smart Electronics (OMSE Lab). Through polymer design, novel processing strategies, and large-area manufacturing of electronic devices, he is interested in relating molecular design to device performance, especially transistor devices able to mimic and interface with biological systems. He will hold the Merton C. Flemings Career Development Professorship.

Mina Konakovic Lukovic will join the Department of Electrical Engineering and Computer Science as an assistant professor in July 2022. She received her BS and MS from the University of Belgrade, Faculty of Mathematics. She earned her PhD in 2019 in the School of Computer and Communication Sciences at the Swiss Federal Institute of Technology Lausanne, advised by Professor Mark Pauly. Currently a Schmidt Science Postdoctoral Fellow in MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL), she has been mentored by Professor Wojciech Matusik. Her research focuses on computer graphics, computational fabrication, 3D geometry processing, and machine learning, including architectural geometry and the design of programmable materials. She received the ACM SIGGRAPH Outstanding Doctoral Dissertation Honorable Mention, the Eurographics PhD Award, and was recently awarded the 2021 SIAM Activity Group on Geometric Design Early Career Prize.

Darcy McRose will join the Department of Civil and Environmental Engineering as an assistant professor in August 2022. She completed a BS in Earth systems at Stanford and a PhD in geosciences at Princeton University. Darcy is currently conducting postdoctoral work at Caltech, where she is mentored by Professor Dianne Newman in the divisions of Biology and Biological Engineering and Geological and Planetary Sciences. Her research program focuses on microbe-environment interactions and their effects on biogeochemical cycles, and incorporates techniques ranging from microbial physiology and genetics to geochemistry. A particular emphasis for this work is the production and use of secondary metabolites and small molecules in soils and sediments. McRose received the Caltech BBE Division postdoctoral fellowship in 2019 and is currently a Simons Foundation Marine Microbial Ecology postdoctoral fellow as well as a L’Oréal USA for Women in Science fellow.

Qin (Maggie) Qi joined the Department of Chemical Engineering as an assistant professor in January 2022. She received two BS degrees in chemical engineering and in operations research from Cornell University, before moving on to Stanford for her PhD. She then took on a postdoc position at Harvard University School of Engineering and Applied Sciences and the Wyss Institute. Maggie’s proposed research includes combining extensive theoretical and computational work on predictive models that guide experimental design. She seeks to investigate particle-cell biomechanics and function for better targeted cell-based therapies. She also plans to design microphysiological systems that elucidate hydrodynamics in complex organs, including delivery of drugs to the eye, and to examine ionic liquids as complex fluids for biomaterial design. She aims to push the boundaries of fluid mechanics, transport phenomena, and soft matter for human health and to innovate precision health care solutions. Maggie received the T.S. Lo Graduate Fellowship and the Stanford Graduate Fellowship in Science and Engineering. Among her accomplishments, Maggie was a participant in the inaugural class of the MIT Rising Stars in ChemE Program in 2018.

Manish Raghavan will join the MIT Sloan School of Management and the Department of Electrical Engineering and Computer Science as an assistant professor in September 2022. He shares a joint appointment with the MIT Schwarzman College of Computing. He received a bachelor’s degree in electrical engineering and computer science from the University of California at Berkeley, and PhD from the Computer Science department at Cornell University. Prior to joining MIT, he was a postdoc at the Harvard Center for Research on Computation and Society. His research interests lie in the application of computational techniques to domains of social concern, including algorithmic fairness and behavioral economics, with a particular focus on the use of algorithmic tools in the hiring pipeline. He is also a member of Cornell’s Artificial Intelligence, Policy, and Practice initiative and Mechanism Design for Social Good.

Ritu Raman joined the Department of Mechanical Engineering as an assistant professor and Brit and Alex d’Arbeloff Career Development Chair in August 2021. Raman received her PhD in mechanical engineering from the University of Illinois at Urbana-Champaign as an NSF Graduate Research Fellow in 2016 and completed a postdoctoral fellowship with Professor Robert Langer at MIT, funded by a NASEM Ford Foundation Fellowship and a L’Oréal USA For Women in Science Fellowship. Raman’s lab designs adaptive living materials powered by assemblies of living cells for applications ranging from medicine to machines. Currently, she is focused on using biological materials and engineering tools to build living neuromuscular tissues. Her goal is to help restore mobility to those who have lost it after disease or trauma and to deploy biological actuators as functional components in machines. Raman published the book Biofabrication with MIT Press in September 2021. She was in the MIT Technology Review “35 Innovators Under 35” 2019 class, the Forbes “30 Under 30” 2018 class, and has received numerous awards including being named a National Academy of Sciences Kavli Frontiers of Science Fellow in 2020 and receiving the Science and Sartorius Prize for Regenerative Medicine and Cell Therapy in 2019. Ritu has championed many initiatives to empower women in science, including being named an AAAS IF/THEN ambassador and founding the Women in Innovation and Stem Database at MIT (WISDM).

Nidhi Seethapathi joined the Department of Brain and Cognitive Sciences and the Department of Electrical Engineering and Computer Science in January 2022. She shares a joint appointment with the MIT Schwarzman College of Computing. She received a bachelor’s degree in mechanical engineering from Veermata Jijabai Technological Institute and a PhD from the Movement Lab at Ohio State University. Her research interests include building computational predictive models of human movement with applications to autonomous and robot-aided neuromotor rehabilitation. In her work, she uses a combination of tools and approaches from dynamics, control theory, and machine learning. During her PhD, she was a Schlumberger Foundation Faculty for the Future Fellow. She then worked as a postdoc in the Kording Lab at University of Pennsylvania, developing data-driven tools for autonomous neuromotor rehabilitation, in collaboration with the Rehabilitation Robotics Lab.

Vincent Sitzmann will join the Department of Electrical Engineering and Computer Science as an assistant professor in July 2022. He earned his BS from the Technical University of Munich in 2015, his MS from Stanford in 2017, and his PhD from Stanford in 2020. At MIT, he will be the principal investigator of the Scene Representation Group, where he will lead research at the intersection of machine learning, graphics, neural rendering, and computer vision to build algorithms that learn to reconstruct, understand, and interact with 3D environments from incomplete observations the way humans can. Currently, Vincent is a postdoc at the MIT Computer Science and Artificial Intelligence Laboratory with Josh Tenenbaum, Bill Freeman, and Fredo Durand. Along with multiple scholarships and fellowships, he has been recognized with the NeurIPS Honorable Mention: Outstanding New Directions in 2019.

Tess Smidt joined the Department of Electrical Engineering and Computer Science as an assistant professor in September 2021. She earned her SB in physics from MIT in 2012 and her PhD in physics from the University of California at Berkeley in 2018. She is the principal investigator of the Atomic Architects group at the Research Laboratory of Electronics, where she works at the intersection of physics, geometry, and machine learning to design algorithms that aid in the understanding and design of physical systems. Her research focuses on machine learning that incorporates physical and geometric constraints, with applications to materials design. Prior to joining the MIT EECS faculty, she was the 2018 Alvarez Postdoctoral Fellow in Computing Sciences at Lawrence Berkeley National Laboratory and a software engineering intern on the Google Accelerated Sciences team, where she developed Euclidean symmetry equivariant neural networks which naturally handle 3D geometry and geometric tensor data.

Loza Tadesse will join the Department of Mechanical Engineering as an assistant professor in July 2023. She received her PhD in bioengineering from Stanford University in 2021 and previously was a medical student at St. Paul Hospital Millennium Medical College in Ethiopia. She is currently a postdoc at the University of California at Berkeley. Tadesse’s past research combines Raman spectroscopy and machine learning to develop a rapid, all-optical, and label-free bacterial diagnostic and antibiotic susceptibility testing system that aims to circumvent the time-consuming culturing step in “gold standard” methods. She aims to establish a research program that develops next-generation point-of-care diagnostic devices using spectroscopy, optical, and machine learning tools for application in resource limited clinical settings such as developing nations, military sites, and space exploration. Tadesse has been listed as a 2022 Forbes “30 Under 30” in health care, received many awards including the Biomedical Engineering Society (BMES) Career Development Award, the Stanford DARE Fellowship and the Gates Foundation “Call to Action” $200,000 grant for SciFro Inc., an educational nonprofit in Ethiopia, which she co-founded.

César Terrer joined the Department of Civil and Environmental Engineering as an assistant professor in July 2021. He obtained his PhD in ecosystem ecology and climate change from Imperial College London, where he started working at the interface between experiments and models to better understand the effects of elevated carbon dioxide on vegetation. His research has advanced the understanding of the effects of carbon dioxide in terrestrial ecosystems, the role of soil nutrients in a climate change context, and plant-soil interactions. Synthesizing observational data from carbon dioxide experiments and satellites through meta-analysis and machine learning, César has found that microbial interactions between plants and soils play a major role in the carbon cycle at a global scale, affecting the speed of global warming.

Haruko Wainwright joined the Department of Nuclear Science and Engineering as an assistant professor in January 2021. She received her BEng in engineering physics from Kyoto University, Japan in 2003, her MS in nuclear engineering in 2006, her MA in statistics in 2010, and her PhD in nuclear engineering in 2010 from the University of California at Berkeley. Before joining MIT, she was a staff scientist in the Earth and Environmental Sciences Area at Lawrence Berkeley National Laboratory and an adjunct professor in nuclear engineering at UC Berkeley. Her research focuses on environmental modeling and monitoring technologies, with a particular emphasis on nuclear waste and nuclear-related contamination. She has been developing Bayesian methods for multi-type multiscale data integration and model-data integration. She leads and co-leads multiple interdisciplinary projects, including the U.S. Department of Energy’s Advanced Long-term Environmental Monitoring Systems (ALTEMIS) project, and the Artificial Intelligence for Earth System Predictability (AI4ESP) initiative.

Martin Wainwright will join the Department of Electrical Engineering and Computer Science in July 2022. He received a bachelor’s degree in mathematics from University of Waterloo, Canada, and PhD in EECS from MIT. Prior to joining MIT, he was the Chancellor’s Professor at the University of California at Berkeley, with a joint appointment between the Department of Statistics and the Department of EECS. His research interests include high-dimensional statistics, statistical machine learning, information theory, and optimization theory. Among other awards, he has received the COPSS Presidents’ Award (2014) from the Joint Statistical Societies, the David Blackwell Lectureship (2017), and Medallion Lectureship (2013) from the Institute of Mathematical Statistics, and Best Paper awards from the IEEE Signal Processing Society and IEEE Information Theory Society. He was a Section Lecturer at the International Congress of Mathematicians in 2014.

Seven new faculty join the MIT School of Science

This winter, seven new faculty members join the MIT School of Science in the departments of Biology and Brain and Cognitive Sciences.

Siniša Hrvatin studies how animals initiate, regulate, and survive states of stasis, such as torpor and hibernation. To survive extreme environments, many animals have evolved the ability to decrease metabolic rate and body temperature and enter dormant states. His long-term goal is to harness the potential of these biological adaptations to advance medicine. Previously, he identified the neurons that regulate mouse torpor and established a platform for the development of cell-type-specific viral drivers.

Hrvatin earned his bachelor’s degree in biochemical sciences in 2007 and his PhD in stem cell and regenerative medicine in 2013, both from Harvard University. He was then a postdoc in bioengineering at MIT and a postdoc in neurobiology at Harvard Medical School. Hrvatin returns to MIT as an assistant professor of biology and a member of the Whitehead Institute for Biomedical Research.

Sara Prescott investigates how sensory inputs from within the body control mammalian physiology and behavior. Specifically, she uses mammalian airways as a model system to explore how the cells that line the surface of the body communicate with parts of the nervous system. For example, what mechanisms elicit a reflexive cough? Prescott’s research considers the critical questions of how airway insults are detected, encoded, and adapted to in mammalian airways, with the ultimate goal of providing new ways to treat autonomic dysfunction.

Prescott earned her bachelor’s degree in molecular biology from Princeton University in 2008 followed by her PhD in developmental biology from Stanford University in 2016. Prior to joining MIT, she was a postdoc at Harvard Medical School and Howard Hughes Medical Institute. The Department of Biology welcomes Prescott as an assistant professor.

Alison Ringel is a T-cell immunologist with a background in biochemistry, biophysics, and structural biology. She investigates how environmental factors such as aging, metabolism, and diet impact tumor progression and the immune responses that keep tumors in check. By mapping the environment around a tumor on a cellular level, she seeks to gain a molecular understanding of cancer risk factors.

Ringel received a bachelor’s degree in molecular biology, biochemistry, and physics from Wesleyan University, then a PhD in molecular biophysics from Johns Hopkins University School of Medicine. Previously, Ringel was a postdoc in the Department of Cell Biology at Harvard Medical School. She joins MIT as an assistant professor in the Department of Biology and a core member of the Ragon Institute of MGH, MIT and Harvard.

Francisco J. Sánchez-Rivera PhD ’16 investigates genetic variation with a focus on cancer. He integrates genome engineering technologies, genetically-engineered mouse models (GEMMs), and single cell lineage tracing and omics approaches in order to understand the mechanics of cancer development and evolution. With state-of-the-art technologies — including a CRISPR-based genome editing system he developed as a graduate student at MIT — he hopes to make discoveries in cancer genetics that will shed light on disease progression and pave the way for better therapeutic treatments.

Sánchez-Rivera received his bachelor’s degree in microbiology from the University of Puerto Rico at Mayagüez followed by a PhD in biology from MIT. He then pursued postdoctoral studies at Memorial Sloan Kettering Cancer Center supported by a HHMI Hanna Gray Fellowship. Sánchez-Rivera returns to MIT as an assistant professor in the Department of Biology and a member of the Koch Institute for Integrative Cancer Research at MIT.

Nidhi Seethapathi builds predictive models to help understand human movement with a combination of theory, computational modeling, and experiments. Her research focuses on understanding the objectives that govern movement decisions, the strategies used to execute movement, and how new movements are learned. By studying movement in real-world contexts using creative approaches, Seethapathi aims to make discoveries and develop tools that could improve neuromotor rehabilitation.

Seethapathi earned her bachelor’s degree in mechanical engineering from the Veermata Jijabai Technological Institute followed by her PhD in mechanical engineering from Ohio State University. In 2018, she continued to the University of Pennsylvania where she was a postdoc. She joins MIT as an assistant professor in the Department of Brain and Cognitive Sciences with a shared appointment in the Department of Electrical Engineering and Computer Science at the MIT Schwarzman College of Computing.

Hernandez Moura Silva researches how the immune system supports tissue physiology. Silva focuses on macrophages, a type of immune cell involved in tissue homeostasis. He plans to establish new strategies to explore the effects and mechanisms of such immune-related pathways, with the ultimate goal of developing therapeutic approaches to treat human diseases.

Silva earned a bachelor’s degree in biological sciences and a master’s degree in molecular biology from the University of Brasilia. He went on to complete a PhD in immunology at the University of São Paulo School of Medicine: Heart Institute. Most recently, he was the Bernard Levine Postdoctoral Fellow in immunology and immuno-metabolism at the New York University School of Medicine: Skirball Institute of Biomolecular Medicine. Silva joins MIT as an assistant professor in the Department of Biology and a core member of the Ragon Institute.

Yadira Soto-Feliciano PhD ’16 studies chromatin — the complex of DNA and proteins that make up chromosomes. She combines cancer biology and epigenetics to understand how certain proteins affect gene expression and, in turn, how they impact the development of cancer and other diseases. In decoding the chemical language of chromatin, Soto-Feliciano pursues a basic understanding of gene regulation that could improve the clinical management of diseases associated with their dysfunction.

Soto-Feliciano received her bachelor’s degree in chemistry from the University of Puerto Rico at Mayagüez followed by a PhD in biology from MIT, where she was also a research fellow with the Koch Institute. Most recently, she was the Damon Runyon-Sohn Pediatric Cancer Postdoctoral Fellow at The Rockefeller University. Soto-Feliciano returns to MIT as an assistant professor in the Department of Biology and a member of the Koch Institute.

Where did that sound come from?

The human brain is finely tuned not only to recognize particular sounds, but also to determine which direction they came from. By comparing differences in sounds that reach the right and left ear, the brain can estimate the location of a barking dog, wailing fire engine, or approaching car.

MIT neuroscientists have now developed a computer model that can also perform that complex task. The model, which consists of several convolutional neural networks, not only performs the task as well as humans do but also struggles in the same ways that humans do.

“We now have a model that can actually localize sounds in the real world,” says Josh McDermott, an associate professor of brain and cognitive sciences and a member of MIT’s McGovern Institute for Brain Research. “And when we treated the model like a human experimental participant and simulated this large set of experiments that people had tested humans on in the past, what we found over and over again is that the model recapitulates the results that you see in humans.”

Findings from the new study also suggest that humans’ ability to perceive location is adapted to the specific challenges of our environment, says McDermott, who is also a member of MIT’s Center for Brains, Minds, and Machines.

McDermott is the senior author of the paper, which appears today in Nature Human Behaviour. The paper’s lead author is MIT graduate student Andrew Francl.

Modeling localization

When we hear a sound such as a train whistle, the sound waves reach our right and left ears at slightly different times and intensities, depending on what direction the sound is coming from. Parts of the midbrain are specialized to compare these slight differences to help estimate what direction the sound came from, a task also known as localization.
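
Those two cues, timing and intensity, can be illustrated in a few lines of code. The sketch below estimates them from a pair of ear signals; it is a simplified illustration, not the computation the model or the midbrain actually performs.

```python
# Simplified illustration of the two binaural cues: the interaural time difference
# (ITD), estimated by cross-correlating the two ear signals, and the interaural
# level difference (ILD), the ratio of their energies in decibels.
import numpy as np

def binaural_cues(left, right, sample_rate):
    """Return (itd_seconds, ild_db); ITD is positive when the left ear leads."""
    corr = np.correlate(right, left, mode="full")
    lag = np.argmax(corr) - (len(left) - 1)   # samples by which the left ear leads
    itd = lag / sample_rate
    ild = 20 * np.log10(np.sqrt(np.mean(left ** 2)) / np.sqrt(np.mean(right ** 2)))
    return itd, ild

# Synthetic check: a tone arriving at the left ear 0.5 ms earlier and slightly louder.
sr = 44100
t = np.arange(0, 0.05, 1 / sr)
src = np.sin(2 * np.pi * 500 * t)
delay = int(0.0005 * sr)
left = np.concatenate([src, np.zeros(delay)])
right = 0.7 * np.concatenate([np.zeros(delay), src])
print(binaural_cues(left, right, sr))  # ITD near +0.5 ms, ILD about +3 dB
```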

This task becomes markedly more difficult under real-world conditions — where the environment produces echoes and many sounds are heard at once.

Scientists have long sought to build computer models that can perform the same kind of calculations that the brain uses to localize sounds. These models sometimes work well in idealized settings with no background noise, but never in real-world environments, with their noises and echoes.

To develop a more sophisticated model of localization, the MIT team turned to convolutional neural networks. This kind of computer modeling has been used extensively to model the human visual system, and more recently, McDermott and other scientists have begun applying it to audition as well.

Convolutional neural networks can be designed with many different architectures, so to help them find the ones that would work best for localization, the MIT team used a supercomputer that allowed them to train and test about 1,500 different models. That search identified 10 that seemed the best-suited for localization, which the researchers further trained and used for all of their subsequent studies.

To train the models, the researchers created a virtual world in which they can control the size of the room and the reflection properties of the walls of the room. All of the sounds fed to the models originated from somewhere in one of these virtual rooms. The set of more than 400 training sounds included human voices, animal sounds, machine sounds such as car engines, and natural sounds such as thunder.

The researchers also ensured the model started with the same information provided by human ears. The outer ear, or pinna, has many folds that reflect sound, altering the frequencies that enter the ear, and these reflections vary depending on where the sound comes from. The researchers simulated this effect by running each sound through a specialized mathematical function before it went into the computer model.

“This allows us to give the model the same kind of information that a person would have,” Francl says.
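
In code, that step amounts to convolving each sound with a pair of direction-specific ear filters, as in the minimal sketch below; here hrir_left and hrir_right stand in for measured head-related impulse responses, and the study’s actual simulation pipeline is more detailed.

```python
# Sketch of the direction-dependent ear filtering step: convolve a mono source with
# a left- and right-ear impulse response measured for one direction, yielding the
# two-channel signal the model "hears." The impulse responses are placeholders.
import numpy as np
from scipy.signal import fftconvolve

def spatialize(mono_sound, hrir_left, hrir_right):
    """Return a (2, n) array: the sound as received at the left and right ears."""
    left = fftconvolve(mono_sound, hrir_left, mode="full")
    right = fftconvolve(mono_sound, hrir_right, mode="full")
    return np.stack([left, right])
```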

After training the models, the researchers tested them in a real-world environment. They placed a mannequin with microphones in its ears in an actual room and played sounds from different directions, then fed those recordings into the models. The models performed very similarly to humans when asked to localize these sounds.

“Although the model was trained in a virtual world, when we evaluated it, it could localize sounds in the real world,” Francl says.

Similar patterns

The researchers then subjected the models to a series of tests that scientists have used in the past to study humans’ localization abilities.

In addition to analyzing the difference in arrival time at the right and left ears, the human brain also bases its location judgments on differences in the intensity of sound that reaches each ear. Previous studies have shown that the success of both of these strategies varies depending on the frequency of the incoming sound. In the new study, the MIT team found that the models showed this same pattern of sensitivity to frequency.

“The model seems to use timing and level differences between the two ears in the same way that people do, in a way that’s frequency-dependent,” McDermott says.

The researchers also showed that when they made localization tasks more difficult, by adding multiple sound sources played at the same time, the computer models’ performance declined in a way that closely mimicked human failure patterns under the same circumstances.

“As you add more and more sources, you get a specific pattern of decline in humans’ ability to accurately judge the number of sources present, and their ability to localize those sources,” Francl says. “Humans seem to be limited to localizing about three sources at once, and when we ran the same test on the model, we saw a really similar pattern of behavior.”

Because the researchers used a virtual world to train their models, they were also able to explore what happens when their model learned to localize in different types of unnatural conditions. The researchers trained one set of models in a virtual world with no echoes, and another in a world where there was never more than one sound heard at a time. In a third, the models were only exposed to sounds with narrow frequency ranges, instead of naturally occurring sounds.

When the models trained in these unnatural worlds were evaluated on the same battery of behavioral tests, the models deviated from human behavior, and the ways in which they failed varied depending on the type of environment they had been trained in. These results support the idea that the localization abilities of the human brain are adapted to the environments in which humans evolved, the researchers say.

The researchers are now applying this type of modeling to other aspects of audition, such as pitch perception and speech recognition, and believe it could also be used to understand other cognitive phenomena, such as the limits on what a person can pay attention to or remember, McDermott says.

The research was funded by the National Science Foundation and the National Institute on Deafness and Other Communication Disorders.

Perfecting pitch perception

New research from MIT neuroscientists suggests that natural soundscapes have shaped our sense of hearing, optimizing it for the kinds of sounds we most often encounter.

Mark Saddler, graduate fellow of the K. Lisa Yang Integrative Computational Neuroscience Center. Photo: Caitlin Cunningham

In a study reported December 14 in the journal Nature Communications, researchers led by McGovern Institute Associate Investigator Josh McDermott used computational modeling to explore factors that influence how humans hear pitch. Their model’s pitch perception closely resembled that of humans—but only when it was trained using music, voices, or other naturalistic sounds.

Humans’ ability to recognize pitch—essentially, the rate at which a sound repeats—gives melody to music and nuance to spoken language. Although this is arguably the best-studied aspect of human hearing, researchers are still debating which factors determine the properties of pitch perception, and why it is more acute for some types of sounds than others. McDermott, who is also an associate professor in MIT’s Department of Brain and Cognitive Sciences and an investigator with the Center for Brains, Minds, and Machines (CBMM), is particularly interested in understanding how our nervous system perceives pitch because cochlear implants, which send electrical signals about sound to the brain in people with profound deafness, don’t replicate this aspect of human hearing very well.

“Cochlear implants can do a pretty good job of helping people understand speech, especially if they’re in a quiet environment. But they really don’t reproduce the percept of pitch very well,” says Mark Saddler, a CBMM graduate student who co-led the project and an inaugural graduate fellow of the K. Lisa Yang Integrative Computational Neuroscience Center. “One of the reasons it’s important to understand the detailed basis of pitch perception in people with normal hearing is to try to get better insights into how we would reproduce that artificially in a prosthesis.”

Artificial hearing

Pitch perception begins in the cochlea, the snail-shaped structure in the inner ear where vibrations from sounds are transformed into electrical signals and relayed to the brain via the auditory nerve. The cochlea’s structure and function help determine how and what we hear. And although it hasn’t been possible to test this idea experimentally, McDermott’s team suspected our “auditory diet” might shape our hearing as well.

To explore how both our ears and our environment influence pitch perception, McDermott, Saddler and research assistant Ray Gonzalez built a computer model called a deep neural network. Neural networks are a type of machine learning model widely used in automatic speech recognition and other artificial intelligence applications. Although the structure of an artificial neural network coarsely resembles the connectivity of neurons in the brain, the models used in engineering applications don’t actually hear the same way humans do—so the team developed a new model to reproduce human pitch perception. Their approach combined an artificial neural network with an existing model of the mammalian ear, uniting the power of machine learning with insights from biology. “These new machine learning models are really the first that can be trained to do complex auditory tasks and actually do them well, at human levels of performance,” Saddler explains.

The researchers trained the neural network to estimate pitch by asking it to identify the repetition rate of sounds in a training set. This gave them the flexibility to change the parameters under which pitch perception developed. They could manipulate the types of sound they presented to the model, as well as the properties of the ear that processed those sounds before passing them on to the neural network.
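
A single labeled training example might be generated along the lines of the sketch below: a synthetic harmonic tone whose known repetition rate serves as the target. This is an illustration only; the study’s stimuli, simulated cochlea, and labeling scheme are more elaborate.

```python
# Illustrative sketch of one labeled training example: a harmonic complex tone
# whose known fundamental frequency (the repetition rate) serves as the target.
import numpy as np

def harmonic_tone(f0, duration=0.05, sample_rate=32000, n_harmonics=10):
    """Return (waveform, f0) for a tone that repeats f0 times per second."""
    t = np.arange(0, duration, 1 / sample_rate)
    waveform = sum(np.sin(2 * np.pi * f0 * k * t) / k for k in range(1, n_harmonics + 1))
    return waveform / np.max(np.abs(waveform)), f0

# One (input, label) pair; in training, the waveform would first pass through the
# ear model before reaching the neural network, and background noise could be added.
x, label = harmonic_tone(f0=220.0)
```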

When the model was trained using sounds that are important to humans, like speech and music, it learned to estimate pitch much as humans do. “We very nicely replicated many characteristics of human perception…suggesting that it’s using similar cues from the sounds and the cochlear representation to do the task,” Saddler says.

But when the model was trained using more artificial sounds or in the absence of any background noise, its behavior was very different. For example, Saddler says, “If you optimize for this idealized world where there’s never any competing sources of noise, you can learn a pitch strategy that seems to be very different from that of humans, which suggests that perhaps the human pitch system was really optimized to deal with cases where sometimes noise is obscuring parts of the sound.”

The team also found the timing of nerve signals initiated in the cochlea to be critical to pitch perception. In a healthy cochlea, McDermott explains, nerve cells fire precisely in time with the sound vibrations that reach the inner ear. When the researchers skewed this relationship in their model, so that the timing of nerve signals was less tightly correlated to vibrations produced by incoming sounds, pitch perception deviated from normal human hearing. 

McDermott says it will be important to take this into account as researchers work to develop better cochlear implants. “It does very much suggest that for cochlear implants to produce normal pitch perception, there needs to be a way to reproduce the fine-grained timing information in the auditory nerve,” he says. “Right now, they don’t do that, and there are technical challenges to making that happen—but the modeling results really pretty clearly suggest that’s what you’ve got to do.”

Giving robots social skills

Robots can deliver food on a college campus and hit a hole-in-one on the golf course, but even the most sophisticated robot can’t perform basic social interactions that are critical to everyday human life.

MIT researchers have now incorporated certain social interactions into a framework for robotics, enabling machines to understand what it means to help or hinder one another, and to learn to perform these social behaviors on their own. In a simulated environment, a robot watches its companion, guesses what task it wants to accomplish, and then helps or hinders this other robot based on its own goals.

The researchers also showed that their model creates realistic and predictable social interactions. When they showed videos of these simulated robots interacting with one another to humans, the human viewers mostly agreed with the model about what type of social behavior was occurring.

Enabling robots to exhibit social skills could lead to smoother and more positive human-robot interactions. For instance, a robot in an assisted living facility could use these capabilities to help create a more caring environment for elderly individuals. The new model may also enable scientists to measure social interactions quantitatively, which could help psychologists study autism or analyze the effects of antidepressants.

“Robots will live in our world soon enough, and they really need to learn how to communicate with us on human terms. They need to understand when it is time for them to help and when it is time for them to see what they can do to prevent something from happening. This is very early work and we are barely scratching the surface, but I feel like this is the first very serious attempt for understanding what it means for humans and machines to interact socially,” says Boris Katz, principal research scientist and head of the InfoLab Group in MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) and a member of the Center for Brains, Minds, and Machines (CBMM).

Joining Katz on the paper are co-lead author Ravi Tejwani, a research assistant at CSAIL; co-lead author Yen-Ling Kuo, a CSAIL PhD student; Tianmin Shu, a postdoc in the Department of Brain and Cognitive Sciences; and senior author Andrei Barbu, a research scientist at CSAIL and CBMM. The research will be presented at the Conference on Robot Learning in November.

A social simulation

To study social interactions, the researchers created a simulated environment where robots pursue physical and social goals as they move around a two-dimensional grid.

A physical goal relates to the environment. For example, a robot’s physical goal might be to navigate to a tree at a certain point on the grid. A social goal involves guessing what another robot is trying to do and then acting based on that estimation, like helping another robot water the tree.

The researchers use their model to specify what a robot’s physical goals are, what its social goals are, and how much emphasis it should place on one over the other. The robot is rewarded for actions it takes that get it closer to accomplishing its goals. If a robot is trying to help its companion, it adjusts its reward to match that of the other robot; if it is trying to hinder, it adjusts its reward to be the opposite. The planner, an algorithm that decides which actions the robot should take, uses this continually updating reward to guide the robot to carry out a blend of physical and social goals.
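As a rough illustration of that reward adjustment, here is a short Python sketch; the weights, the estimate of the other robot's reward, and the numbers are hypothetical and are not drawn from the paper's actual formulation.

```python
# Minimal sketch of the reward blending described above. The social weight and
# the estimate of the other agent's reward are hypothetical illustrations.
from dataclasses import dataclass

@dataclass
class Rewards:
    physical: float         # progress toward this robot's own physical goal
    other_estimated: float  # estimated reward of the other robot, inferred from its behavior

def social_reward(r: Rewards, social_weight: float, helping: bool) -> float:
    """Combine a robot's physical reward with a social term.

    A helper adds a term that tracks the other robot's estimated reward;
    a hinderer adds the negative of that term. social_weight controls how
    much emphasis the robot places on the social goal versus its own.
    """
    sign = 1.0 if helping else -1.0
    return r.physical + social_weight * sign * r.other_estimated

# Example: a robot that mostly pursues its own goal but mildly helps (or hinders).
step = Rewards(physical=0.3, other_estimated=0.8)
print(social_reward(step, social_weight=0.5, helping=True))    # 0.3 + 0.5*0.8 = 0.7
print(social_reward(step, social_weight=0.5, helping=False))   # 0.3 - 0.5*0.8 = -0.1
```

A more sophisticated robot, as described below, would apply the same idea recursively, estimating not just the other robot's physical goal but its social goal as well.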

“We have opened a new mathematical framework for how you model social interaction between two agents. If you are a robot, and you want to go to location X, and I am another robot and I see that you are trying to go to location X, I can cooperate by helping you get to location X faster. That might mean moving X closer to you, finding another better X, or taking whatever action you had to take at X. Our formulation allows the plan to discover the ‘how’; we specify the ‘what’ in terms of what social interactions mean mathematically,” says Tejwani.

Blending a robot’s physical and social goals is important to create realistic interactions, since humans who help one another have limits to how far they will go. For instance, a rational person likely wouldn’t just hand a stranger their wallet, Barbu says.

The researchers used this mathematical framework to define three types of robots. A level 0 robot has only physical goals and cannot reason socially. A level 1 robot has physical and social goals but assumes all other robots only have physical goals. Level 1 robots can take actions based on the physical goals of other robots, like helping and hindering. A level 2 robot assumes other robots have social and physical goals; these robots can take more sophisticated actions like joining in to help together.

Evaluating the model

To see how their model compared with human judgments of social interactions, the researchers created 98 different scenarios with robots at levels 0, 1, and 2. Twelve humans watched 196 video clips of the robots interacting, and were then asked to estimate the physical and social goals of those robots.

In most instances, their model agreed with what the humans thought about the social interactions that were occurring in each frame.

“We have this long-term interest, both to build computational models for robots, but also to dig deeper into the human aspects of this. We want to find out what features from these videos humans are using to understand social interactions. Can we make an objective test for your ability to recognize social interactions? Maybe there is a way to teach people to recognize these social interactions and improve their abilities. We are a long way from this, but even just being able to measure social interactions effectively is a big step forward,” Barbu says.

Toward greater sophistication

The researchers are working on developing a system with 3D agents in an environment that allows many more types of interactions, such as the manipulation of household objects. They are also planning to modify their model to include environments where actions can fail.

The researchers also want to incorporate a neural network-based robot planner into the model, which learns from experience and performs faster. Finally, they hope to run an experiment to collect data about the features humans use to determine if two robots are engaging in a social interaction.

“Hopefully, we will have a benchmark that allows all researchers to work on these social interactions and inspire the kinds of science and engineering advances we’ve seen in other areas such as object and action recognition,” Barbu says.

“I think this is a lovely application of structured reasoning to a complex yet urgent challenge,” says Tomer Ullman, assistant professor in the Department of Psychology at Harvard University and head of the Computation, Cognition, and Development Lab, who was not involved with this research. “Even young infants seem to understand social interactions like helping and hindering, but we don’t yet have machines that can perform this reasoning at anything like human-level flexibility. I believe models like the ones proposed in this work, that have agents thinking about the rewards of others and socially planning how best to thwart or support them, are a good step in the right direction.”

This research was supported by the Center for Brains, Minds, and Machines; the National Science Foundation; the MIT CSAIL Systems that Learn Initiative; the MIT-IBM Watson AI Lab; the DARPA Artificial Social Intelligence for Successful Teams program; the U.S. Air Force Research Laboratory; the U.S. Air Force Artificial Intelligence Accelerator; and the Office of Naval Research.

Artificial intelligence sheds light on how the brain processes language

In the past few years, artificial intelligence models of language have become very good at certain tasks. Most notably, they excel at predicting the next word in a string of text; this technology helps search engines and texting apps predict the next word you are going to type.

The most recent generation of predictive language models also appears to learn something about the underlying meaning of language. These models can not only predict the word that comes next, but also perform tasks that seem to require some degree of genuine understanding, such as question answering, document summarization, and story completion.

Such models were designed to optimize performance for the specific function of predicting text, without attempting to mimic anything about how the human brain performs this task or understands language. But a new study from MIT neuroscientists suggests the underlying function of these models resembles the function of language-processing centers in the human brain.

Computer models that perform well on other types of language tasks do not show this similarity to the human brain, offering evidence that the human brain may use next-word prediction to drive language processing.

“The better the model is at predicting the next word, the more closely it fits the human brain,” says Nancy Kanwisher, the Walter A. Rosenblith Professor of Cognitive Neuroscience, a member of MIT’s McGovern Institute for Brain Research and Center for Brains, Minds, and Machines (CBMM), and an author of the new study. “It’s amazing that the models fit so well, and it very indirectly suggests that maybe what the human language system is doing is predicting what’s going to happen next.”

Joshua Tenenbaum, a professor of computational cognitive science at MIT and a member of CBMM and MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL); and Evelina Fedorenko, the Frederick A. and Carole J. Middleton Career Development Associate Professor of Neuroscience and a member of the McGovern Institute, are the senior authors of the study, which appears this week in the Proceedings of the National Academy of Sciences.

Martin Schrimpf, an MIT graduate student who works in CBMM, is the first author of the paper.

Making predictions

The new, high-performing next-word prediction models belong to a class of models called deep neural networks. These networks contain computational “nodes” that form connections of varying strength, and layers that pass information between each other in prescribed ways.

Over the past decade, scientists have used deep neural networks to create models of vision that can recognize objects as well as the primate brain does. Research at MIT has also shown that the underlying function of visual object recognition models matches the organization of the primate visual cortex, even though those computer models were not specifically designed to mimic the brain.

In the new study, the MIT team used a similar approach to compare language-processing centers in the human brain with language-processing models. The researchers analyzed 43 different language models, including several that are optimized for next-word prediction. These include a model called GPT-3 (Generative Pre-trained Transformer 3), which, given a prompt, can generate text similar to what a human would produce. Other models were designed to perform different language tasks, such as filling in a blank in a sentence.

As each model was presented with a string of words, the researchers measured the activity of the nodes that make up the network. They then compared these patterns to activity in the human brain, measured in subjects performing three language tasks: listening to stories, reading sentences one at a time, and reading sentences in which one word is revealed at a time. These human datasets included functional magnetic resonance imaging (fMRI) data and intracranial electrocorticographic measurements taken in people undergoing brain surgery for epilepsy.

They found that the best-performing next-word prediction models had activity patterns that very closely resembled those seen in the human brain. Activity in those same models was also highly correlated with measures of human behavior, such as how fast people were able to read the text.
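The comparison itself amounts to a regression problem: can the brain's responses be predicted from the model's internal activations? The sketch below illustrates that style of analysis with random placeholder arrays standing in for model activations and fMRI or electrocorticographic responses; the ridge regression and cross-validated correlation score are assumptions about the general approach, not the paper's exact pipeline.

```python
# Hedged sketch: fit a linear mapping from language-model activations to brain
# responses, then score how well it predicts held-out data. All arrays here are
# random placeholders, not real model or brain data.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
n_sentences, n_model_units, n_voxels = 200, 768, 50

model_activations = rng.normal(size=(n_sentences, n_model_units))   # one vector per sentence
true_mapping = rng.normal(size=(n_model_units, n_voxels)) / np.sqrt(n_model_units)
brain_responses = model_activations @ true_mapping + rng.normal(scale=0.5, size=(n_sentences, n_voxels))

scores = []
for train_idx, test_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(model_activations):
    reg = Ridge(alpha=10.0).fit(model_activations[train_idx], brain_responses[train_idx])
    pred = reg.predict(model_activations[test_idx])
    # Correlate predicted and observed responses for each "voxel", then average.
    r = [np.corrcoef(pred[:, v], brain_responses[test_idx, v])[0, 1] for v in range(n_voxels)]
    scores.append(np.mean(r))

print(f"mean held-out prediction correlation: {np.mean(scores):.2f}")
```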

“We found that the models that predict the neural responses well also tend to best predict human behavior responses, in the form of reading times. And then both of these are explained by the model performance on next-word prediction. This triangle really connects everything together,” Schrimpf says.

“A key takeaway from this work is that language processing is a highly constrained problem: The best solutions to it that AI engineers have created end up being similar, as this paper shows, to the solutions found by the evolutionary process that created the human brain. Since the AI network didn’t seek to mimic the brain directly — but does end up looking brain-like — this suggests that, in a sense, a kind of convergent evolution has occurred between AI and nature,” says Daniel Yamins, an assistant professor of psychology and computer science at Stanford University, who was not involved in the study.

Game changer

One of the key computational features of predictive models such as GPT-3 is an element known as a forward one-way predictive transformer. This kind of transformer is able to make predictions of what is going to come next, based on previous sequences. A significant feature of this transformer is that it can make predictions based on a very long prior context (hundreds of words), not just the last few words.
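That "forward, one-way" property has a simple computational signature: each position in a sequence is allowed to attend only to earlier positions. The toy Python snippet below illustrates this causal masking with arbitrary numbers; it is an illustration of the idea, not the architecture of GPT-3.

```python
# Toy illustration of causal (one-way) attention: each position may only attend
# to itself and earlier positions, so predictions rest on the prior context alone.
import numpy as np

seq_len = 6
scores = np.random.default_rng(0).normal(size=(seq_len, seq_len))  # raw attention scores

# Mask out future positions: position i can only look at positions <= i.
mask = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
scores[mask] = -np.inf

# Softmax over each row turns the masked scores into attention weights.
attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
attn = attn / attn.sum(axis=-1, keepdims=True)
print(np.round(attn, 2))   # the upper triangle is zero: no peeking at future words
```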

Scientists have not found any brain circuits or learning mechanisms that correspond to this type of processing, Tenenbaum says. However, the new findings are consistent with hypotheses that have been previously proposed that prediction is one of the key functions in language processing, he says.

“One of the challenges of language processing is the real-time aspect of it,” he says. “Language comes in, and you have to keep up with it and be able to make sense of it in real time.”

The researchers now plan to build variants of these language processing models to see how small changes in their architecture affect their performance and their ability to fit human neural data.

“For me, this result has been a game changer,” Fedorenko says. “It’s totally transforming my research program, because I would not have predicted that in my lifetime we would get to these computationally explicit models that capture enough about the brain so that we can actually leverage them in understanding how the brain works.”

The researchers also plan to try to combine these high-performing language models with some computer models Tenenbaum’s lab has previously developed that can perform other kinds of tasks such as constructing perceptual representations of the physical world.

“If we’re able to understand what these language models do and how they can connect to models which do things that are more like perceiving and thinking, then that can give us more integrative models of how things work in the brain,” Tenenbaum says. “This could take us toward better artificial intelligence models, as well as giving us better models of how more of the brain works and how general intelligence emerges, than we’ve had in the past.”

The research was funded by a Takeda Fellowship; the MIT Shoemaker Fellowship; the Semiconductor Research Corporation; the MIT Media Lab Consortia; the MIT Singleton Fellowship; the MIT Presidential Graduate Fellowship; the Friends of the McGovern Institute Fellowship; the MIT Center for Brains, Minds, and Machines, through the National Science Foundation; the National Institutes of Health; MIT’s Department of Brain and Cognitive Sciences; and the McGovern Institute.

Other authors of the paper are Idan Blank PhD ’16 and graduate students Greta Tuckute, Carina Kauf, and Eghbal Hosseini.

Data transformed

With the tools of modern neuroscience, data accumulates quickly. Recording devices listen in on the electrical conversations between neurons, picking up the voices of hundreds of cells at a time. Microscopes zoom in to illuminate the brain’s circuitry, capturing thousands of images of cells’ elaborately branched paths. Functional MRIs detect changes in blood flow to map activity within a person’s brain, generating a complete picture by compiling hundreds of scans.

“When I entered neuroscience about 20 years ago, data were extremely precious, and ideas, as the expression went, were cheap. That’s no longer true,” says McGovern Associate Investigator Ila Fiete. “We have an embarrassment of wealth in the data but lack sufficient conceptual and mathematical scaffolds to understand it.”

Fiete will lead the McGovern Institute’s new K. Lisa Yang Integrative Computational Neuroscience (ICoN) Center, whose scientists will create mathematical models and other computational tools to confront the current deluge of data and advance our understanding of the brain and mental health. The center, funded by a $24 million donation from philanthropist Lisa Yang, will take a uniquely collaborative approach to computational neuroscience, integrating data from MIT labs to explain brain function at every level, from the molecular to the behavioral.

“Driven by technologies that generate massive amounts of data, we are entering a new era of translational neuroscience research,” says Yang, whose philanthropic investment in MIT research now exceeds $130 million. “I am confident that the multidisciplinary expertise convened by this center will revolutionize how we synthesize this data and ultimately understand the brain in health and disease.”

Data integration

Fiete says computation is particularly crucial to neuroscience because the brain is so staggeringly complex. Its billions of neurons, which are themselves complicated and diverse, interact with one another through trillions of connections.

“Conceptually, it’s clear that all these interactions are going to lead to pretty complex things. And these are not going to be things that we can explain in stories that we tell,” Fiete says. “We really will need mathematical models. They will allow us to ask about what changes when we perturb one or several components — greatly accelerating the rate of discovery relative to doing those experiments in real brains.”

By representing the interactions between the components of a neural circuit, a model gives researchers the power to explore those interactions, manipulate them, and predict the circuit’s behavior under different conditions.
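A toy example gives a sense of what such perturbation experiments look like in practice. The sketch below simulates a small firing-rate network, removes a single connection, and reports how the circuit's steady-state activity changes; the network size, weights, and dynamics are illustrative assumptions, not any specific model from the center.

```python
# Minimal sketch of an in-silico perturbation experiment: simulate a small
# firing-rate network, "lesion" one connection, and compare the circuit's activity.
import numpy as np

rng = np.random.default_rng(1)
N = 20
W = rng.normal(scale=1.0 / np.sqrt(N), size=(N, N))   # recurrent connection strengths
inp = rng.normal(size=N)                               # constant external drive

def simulate(weights, steps=200, dt=0.1):
    """Leaky firing-rate dynamics: dr/dt = -r + tanh(W r + input)."""
    r = np.zeros(N)
    for _ in range(steps):
        r = r + dt * (-r + np.tanh(weights @ r + inp))
    return r

baseline = simulate(W)

W_perturbed = W.copy()
W_perturbed[3, 7] = 0.0          # remove a single connection in the model
perturbed = simulate(W_perturbed)

print("largest change in steady-state rate:", np.max(np.abs(perturbed - baseline)))
```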

“You can observe these neurons in the same way that you would observe real neurons. But you can do even more, because you have access to all the neurons and you have access to all the connections and everything in the network,” explains computational neuroscientist and McGovern Associate Investigator Guangyu Robert Yang (no relation to Lisa Yang), who joined MIT as a junior faculty member in July 2021.

Many neuroscience models represent specific functions or parts of the brain. But with advances in computation and machine learning, along with the widespread availability of experimental data with which to test and refine models, “there’s no reason that we should be limited to that,” he says.

Robert Yang’s team at the McGovern Institute is working to develop models that integrate multiple brain areas and functions. “The brain is not just about vision, just about cognition, just about motor control,” he says. “It’s about all of these things. And all these areas, they talk to one another.” Likewise, he notes, it’s impossible to separate the molecules in the brain from their effects on behavior – although those aspects of neuroscience have traditionally been studied independently, by researchers with vastly different expertise.

The ICoN Center will eliminate the divides, bringing together neuroscientists and software engineers to deal with all types of data about the brain. To foster interdisciplinary collaboration, every postdoctoral fellow and engineer at the center will work with multiple faculty mentors. Working in three closely interacting scientific cores, fellows will develop computational technologies for analyzing molecular data, neural circuits, and behavior, such as tools to identify patterns in neural recordings or automate the analysis of human behavior to aid psychiatric diagnoses. These technologies will also help researchers model neural circuits, ultimately transforming data into knowledge and understanding.

“Lisa is focused on helping the scientific community realize its goals in translational research,” says Nergis Mavalvala, dean of the School of Science and the Curtis and Kathleen Marble Professor of Astrophysics. “With her generous support, we can accelerate the pace of research by connecting the data to the delivery of tangible results.”

Computational modeling

In its first five years, the ICoN Center will prioritize four areas of investigation: episodic memory and exploration, including functions like navigation and spatial memory; complex or stereotypical behavior, such as the perseverative behaviors associated with autism and obsessive-compulsive disorder; cognition and attention; and sleep. The goal, Fiete says, is to model the neuronal interactions that underlie these functions so that researchers can predict what will happen when something changes — when certain neurons become more active or when a genetic mutation is introduced, for example. When paired with experimental data from MIT labs, the center’s models will help explain not just how these circuits work, but also how they are altered by genes, the environment, aging, and disease.

These focus areas encompass circuits and behaviors often affected by psychiatric disorders and neurodegeneration, and models will give researchers new opportunities to explore their origins and potential treatment strategies. “I really think that the future of treating disorders of the mind is going to run through computational modeling,” says McGovern Associate Investigator Josh McDermott.

In McDermott’s lab, researchers are modeling the brain’s auditory circuits. “If we had a perfect model of the auditory system, we would be able to understand why when somebody loses their hearing, auditory abilities degrade in the very particular ways in which they degrade,” he says. Then, he says, that model could be used to optimize hearing aids by predicting how the brain would interpret sound altered in various ways by the device.

Similar opportunities will arise as researchers model other brain systems, McDermott says, noting that computational models help researchers grapple with a dauntingly vast realm of possibilities. “There’s lots of different ways the brain can be set up, and lots of different potential treatments, but there is a limit to the number of neuroscience or behavioral experiments you can run,” he says. “Doing experiments on a computational system is cheap, so you can explore the dynamics of the system in a very thorough way.”

The ICoN Center will speed the development of the computational tools that neuroscientists need, both for basic understanding of the brain and clinical advances. But Fiete hopes for a culture shift within neuroscience, as well. “There are a lot of brilliant students and postdocs who have skills that are mathematics and computational and modeling based,” she says. “I think once they know that there are these possibilities to collaborate to solve problems related to psychiatric disorders and how we think, they will see that this is an exciting place to apply their skills, and we can bring them in.”