What powerful new bots like ChatGPT tell us about intelligence and the human brain

This story originally appeared in the Spring 2023 issue of BrainScan.

___

Artificial intelligence seems to have gotten a lot smarter recently. AI technologies are increasingly integrated into our lives — improving our weather forecasts, finding efficient routes through traffic, personalizing the ads we see and our experiences with social media.

Watercolor image of a robot with a human brain, created using the AI system DALL·E 2.

But with the debut of powerful new chatbots like ChatGPT, millions of people have begun interacting with AI tools that seem convincingly human-like. Neuroscientists are taking note — and beginning to dig into what these tools tell us about intelligence and the human brain.

The essence of human intelligence is hard to pin down, let alone engineer. McGovern scientists say there are many kinds of intelligence, and as humans, we call on many different kinds of knowledge and ways of thinking. ChatGPT’s ability to carry on natural conversations with its users has led some to speculate the computer model is sentient, but McGovern neuroscientists insist that the AI technology cannot think for itself.

Still, they say, the field may have reached a turning point.

“I still don’t believe that we can make something that is indistinguishable from a human. I think we’re a long way from that. But for the first time in my life I think there is a small, nonzero chance that it may happen in the next decade,” says McGovern founding member Tomaso Poggio, who has studied both human intelligence and machine learning for more than 40 years.

Different sort of intelligence

Developed by the company OpenAI, ChatGPT is an example of a deep neural network, a type of machine learning system that has made its way into virtually every aspect of science and technology. These models learn to perform various tasks by identifying patterns in large datasets. ChatGPT works by scouring texts and detecting and replicating the ways language is used. Drawing on language patterns it finds across the internet, ChatGPT can design you a meal plan, teach you about rocket science, or write a high school-level essay about Mark Twain. With all of the internet as a training tool, models like this have gotten so good at what they do, they can seem all-knowing.

Nonetheless, language models have a restricted skill set. Play with ChatGPT long enough and it will surely give you some wrong information, even if its fluency makes its words deceptively convincing. “These models don’t know about the world, they don’t know about other people’s mental states, they don’t know how things are beyond whatever they can gather from how words go together,” says Postdoctoral Associate Anna Ivanova, who works with McGovern Investigators Evelina Fedorenko and Nancy Kanwisher as well as Jacob Andreas in MIT’s Computer Science and Artificial Intelligence Laboratory.

Such a model, the researchers say, cannot replicate the complex information processing that happens in the human brain. That doesn’t mean language models can’t be intelligent — but theirs is a different sort of intelligence than our own. “I think that there is an infinite number of different forms of intelligence,” says Poggio. “Engineers have been inventing some of these forms of intelligence since the beginning of the computers. ChatGPT is one. But it is very far from human intelligence.”

Under the hood

Just as there are many forms of intelligence, there are also many types of deep learning models — and McGovern researchers are studying the internals of these models to better understand the human brain.

A watercolor painting of a robot generated by DALL·E 2.

“These AI models are, in a way, computational hypotheses for what the brain is doing,” Kanwisher says. “Up until a few years ago, we didn’t really have complete computational models of what might be going on in language processing or vision. Once you have a way of generating actual precise models and testing them against real data, you’re kind of off and running in a way that we weren’t ten years ago.”

Artificial neural networks echo the design of the brain in that they are made of densely interconnected networks of simple units that organize themselves — but Poggio says it’s not yet entirely clear how they work.

No one expects that brains and machines will work in exactly the same ways, though some types of deep learning models are more humanlike in their internals than others. For example, a computer vision model developed by McGovern Investigator James DiCarlo responds to images in ways that closely parallel the activity in the visual cortex of animals who are seeing the same thing. DiCarlo’s team can even use their model’s predictions to create an image that will activate specific neurons in an animal’s brain.

Still, there is reason to be cautious in interpreting what artificial neural networks tell us about biology. “We shouldn’t just automatically assume that if we trained a deep network on a task, that it’s going to look like the brain,” says McGovern Associate Investigator Ila Fiete. Fiete acknowledges that it’s tempting to think of neural networks as models of the brain itself due to their architectural similarities — but she says so far, that idea remains largely untested.

McGovern Institute Associate Investigator Ila Fiete builds theoretical models of the brain. Photo: Caitlin Cunningham

She and her colleagues recently experimented with neural networks that estimate an object’s position in space by integrating information about its changing velocity.

In the brain, specialized neurons known as grid cells carry out this calculation, keeping us aware of where we are as we move through the world. Other researchers had reported that not only can neural networks do this successfully, those that do include components that behave remarkably like grid cells. They had argued that the need to do this kind of path integration must be the reason our brains have grid cells — but Fiete’s team found that artificial networks don’t need to mimic the brain to accomplish this brain-like task. They found that many neural networks can solve the same problem without grid cell-like elements.
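As a rough illustration of the path-integration task itself (not of Fiete's networks, and not a model of grid cells), position can be estimated by accumulating velocity samples over time. The function name integrate_path and the fixed time step dt are assumptions made for this sketch.

```python
def integrate_path(velocities, dt=1.0, start=(0.0, 0.0)):
    """Estimate 2D positions by accumulating velocity * dt (Euler integration)."""
    x, y = start
    path = [(x, y)]
    for vx, vy in velocities:
        x += vx * dt
        y += vy * dt
        path.append((x, y))
    return path


# Move right for two half-second steps, then up for one half-second step.
path = integrate_path([(1.0, 0.0), (1.0, 0.0), (0.0, 2.0)], dt=0.5)
final_position = path[-1]
```

Any system that solves this task, biological or artificial, must keep some running estimate like the x and y variables above. The finding reported here is that networks can do so without grid cell-like units.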

One way investigators might generate deep learning models that do work like the brain is to give them a problem that is so complex that there is only one way of solving it, Fiete says.

Language, she acknowledges, might be that complex.

“This is clearly an example of a super-rich task,” she says. “I think on that front, there is a hope that they’re solving such an incredibly difficult task that maybe there is a sense in which they mirror the brain.”

Language parallels

In Fedorenko’s lab, where researchers are focused on identifying and understanding the brain’s language processing circuitry, they have found that some language models do, in fact, mimic certain aspects of human language processing. Many of the most effective models are trained to do a single task: make predictions about word use. That’s what your phone is doing when it suggests words for your text message as you type. Models that are good at this, it turns out, can apply this skill to carrying on conversations, composing essays, and using language in other useful ways. Neuroscientists have found evidence that humans, too, rely on word prediction as a part of language processing.
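The word-prediction task can be sketched with a toy model that simply counts which word most often follows another. This is far simpler than the neural language models discussed in this story and is offered only as an illustration; the function names and the tiny corpus are invented for the example.

```python
from collections import Counter, defaultdict


def train_bigram_model(text):
    """Count, for every word, which words follow it in the training text."""
    words = text.lower().split()
    following = defaultdict(Counter)
    for current_word, next_word in zip(words, words[1:]):
        following[current_word][next_word] += 1
    return following


def predict_next(model, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    followers = model.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


corpus = "the cat sat on the mat and the cat slept"
model = train_bigram_model(corpus)
suggestion = predict_next(model, "the")  # "cat" follows "the" twice, "mat" once
```

A phone keyboard or a large language model replaces these raw counts with learned parameters and a much longer context, but the objective is the same: guess what comes next.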

Fedorenko and her team compared the activity of language models to the brain activity of people as they read or listened to words, sentences, and stories, and found that some models were a better match to human neural responses than others. “The models that do better on this relatively unsophisticated task — just guess what comes next — also do better at capturing human neural responses,” Fedorenko says.

A watercolor painting of a language model, generated by DALL·E 2.

It’s a compelling parallel, suggesting computational models and the human brain may have arrived at a similar solution to a problem, even in the face of the biological constraints that have shaped the latter. For Fedorenko and her team, it’s sparked new ideas that they will explore, in part, by modifying existing language models — possibly to more closely mimic the brain.

With so much still unknown about how both human and artificial neural networks learn, Fedorenko says it’s hard to predict what it will take to make language models work and behave more like the human brain. One possibility they are exploring is training a model in a way that more closely mirrors the way children learn language early in life.

Another question, she says, is whether language models might behave more like humans if they had a more limited recall of their own conversations. “All of the state-of-the-art language models keep track of really, really long linguistic contexts. Humans don’t do that,” she says.

Chatbots can retain long strings of dialogue, using those words to tailor their responses as a conversation progresses, she explains. Humans, on the other hand, must cope with a more limited memory. While we can keep track of information as it is conveyed, we only store a string of about eight words as we listen or read. “We get linguistic input, we crunch it up, we extract some kind of meaning representation, presumably in some more abstract format, and then we discard the exact linguistic stream because we don’t need it anymore,” Fedorenko explains.

Language models aren’t able to fill in gaps in conversation with their own knowledge and awareness in the same way a person can, Ivanova adds. “That’s why so far they have to keep track of every single input word,” she says. “If we want a model that models specifically the [human] language network, we don’t need to have this large context window. It would be very cool to train those models on those short windows of context and see if it’s more similar to the language network.”

Multimodal intelligence

Despite these parallels, Fedorenko’s lab has also shown that there are plenty of things language circuits do not do. The brain calls on other circuits to solve math problems, write computer code, and carry out myriad other cognitive processes. Their work makes it clear that in the brain, language and thought are not the same.

That’s borne out by what cognitive neuroscientists like Kanwisher have learned about the functional organization of the human brain, where circuit components are dedicated to surprisingly specific tasks, from language processing to face recognition.

“The upshot of cognitive neuroscience over the last 25 years is that the human brain really has quite a degree of modular organization,” Kanwisher says. “You can look at the brain and say, ‘what does it tell us about the nature of intelligence?’ Well, intelligence is made up of a whole bunch of things.”

In generating this image from the text prompt, “a watercolor painting of a woman looking in a mirror and seeing a robot,” DALL·E 2 incorrectly placed the woman (not the robot) in the mirror, highlighting one of the weaknesses of current deep learning models.

In January, Fedorenko, Kanwisher, Ivanova, and colleagues shared an extensive analysis of the capabilities of large language models. After assessing models’ performance on various language-related tasks, they found that despite their mastery of linguistic rules and patterns, such models don’t do a good job using language in real-world situations. From a neuroscience perspective, that kind of functional competence is distinct from formal language competence, calling on not just language-processing circuits but also parts of the brain that store knowledge of the world, reason, and interpret social interactions.

Language is a powerful tool for understanding the world, they say, but it has limits.

“If you train on language prediction alone, you can learn to mimic certain aspects of thinking,” Ivanova says. “But it’s not enough. You need a multimodal system to carry out truly intelligent behavior.”

The team concluded that while AI language models do a very good job using language, they are incomplete models of human thought. For machines to truly think like humans, Ivanova says, they will need a combination of different neural nets all working together, in the same way different networks in the human brain work together to achieve complex cognitive tasks in the real world.

It remains to be seen whether such models would excel in the tech world, but they could prove valuable for revealing insights into human cognition — perhaps in ways that will inform engineers as they strive to build systems that better replicate human intelligence.

New insights into training dynamics of deep classifiers

A new study from researchers at MIT and Brown University characterizes several properties that emerge during the training of deep classifiers, a type of artificial neural network commonly used for classification tasks such as image classification, speech recognition, and natural language processing.

The paper, “Dynamics in Deep Classifiers trained with the Square Loss: Normalization, Low Rank, Neural Collapse and Generalization Bounds,” published today in the journal Research, is the first of its kind to theoretically explore the dynamics of training deep classifiers with the square loss and how properties such as rank minimization, neural collapse, and dualities between the activation of neurons and the weights of the layers are intertwined.

In the study, the authors focused on two types of deep classifiers: fully connected deep networks and convolutional neural networks (CNNs).

A previous study examined the structural properties that develop in large neural networks at the final stages of training. That study focused on the last layer of the network and found that deep networks trained to fit a training dataset will eventually reach a state known as “neural collapse.” When neural collapse occurs, the network maps multiple examples of a particular class (such as images of cats) to a single template of that class. Ideally, the templates for each class should be as far apart from each other as possible, allowing the network to accurately classify new examples.
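One way to picture neural collapse is to measure how far the last-layer features of each example sit from the average of their class. The sketch below does that in plain Python. The feature values are invented for illustration and do not come from either study, and the function names are assumptions.

```python
def class_means(features, labels):
    """Average the feature vectors belonging to each class."""
    sums, counts = {}, {}
    for vector, label in zip(features, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vector)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], vector)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total]
            for label, total in sums.items()}


def within_class_variability(features, labels):
    """Mean squared distance of each feature vector from its class mean.

    A value near zero means every example sits on its class template.
    """
    means = class_means(features, labels)
    total = 0.0
    for vector, label in zip(features, labels):
        total += sum((v - m) ** 2 for v, m in zip(vector, means[label]))
    return total / len(features)


# Hypothetical last-layer features early and late in training.
labels = ["cat", "cat", "dog", "dog"]
early = [[1.0, 0.5], [0.2, 0.1], [-0.3, 0.9], [-1.0, 0.2]]
late = [[1.0, 0.0], [1.0, 0.0], [-1.0, 0.0], [-1.0, 0.0]]

early_spread = within_class_variability(early, labels)
late_spread = within_class_variability(late, labels)
```

In the collapsed "late" features, both cat examples land on one template and both dog examples on another, so the within-class spread is zero while the two templates remain far apart.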

An MIT group based at the MIT Center for Brains, Minds and Machines studied the conditions under which networks can achieve neural collapse. Deep networks that have the three ingredients of stochastic gradient descent (SGD), weight decay regularization (WD), and weight normalization (WN) will display neural collapse if they are trained to fit their training data. The MIT group has taken a theoretical approach — as compared to the empirical approach of the earlier study — proving that neural collapse emerges from the minimization of the square loss using SGD, WD, and WN.

Co-author and MIT McGovern Institute postdoc Akshay Rangamani states, “Our analysis shows that neural collapse emerges from the minimization of the square loss with highly expressive deep neural networks. It also highlights the key roles played by weight decay regularization and stochastic gradient descent in driving solutions towards neural collapse.”

Weight decay is a regularization technique that prevents the network from over-fitting the training data by reducing the magnitude of the weights. Weight normalization scales the weight matrices of a network so that they have a similar scale. Low rank refers to a property of a matrix where it has a small number of non-zero singular values. Generalization bounds offer guarantees about the ability of a network to accurately predict new examples that it has not seen during training.
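These definitions can be made concrete with a few small functions. This is a minimal sketch under stated assumptions: the update rule is the textbook form of weight decay, the normalization shown (rescaling to unit Frobenius norm) is one common choice and not necessarily the scheme analyzed in the paper, and all function names are hypothetical.

```python
import math


def weight_decay_step(weights, grads, lr=0.1, decay=0.01):
    """One gradient step with an L2 penalty: w <- w - lr * (grad + decay * w)."""
    return [[w - lr * (g + decay * w) for w, g in zip(w_row, g_row)]
            for w_row, g_row in zip(weights, grads)]


def normalize_weights(weights):
    """Rescale a non-zero weight matrix to unit Frobenius norm."""
    norm = math.sqrt(sum(w * w for row in weights for w in row))
    return [[w / norm for w in row] for row in weights]


def matrix_rank(matrix, tol=1e-9):
    """Rank via Gaussian elimination.

    The rank equals the number of non-zero singular values.
    """
    rows = [list(row) for row in matrix]
    rank = 0
    for col in range(len(rows[0])):
        # Pick the remaining row with the largest entry in this column.
        pivot = max(range(rank, len(rows)),
                    key=lambda r: abs(rows[r][col]), default=None)
        if pivot is None or abs(rows[pivot][col]) < tol:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for r in range(rank + 1, len(rows)):
            factor = rows[r][col] / rows[rank][col]
            rows[r] = [a - factor * b for a, b in zip(rows[r], rows[rank])]
        rank += 1
    return rank


weights = [[3.0, 4.0], [0.0, 0.0]]
zero_grads = [[0.0, 0.0], [0.0, 0.0]]
decayed = weight_decay_step(weights, zero_grads, lr=0.1, decay=0.5)
normalized = normalize_weights(weights)
low_rank = matrix_rank([[1.0, 2.0], [2.0, 4.0]])   # second row is twice the first
full_rank = matrix_rank([[1.0, 0.0], [0.0, 1.0]])
```

With zero gradients, the decay step alone shrinks every weight by the same factor, which is the sense in which weight decay reduces the magnitude of the weights.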

The authors found that the same theoretical observation that predicts a low-rank bias also predicts the existence of an intrinsic SGD noise in the weight matrices and in the output of the network. This noise is not generated by the randomness of the SGD algorithm but by an interesting dynamic trade-off between rank minimization and fitting of the data, which provides an intrinsic source of noise similar to what happens in dynamic systems in the chaotic regime. Such a random-like search may be beneficial for generalization because it may prevent over-fitting.

“Interestingly, this result validates the classical theory of generalization showing that traditional bounds are meaningful. It also provides a theoretical explanation for the superior performance in many tasks of sparse networks, such as CNNs, with respect to dense networks,” comments co-author and MIT McGovern Institute postdoc Tomer Galanti. In fact, the authors prove new norm-based generalization bounds for CNNs with localized kernels, that is, networks with sparse connectivity in their weight matrices.

In this case, generalization can be orders of magnitude better than for densely connected networks. This result validates the classical theory of generalization, showing that its bounds are meaningful, and goes against a number of recent papers expressing doubts about past approaches to generalization. It also provides a theoretical explanation for the superior performance of sparse networks, such as CNNs, with respect to dense networks. Thus far, the fact that CNNs and not dense networks represent the success story of deep networks has been almost completely ignored by machine learning theory. Instead, the theory presented here suggests that this is an important insight into why deep networks work as well as they do.

“This study provides one of the first theoretical analyses covering optimization, generalization, and approximation in deep networks and offers new insights into the properties that emerge during training,” says co-author Tomaso Poggio, the Eugene McDermott Professor at the Department of Brain and Cognitive Sciences at MIT and co-director of the Center for Brains, Minds and Machines. “Our results have the potential to advance our understanding of why deep learning works as well as it does.”

Looking into the black box of deep learning networks

Deep learning systems are revolutionizing technology around us, from voice recognition that pairs you with your phone to autonomous vehicles that are increasingly able to see and recognize obstacles ahead. But much of this success involves trial and error when it comes to the deep learning networks themselves. A group of MIT researchers recently reviewed their contributions to a better theoretical understanding of deep learning networks, providing direction for the field moving forward.

“Deep learning was in some ways an accidental discovery,” explains Tomaso Poggio, investigator at the McGovern Institute for Brain Research, director of the Center for Brains, Minds, and Machines (CBMM), and the Eugene McDermott Professor in Brain and Cognitive Sciences. “We still do not understand why it works. A theoretical framework is taking form, and I believe that we are now close to a satisfactory theory. It is time to stand back and review recent insights.”

Climbing data mountains

Our current era is marked by a superabundance of data — data from inexpensive sensors of all types, text, the internet, and large amounts of genomic data being generated in the life sciences. Computers nowadays ingest these multidimensional datasets, creating a set of problems dubbed the “curse of dimensionality” by the late mathematician Richard Bellman.

One of these problems is that representing a smooth, high-dimensional function requires an astronomically large number of parameters. We know that deep neural networks are particularly good at learning how to represent, or approximate, such complex data, but why? Understanding why could potentially help advance deep learning applications.
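The arithmetic behind the curse of dimensionality is easy to reproduce. As a rough sketch, suppose a function is described by sampling it on a regular grid with ten points along each input axis. The resolution of ten and the function name grid_points are arbitrary choices for this example.

```python
def grid_points(points_per_axis, dimensions):
    """Number of samples in a regular grid with the given resolution per axis."""
    return points_per_axis ** dimensions


# Ten samples per axis, for inputs of increasing dimension.
counts = {d: grid_points(10, d) for d in (1, 2, 3, 10, 100)}
```

Three dimensions already need a thousand samples, and a hundred dimensions would need a one followed by a hundred zeros, far more than any dataset could supply.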

“Deep learning is like electricity after Volta discovered the battery, but before Maxwell,” explains Poggio.

“Useful applications were certainly possible after Volta, but it was Maxwell’s theory of electromagnetism, this deeper understanding that then opened the way to the radio, the TV, the radar, the transistor, the computers, and the internet,” says Poggio, who is the founding scientific advisor of The Core, MIT Quest for Intelligence, and an investigator in the Computer Science and Artificial Intelligence Laboratory (CSAIL) at MIT.

The theoretical treatment by Poggio, Andrzej Banburski, and Qianli Liao points to why deep learning might overcome data problems such as “the curse of dimensionality.” Their approach starts with the observation that many natural structures are hierarchical. To model the growth and development of a tree doesn’t require that we specify the location of every twig. Instead, a model can use local rules to drive branching hierarchically. The primate visual system appears to do something similar when processing complex data. When we look at natural images — including trees, cats, and faces — the brain successively integrates local image patches, then small collections of patches, and then collections of collections of patches.

“The physical world is compositional — in other words, composed of many local physical interactions,” explains Qianli Liao, an author of the study, and a graduate student in the Department of Electrical Engineering and Computer Science and a member of the CBMM. “This goes beyond images. Language and our thoughts are compositional, and even our nervous system is compositional in terms of how neurons connect with each other. Our review explains theoretically why deep networks are so good at representing this complexity.”

The intuition is that a hierarchical neural network should be better at approximating a compositional function than a single “layer” of neurons, even if the total number of neurons is the same. The technical part of their work identifies what “better at approximating” means and proves that the intuition is correct.
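A back-of-the-envelope count gives a feel for this intuition. The sketch below is not the proof in the review. It assumes, purely for illustration, that functions are stored as lookup tables with a fixed number of points per input, and that a compositional function is a binary tree of two-input pieces. The function names are hypothetical.

```python
def flat_table_size(points_per_axis, n_inputs):
    """Entries needed to tabulate one function of all inputs at once."""
    return points_per_axis ** n_inputs


def binary_tree_table_size(points_per_axis, n_inputs):
    """Entries needed when the function is a binary tree of two-input pieces.

    A binary tree over n_inputs leaves has n_inputs - 1 internal nodes,
    and each node is a two-input table with points_per_axis ** 2 entries.
    """
    return (n_inputs - 1) * points_per_axis ** 2


def compose_pairwise(values, combine):
    """Evaluate a binary-tree composition by repeatedly merging neighbours.

    Assumes the number of values is a power of two.
    """
    while len(values) > 1:
        values = [combine(values[i], values[i + 1])
                  for i in range(0, len(values), 2)]
    return values[0]


flat = flat_table_size(10, 8)           # one table over all eight inputs
tree = binary_tree_table_size(10, 8)    # seven small two-input tables
largest = compose_pairwise([3, 1, 4, 1, 5, 9, 2, 6], max)
```

Under these assumptions the hierarchical description needs a few hundred entries where the flat one needs a hundred million, which mirrors the advantage the authors establish rigorously for deep networks over shallow ones.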

Generalization puzzle

There is a second puzzle about what is sometimes called the unreasonable effectiveness of deep networks. Deep network models often have far more parameters than data to fit them, despite the mountains of data we produce these days. This situation ought to lead to what is called “overfitting,” where the model fits your current data well but fits any new data terribly. In conventional models, this is known as poor generalization, and the conventional solution is to constrain some aspect of the fitting procedure. However, deep networks do not seem to require this constraint. Poggio and his colleagues prove that, in many cases, the process of training a deep network implicitly “regularizes” the solution, providing constraints.
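A much simpler linear analogue shows how a training procedure can constrain a solution without being told to. The sketch below is not the deep-network result described here. It fits a single equation with two unknowns, so infinitely many exact fits exist, and the function name and numbers are invented for the example.

```python
def gradient_descent_min_norm(a, y, lr=0.05, steps=2000):
    """Fit one linear equation a . w = y by gradient descent from w = 0.

    With more unknowns than equations there are infinitely many exact fits.
    Starting from zero, plain gradient descent on the squared error only
    ever moves along the direction of `a`, so it converges to the exact fit
    with the smallest Euclidean norm, with no explicit penalty term.
    """
    w = [0.0] * len(a)
    for _ in range(steps):
        residual = sum(ai * wi for ai, wi in zip(a, w)) - y
        w = [wi - lr * residual * ai for wi, ai in zip(w, a)]
    return w


a = [1.0, 2.0]   # one data point with two features
y = 5.0          # its target value
w = gradient_descent_min_norm(a, y)
```

Solutions such as (5, 0) or (0, 2.5) fit the data equally well, but gradient descent from zero settles near (1, 2), the smallest of them. The optimizer has quietly imposed a preference that a conventional approach would have to add by hand.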

The work has a number of implications going forward. Though deep learning is actively being applied in the world, this has so far occurred without a comprehensive underlying theory. A theory of deep learning that explains why and how deep networks work, and what their limitations are, will likely allow development of even more powerful learning approaches.

“In the long term, the ability to develop and build better intelligent machines will be essential to any technology-based economy,” explains Poggio. “After all, even in its current — still highly imperfect — state, deep learning is impacting, or about to impact, just about every aspect of our society and life.”

Peering under the hood of fake-news detectors

New work from researchers at the McGovern Institute for Brain Research at MIT peers under the hood of an automated fake-news detection system, revealing how machine-learning models catch subtle but consistent differences in the language of factual and false stories. The research also underscores how fake-news detectors should undergo more rigorous testing to be effective for real-world applications.

Popularized as a concept in the United States during the 2016 presidential election, fake news is a form of propaganda created to mislead readers, in order to generate views on websites or steer public opinion.

Almost as quickly as the issue became mainstream, researchers began developing automated fake news detectors: neural networks that “learn” from large amounts of data to recognize linguistic cues indicative of false articles. Given new articles to assess, these networks can, in controlled settings, separate fact from fiction with fairly high accuracy.

One issue, however, is the “black box” problem — meaning there’s no telling what linguistic patterns the networks analyze during training. They’re also trained and tested on the same topics, which may limit their potential to generalize to new topics, a necessity for analyzing news across the internet.

In a paper presented at the Conference and Workshop on Neural Information Processing Systems, the researchers tackle both of those issues. They developed a deep-learning model that learns to detect language patterns of fake and real news. Part of their work “cracks open” the black box to find the words and phrases the model captures to make its predictions.

Additionally, they tested their model on a novel topic it didn’t see in training. This approach classifies individual articles based solely on language patterns, which more closely represents a real-world application for news readers. Traditional fake news detectors classify articles based on text combined with source information, such as a Wikipedia page or website.

“In our case, we wanted to understand what was the decision-process of the classifier based only on language, as this can provide insights on what is the language of fake news,” says co-author Xavier Boix, a postdoc in the lab of Eugene McDermott Professor Tomaso Poggio at the Center for Brains, Minds, and Machines (CBMM), a National Science Foundation-funded center housed within the McGovern Institute.

“A key issue with machine learning and artificial intelligence is that you get an answer and don’t know why you got that answer,” says graduate student and first author Nicole O’Brien ’17. “Showing these inner workings takes a first step toward understanding the reliability of deep-learning fake-news detectors.”

The model identifies sets of words that tend to appear more frequently in either real or fake news, some perhaps obvious, others much less so. The findings, the researchers say, point to subtle yet consistent differences between fake news, which favors exaggerations and superlatives, and real news, which leans more toward conservative word choices.

“Fake news is a threat for democracy,” Boix says. “In our lab, our objective isn’t just to push science forward, but also to use technologies to help society. … It would be powerful to have tools for users or companies that could provide an assessment of whether news is fake or not.”

The paper’s other co-authors are Sophia Latessa, an undergraduate student in CBMM; and Georgios Evangelopoulos, a researcher in CBMM, the McGovern Institute for Brain Research, and the Laboratory for Computational and Statistical Learning.

Limiting bias

The researchers’ model is a convolutional neural network that trains on a dataset of fake news and real news. For training and testing, the researchers used a popular fake news research dataset, hosted on Kaggle, that contains around 12,000 fake news sample articles from 244 different websites. They also compiled a dataset of real news samples, using more than 2,000 from the New York Times and more than 9,000 from The Guardian.

In training, the model captures the language of an article as “word embeddings,” where words are represented as vectors — basically, arrays of numbers — with words of similar semantic meanings clustered closer together. In doing so, it captures triplets of words as patterns that provide some context — such as, say, a negative comment about a political party. Given a new article, the model scans the text for similar patterns and sends them over a series of layers. A final output layer determines the probability of each pattern: real or fake.
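The pipeline in this paragraph can be sketched in a few lines: look up a vector for each word, slide a three-word filter across the text, and turn the strongest response into a probability. Everything concrete below is an assumption made for illustration: the two-dimensional embeddings, the hand-picked filter weights, the single filter, and the use of max pooling before the output unit. The researchers' actual model learns its parameters from data and has many filters and layers.

```python
import math

# Hypothetical 2-dimensional word embeddings; real systems learn these vectors.
EMBEDDINGS = {
    "the": [0.0, 0.1],
    "most": [0.9, 0.2],
    "shocking": [1.0, 0.8],
    "scandal": [0.7, 0.9],
    "ever": [0.8, 0.3],
    "council": [0.1, 0.0],
    "approved": [0.0, 0.2],
    "budget": [0.1, 0.1],
}
UNKNOWN = [0.0, 0.0]


def embed(words):
    """Map each word to its vector, using a zero vector for unknown words."""
    return [EMBEDDINGS.get(word, UNKNOWN) for word in words]


def trigram_activations(vectors, kernel, bias=0.0):
    """Slide a width-3 filter over the text; one ReLU activation per triplet."""
    activations = []
    for i in range(len(vectors) - 2):
        window = vectors[i] + vectors[i + 1] + vectors[i + 2]  # concatenate
        score = sum(k * x for k, x in zip(kernel, window)) + bias
        activations.append(max(0.0, score))
    return activations


def predict_fake_probability(words, kernel, out_weight, out_bias):
    """Max-pool the trigram activations, then apply a sigmoid output unit."""
    pooled = max(trigram_activations(embed(words), kernel), default=0.0)
    return 1.0 / (1.0 + math.exp(-(out_weight * pooled + out_bias)))


# Hand-picked (not learned) parameters for the illustration.
kernel = [1.0] * 6
sensational = "the most shocking scandal ever".split()
plain = "the council approved the budget".split()
p_sensational = predict_fake_probability(sensational, kernel, 2.0, -4.0)
p_plain = predict_fake_probability(plain, kernel, 2.0, -4.0)
```

With these invented numbers, the headline full of superlatives scores well above one half and the plain one well below. In a trained network the same structure is used, but the embeddings and filter weights come from the training articles.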

The researchers first trained and tested the model in the traditional way, using the same topics. But they thought this might create an inherent bias in the model, since certain topics are more often the subject of fake or real news. For example, fake news stories are generally more likely to include the words “Trump” and “Clinton.”

“But that’s not what we wanted,” O’Brien says. “That just shows topics that are strongly weighting in fake and real news. … We wanted to find the actual patterns in language that are indicative of those.”

Next, the researchers trained the model on all topics without any mention of the word “Trump,” and tested the model only on samples that had been set aside from the training data and that did contain the word “Trump.” While the traditional approach reached 93-percent accuracy, the second approach reached 87-percent accuracy. This accuracy gap, the researchers say, highlights the importance of using topics held out from the training process, to ensure the model can generalize what it has learned to new topics.
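The held-out-topic evaluation can be sketched as a simple split on a keyword. The helper names, the article format, and the sample articles are hypothetical, and plain substring matching is a simplification (it would also match a word such as "trumpet").

```python
def split_by_held_out_topic(articles, keyword):
    """Train on articles that never mention `keyword`; test on those that do."""
    keyword = keyword.lower()
    train, test = [], []
    for article in articles:
        if keyword in article["text"].lower():
            test.append(article)
        else:
            train.append(article)
    return train, test


def accuracy(predictions, labels):
    """Fraction of predictions that match the true labels."""
    correct = sum(p == t for p, t in zip(predictions, labels))
    return correct / len(labels)


# Invented examples, only to show the shape of the data.
articles = [
    {"text": "Trump rally draws record crowd", "label": "real"},
    {"text": "City council approves new budget", "label": "real"},
    {"text": "You won't believe what Trump did next", "label": "fake"},
    {"text": "Miracle cure doctors don't want you to know", "label": "fake"},
]
train_set, test_set = split_by_held_out_topic(articles, "Trump")
```

Because no training article mentions the held-out word, a model scored on the test set cannot succeed by memorizing the topic and has to rely on more general language patterns.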

More research needed

To open the black box, the researchers then retraced their steps. Each time the model makes a prediction about a word triplet, a certain part of the model activates, depending on if the triplet is more likely from a real or fake news story. The researchers designed a method to retrace each prediction back to its designated part and then find the exact words that made it activate.
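The tracing idea can be sketched as a search for the word triplet that drives one part of the model hardest. The scoring function below is a stand-in for a single learned filter, and the word list and function names are invented; the researchers' actual method works on the trained network's activations.

```python
def most_activating_triplet(words, score_triplet):
    """Return the word triplet with the highest score, plus that score."""
    best_triplet, best_score = None, float("-inf")
    for i in range(len(words) - 2):
        triplet = tuple(words[i:i + 3])
        score = score_triplet(triplet)
        if score > best_score:
            best_triplet, best_score = triplet, score
    return best_triplet, best_score


# Stand-in for one learned filter: counts hypothetical "sensational" words.
SENSATIONAL = {"shocking", "unbelievable", "ever"}


def toy_filter(triplet):
    return sum(word in SENSATIONAL for word in triplet)


words = "officials deny the most shocking scandal ever reported".split()
triplet, score = most_activating_triplet(words, toy_filter)
```

Here the search returns the triplet "shocking scandal ever", the kind of phrase a reader could then be shown as the reason for a prediction.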

More research is needed to determine how useful this information is to readers, Boix says. In the future, the model could potentially be combined with, say, automated fact-checkers and other tools to give readers an edge in combating misinformation. After some refining, the model could also be the basis of a browser extension or app that alerts readers to potential fake news language.

“If I just give you an article, and highlight those patterns in the article as you’re reading, you could assess if the article is more or less fake,” he says. “It would be kind of like a warning to say, ‘Hey, maybe there is something strange here.’”

Tomaso Poggio

Engineering Intelligence

Tomaso Poggio is one of the founders of computational neuroscience. He pioneered a model of the fly’s visual system as well as of human stereovision. His research has always been interdisciplinary, bridging brains and computers. It is now focused on the mathematics of deep learning and on the computational neuroscience of the visual cortex. Poggio also introduced regularization theory to computational vision, made key contributions to the biophysics of computation and to learning theory, and developed an influential model of recognition in the visual cortex. Research in the Poggio lab is guided by the belief that understanding learning is at the heart of understanding both biological and artificial intelligence. Learning is therefore the route to understanding how the human brain works and to making intelligent machines.

Machines that learn language more like kids do

Children learn language by observing their environment, listening to the people around them, and connecting the dots between what they see and hear. Among other things, this helps children establish their language’s word order, such as where subjects and verbs fall in a sentence.

In computing, learning language is the task of syntactic and semantic parsers. These systems are trained on sentences annotated by humans that describe the structure and meaning behind words. Parsers are becoming increasingly important for web searches, natural-language database querying, and voice-recognition systems such as Alexa and Siri. Soon, they may also be used for home robotics.

But gathering the annotation data can be time-consuming and difficult for less common languages. Additionally, humans don’t always agree on the annotations, and the annotations themselves may not accurately reflect how people naturally speak.

In a paper being presented at this week’s Empirical Methods in Natural Language Processing conference, MIT researchers describe a parser that learns through observation to more closely mimic a child’s language-acquisition process, which could greatly extend the parser’s capabilities. To learn the structure of language, the parser observes captioned videos, with no other information, and associates the words with recorded objects and actions. Given a new sentence, the parser can then use what it’s learned about the structure of the language to accurately predict a sentence’s meaning, without the video.

This “weakly supervised” approach — meaning it requires limited training data — mimics how children can observe the world around them and learn language, without anyone providing direct context. The approach could expand the types of data and reduce the effort needed for training parsers, according to the researchers. A few directly annotated sentences, for instance, could be combined with many captioned videos, which are easier to come by, to improve performance.

In the future, the parser could be used to improve natural interaction between humans and personal robots. A robot equipped with the parser, for instance, could constantly observe its environment to reinforce its understanding of spoken commands, including when the spoken sentences aren’t fully grammatical or clear. “People talk to each other in partial sentences, run-on thoughts, and jumbled language. You want a robot in your home that will adapt to their particular way of speaking … and still figure out what they mean,” says co-author Andrei Barbu, a researcher in the Computer Science and Artificial Intelligence Laboratory (CSAIL) and the Center for Brains, Minds, and Machines (CBMM) within MIT’s McGovern Institute.

The parser could also help researchers better understand how young children learn language. “A child has access to redundant, complementary information from different modalities, including hearing parents and siblings talk about the world, as well as tactile information and visual information, [which help him or her] to understand the world,” says co-author Boris Katz, a principal research scientist and head of the InfoLab Group at CSAIL. “It’s an amazing puzzle, to process all this simultaneous sensory input. This work is part of bigger piece to understand how this kind of learning happens in the world.”

Co-authors on the paper are: first author Candace Ross, a graduate student in the Department of Electrical Engineering and Computer Science and CSAIL, and a researcher in CBMM; Yevgeni Berzak PhD ’17, a postdoc in the Computational Psycholinguistics Group in the Department of Brain and Cognitive Sciences; and CSAIL graduate student Battushig Myanganbayar.

Visual learner

For their work, the researchers combined a semantic parser with a computer-vision component trained in object, human, and activity recognition in video. Semantic parsers are generally trained on sentences annotated with code that ascribes meaning to each word and the relationships between the words. Some have been trained on still images or computer simulations.

The new parser is the first to be trained using video, Ross says. In part, videos are more useful in reducing ambiguity. If the parser is unsure about, say, an action or object in a sentence, it can reference the video to clear things up. “There are temporal components — objects interacting with each other and with people — and high-level properties you wouldn’t see in a still image or just in language,” Ross says.

The researchers compiled a dataset of about 400 videos depicting people carrying out a number of actions, including picking up an object or putting it down, and walking toward an object. Participants on the crowdsourcing platform Mechanical Turk then provided 1,200 captions for those videos. They set aside 840 video-caption examples for training and tuning, and used 360 for testing. One advantage of using vision-based parsing is “you don’t need nearly as much data — although if you had [the data], you could scale up to huge datasets,” Barbu says.

In training, the researchers gave the parser the objective of determining whether a sentence accurately describes a given video. They fed the parser a video and matching caption. The parser extracts possible meanings of the caption as logical mathematical expressions. The sentence, “The woman is picking up an apple,” for instance, may be expressed as: λxy. woman x, pick_up x y, apple y.
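As a rough illustration of what such an expression says, the sketch below writes the example above as a list of predicates over the variables x and y, then checks which assignments of scene entities make every predicate true. The entity names and the hand-written set of facts are hypothetical stand-ins for what a vision system might detect; this is not the researchers’ implementation.

```python
from itertools import permutations

# Logical form for "The woman is picking up an apple":
#   λxy. woman x, pick_up x y, apple y
# Each conjunct is (predicate_name, tuple_of_variables).
VARIABLES = ("x", "y")
CONJUNCTS = [
    ("woman", ("x",)),
    ("pick_up", ("x", "y")),
    ("apple", ("y",)),
]


def satisfying_bindings(variables, conjuncts, entities, facts):
    """Return every assignment of distinct entities to variables
    under which all conjuncts appear in the set of facts."""
    results = []
    for combo in permutations(entities, len(variables)):
        binding = dict(zip(variables, combo))
        if all(
            (pred, tuple(binding[v] for v in args)) in facts
            for pred, args in conjuncts
        ):
            results.append(binding)
    return results


# Hypothetical scene description, written by hand for illustration only.
ENTITIES = ["person1", "object1", "object2"]
FACTS = {
    ("woman", ("person1",)),
    ("apple", ("object1",)),
    ("cup", ("object2",)),
    ("pick_up", ("person1", "object1")),
}

bindings = satisfying_bindings(VARIABLES, CONJUNCTS, ENTITIES, FACTS)
print(bindings)
```

In this toy scene the expression holds only when x is the woman and y is the apple, which is the sense in which a candidate meaning can be “true of” a video.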

Those expressions and the video are fed into a computer-vision algorithm called “Sentence Tracker,” developed by Barbu and other researchers. The algorithm looks at each video frame to track how objects and people transform over time, to determine if actions are playing out as described. In this way, it determines if the meaning is possibly true of the video.

Connecting the dots

The expression with the most closely matching representations for objects, humans, and actions becomes the most likely meaning of the caption. The expression, initially, may refer to many different objects and actions in the video, but the set of possible meanings serves as a training signal that helps the parser continuously winnow down possibilities. “By assuming that all of the sentences must follow the same rules, that they all come from the same language, and seeing many captioned videos, you can narrow down the meanings further,” Barbu says.
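The narrowing-down idea can be shown with a much simpler technique than the one in the paper: for each word, keep only the concepts that appear in every video whose caption contains that word. The captions, the concept labels, and the function name `winnow` below are all invented for illustration, and the real parser works with probabilities over structured expressions rather than plain set intersection.

```python
def winnow(examples):
    """examples: list of (caption_words, concepts_seen_in_video).

    For each word, intersect the concept sets of all videos whose
    caption uses that word. Whatever survives is still a candidate meaning.
    """
    candidates = {}
    for words, concepts in examples:
        for word in words:
            if word in candidates:
                candidates[word] &= concepts
            else:
                candidates[word] = set(concepts)  # copy, so inputs stay untouched
    return candidates


# Hypothetical captioned videos, reduced to bags of words and detected concepts.
EXAMPLES = [
    (["woman", "picks", "apple"], {"WOMAN", "PICK_UP", "APPLE"}),
    (["woman", "picks", "cup"], {"WOMAN", "PICK_UP", "CUP"}),
    (["man", "picks", "apple"], {"MAN", "PICK_UP", "APPLE"}),
    (["woman", "drops", "apple"], {"WOMAN", "PUT_DOWN", "APPLE"}),
]

candidates = winnow(EXAMPLES)
for word, meanings in sorted(candidates.items()):
    print(word, sorted(meanings))
```

Words seen in several different videos (“woman,” “picks,” “apple”) end up with a single candidate, while words seen only once (“cup,” “man,” “drops”) stay ambiguous until more captioned videos arrive.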

In short, the parser learns through passive observation: To determine if a caption is true of a video, the parser by necessity must identify the highest probability meaning of the caption. “The only way to figure out if the sentence is true of a video [is] to go through this intermediate step of, ‘What does the sentence mean?’ Otherwise, you have no idea how to connect the two,” Barbu explains. “We don’t give the system the meaning for the sentence. We say, ‘There’s a sentence and a video. The sentence has to be true of the video. Figure out some intermediate representation that makes it true of the video.’”

The training produces a syntactic and semantic grammar for the words it’s learned. Given a new sentence, the parser no longer requires videos, but leverages its grammar and lexicon to determine sentence structure and meaning.

Ultimately, this process is learning “as if you’re a kid,” Barbu says. “You see world around you and hear people speaking to learn meaning. One day, I can give you a sentence and ask what it means and, even without a visual, you know the meaning.”

“This research is exactly the right direction for natural language processing,” says Stefanie Tellex, a professor of computer science at Brown University who focuses on helping robots use natural language to communicate with humans. “To interpret grounded language, we need semantic representations, but it is not practicable to make it available at training time. Instead, this work captures representations of compositional structure using context from captioned videos. This is the paper I have been waiting for!”

In future work, the researchers are interested in modeling interactions, not just passive observations. “Children interact with the environment as they’re learning. Our idea is to have a model that would also use perception to learn,” Ross says.

This work was supported, in part, by the CBMM, the National Science Foundation, a Ford Foundation Graduate Research Fellowship, the Toyota Research Institute, and the MIT-IBM Brain-Inspired Multimedia Comprehension project.

Fujitsu Laboratories and MIT’s Center for Brains, Minds and Machines broaden partnership

Fujitsu Laboratories Ltd. and MIT’s Center for Brains, Minds and Machines (CBMM) have announced a multi-year philanthropic partnership focused on advancing the science and engineering of intelligence while supporting the next generation of researchers in this emerging field. The new commitment follows on several years of collaborative research among scientists at the two organizations.

Founded in 1968, Fujitsu Laboratories has conducted a wide range of basic and applied research in the areas of next-generation services, computer servers, networks, electronic devices, and advanced materials. CBMM, a multi-institutional, National Science Foundation funded science and technology center focusing on the interdisciplinary study of intelligence, was established in 2013 and is headquartered at MIT’s McGovern Institute for Brain Research. CBMM is also the foundation of “The Core” of the MIT Quest for Intelligence launched earlier this year. The partnership between the two organizations started in March 2017 when Fujitsu Laboratories sent a visiting scientist to CBMM.

“A fundamental understanding of how humans think, feel, and make decisions is critical to developing revolutionary technologies that will have a real impact on societal problems,” said Shigeru Sasaki, CEO of Fujitsu Laboratories. “The partnership between MIT’s Center for Brains, Minds and Machines and Fujitsu Laboratories will help advance critical R&D efforts in both human intelligence and the creation of next-generation technologies that will shape our lives,” he added.

The new Fujitsu Laboratories Co-Creation Research Fund, established with a philanthropic gift from Fujitsu Laboratories, will fuel new, innovative and challenging projects in areas of interest to both Fujitsu and CBMM, including the basic study of computations underlying visual recognition and language processing, creation of new machine learning methods, and development of the theory of deep learning. Alongside funding for research projects, Fujitsu Laboratories will, starting in 2019, fund fellowships for graduate students attending CBMM’s summer course, as a long-term contribution to the future of research and society. The intensive three-week course gives advanced students from universities worldwide a “deep end” introduction to the problem of intelligence. These students will later have the opportunity to travel to Fujitsu Laboratories in Japan or its overseas locations in the U.S., Canada, U.K., Spain, and China to meet with Fujitsu researchers.

“CBMM faculty, students, and fellows are excited for the opportunity to work alongside scientists from Fujitsu to make advances in complex problems of intelligence, both real and artificial,” said CBMM’s director Tomaso Poggio, who is also an investigator at the McGovern Institute and the Eugene McDermott Professor in MIT’s Department of Brain and Cognitive Sciences. “Both Fujitsu Laboratories and MIT are committed to creating revolutionary tools and systems that will transform many industries, and to do that we are first looking to the extraordinary computations made by the human mind in everyday life.”

As part of the partnership, Poggio will be a featured keynote speaker at the Fujitsu Laboratories Advanced Technology Symposium on Oct. 9. In addition, Tomotake Sasaki, a former visiting scientist and current research affiliate in the Poggio Lab, will continue to collaborate with CBMM scientists and engineers on reinforcement learning and deep learning research projects. Moyuru Yamada, a visiting scientist in the Lab of Professor Josh Tenenbaum, is also studying the computational model of human cognition and exploring its industrial applications. Moreover, Fujitsu Laboratories is planning to invite CBMM researchers to Japan or overseas offices and arrange internships for interested students.

Model helps robots navigate more like humans do

When moving through a crowd to reach some end goal, humans can usually navigate the space safely without thinking too much. They can learn from the behavior of others and note any obstacles to avoid. Robots, on the other hand, struggle with such navigational concepts.

MIT researchers have now devised a way to help robots navigate environments more like humans do. Their novel motion-planning model lets robots determine how to reach a goal by exploring the environment, observing other agents, and exploiting what they’ve learned before in similar situations. A paper describing the model was presented at this week’s IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

Popular motion-planning algorithms will create a tree of possible decisions that branches out until it finds good paths for navigation. A robot that needs to navigate a room to reach a door, for instance, will create a step-by-step search tree of possible movements and then execute the best path to the door, considering various constraints. One drawback, however, is these algorithms rarely learn: Robots can’t leverage information about how they or other agents acted previously in similar environments.

“Just like when playing chess, these decisions branch out until [the robots] find a good way to navigate. But unlike chess players, [the robots] explore what the future looks like without learning much about their environment and other agents,” says co-author Andrei Barbu, a researcher at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) and the Center for Brains, Minds, and Machines (CBMM) within MIT’s McGovern Institute. “The thousandth time they go through the same crowd is as complicated as the first time. They’re always exploring, rarely observing, and never using what’s happened in the past.”

The researchers developed a model that combines a planning algorithm with a neural network that learns to recognize paths that could lead to the best outcome, and uses that knowledge to guide the robot’s movement in an environment.

In their paper, “Deep sequential models for sampling-based planning,” the researchers demonstrate the advantages of their model in two settings: navigating through challenging rooms with traps and narrow passages, and navigating areas while avoiding collisions with other agents. A promising real-world application is helping autonomous cars navigate intersections, where they have to quickly evaluate what others will do before merging into traffic. The researchers are currently pursuing such applications through the Toyota-CSAIL Joint Research Center.

“When humans interact with the world, we see an object we’ve interacted with before, or are in some location we’ve been to before, so we know how we’re going to act,” says Yen-Ling Kuo, a PhD student in CSAIL and first author on the paper. “The idea behind this work is to add to the search space a machine-learning model that knows from past experience how to make planning more efficient.”

Boris Katz, a principal research scientist and head of the InfoLab Group at CSAIL, is also a co-author on the paper.

Trading off exploration and exploitation

Traditional motion planners explore an environment by rapidly expanding a tree of decisions that eventually blankets an entire space. The robot then looks at the tree to find a way to reach the goal, such as a door. The researchers’ model, however, offers “a tradeoff between exploring the world and exploiting past knowledge,” Kuo says.

The learning process starts with a few examples. A robot using the model is trained on a few ways to navigate similar environments. The neural network learns what makes these examples succeed by interpreting the environment around the robot, such as the shape of the walls, the actions of other agents, and features of the goals. In short, the model “learns that when you’re stuck in an environment, and you see a doorway, it’s probably a good idea to go through the door to get out,” Barbu says.

The model combines the exploration behavior from earlier methods with this learned information. The underlying planner, called RRT*, was developed by MIT professors Sertac Karaman and Emilio Frazzoli. (It’s a variant of a widely used motion-planning algorithm known as Rapidly-exploring Random Trees, or RRT.) The planner creates a search tree while the neural network mirrors each step and makes probabilistic predictions about where the robot should go next. When the network makes a prediction with high confidence, based on learned information, it guides the robot on a new path. If the network doesn’t have high confidence, it lets the robot explore the environment instead, like a traditional planner.
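The switch between following a confident prediction and falling back to exploration can be sketched in a few lines. The code below is a simplified illustration, not the researchers’ system: it uses plain RRT rather than RRT*, an empty 2-D square with no obstacles or collision checks, and a hand-written `guide` function standing in for the neural network. The function names, the confidence threshold of 0.8, and the step size are all assumptions made for the example.

```python
import math
import random


def nearest(tree, point):
    """Return the tree node closest to the given point."""
    return min(tree, key=lambda node: math.dist(node, point))


def steer(from_pt, to_pt, step):
    """Move from from_pt toward to_pt by at most `step`."""
    d = math.dist(from_pt, to_pt)
    if d <= step:
        return to_pt
    t = step / d
    return (from_pt[0] + t * (to_pt[0] - from_pt[0]),
            from_pt[1] + t * (to_pt[1] - from_pt[1]))


def plan(start, goal, guide, threshold=0.8, step=0.5, size=10.0,
         max_iters=5000, seed=0):
    """Grow a tree from start; return (path, iterations) or (None, max_iters)."""
    rng = random.Random(seed)
    parent = {start: None}  # maps each node to the node it was grown from
    for iteration in range(1, max_iters + 1):
        suggestion, confidence = guide(parent, goal)
        if confidence >= threshold:
            target = suggestion  # exploit: follow the confident prediction
        else:
            # explore: sample a random point, as a traditional planner would
            target = (rng.uniform(0, size), rng.uniform(0, size))
        near = nearest(parent, target)
        new = steer(near, target, step)
        if new not in parent:
            parent[new] = near
        if math.dist(new, goal) <= step:
            path = [new]
            while parent[path[-1]] is not None:
                path.append(parent[path[-1]])
            path.reverse()
            return path, iteration
    return None, max_iters


# Hypothetical stand-ins for a trained network: one always confident, one never.
def confident_guide(tree, goal):
    return goal, 0.95


def unsure_guide(tree, goal):
    return goal, 0.1


guided_path, guided_iters = plan((1.0, 1.0), (9.0, 9.0), confident_guide)
unguided_path, unguided_iters = plan((1.0, 1.0), (9.0, 9.0), unsure_guide)
print("guided iterations:", guided_iters)
print("unguided iterations:", unguided_iters)
```

With a guide that is always confident, the tree grows straight toward the goal. With one that is never confident, the planner behaves like an ordinary random-sampling planner and can never need fewer iterations in this open space, since each iteration extends the tree by at most one step.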

For example, the researchers demonstrated the model in a simulation known as a “bug trap,” where a 2-D robot must escape from an inner chamber through a central narrow channel and reach a location in a surrounding larger room. Blind alleys on either side of the channel can get robots stuck. In this simulation, the robot was trained on a few examples of how to escape different bug traps. When faced with a new trap, it recognizes features of the trap, escapes, and continues to search for its goal in the larger room. The neural network helps the robot find the exit to the trap and identify the dead ends, and gives the robot a sense of its surroundings so it can quickly find the goal.

Results in the paper are based on the chances that a path is found after some time, the total length of the path that reached a given goal, and how consistent the paths were. In both simulations, the researchers’ model more quickly plotted far shorter and more consistent paths than a traditional planner.

“This model is interesting because it allows a motion planner to adapt to what it sees in the environment,” says Stefanie Tellex, an assistant professor of computer science at Brown University, who was not involved in the research. “This can enable dramatic improvements in planning speed by customizing the planner to what the robot knows. Most planners don’t adapt to the environment at all. Being able to traverse long, narrow passages is notoriously difficult for a conventional planner, but they can solve it. We need more ways that bridge this gap.”

Working with multiple agents

In one other experiment, the researchers trained and tested the model in navigating environments with multiple moving agents, which is a useful test for autonomous cars, especially navigating intersections and roundabouts. In the simulation, several agents are circling an obstacle. A robot agent must successfully navigate around the other agents, avoid collisions, and reach a goal location, such as an exit on a roundabout.

“Situations like roundabouts are hard, because they require reasoning about how others will respond to your actions, how you will then respond to theirs, what they will do next, and so on,” Barbu says. “You eventually discover your first action was wrong, because later on it will lead to a likely accident. This problem gets exponentially worse the more cars you have to contend with.”

Results indicate that the researchers’ model can capture enough information about the future behavior of the other agents (cars) to cut off the process early, while still making good decisions in navigation. This makes planning more efficient. Moreover, they only needed to train the model on a few examples of roundabouts with only a few cars. “The plans the robots make take into account what the other cars are going to do, as any human would,” Barbu says.

Going through intersections or roundabouts is one of the most challenging scenarios facing autonomous cars. This work might one day let cars learn how humans behave and how to adapt to drivers in different environments, according to the researchers. This is the focus of the Toyota-CSAIL Joint Research Center work.

“Not everybody behaves the same way, but people are very stereotypical. There are people who are shy, people who are aggressive. The model recognizes that quickly and that’s why it can plan efficiently,” Barbu says.

More recently, the researchers have been applying this work to robots with manipulators that face similarly daunting challenges when reaching for objects in ever-changing environments.

Engineering intelligence

Go is an ancient board game that demands not only strategy and logic, but intuition, creativity, and subtlety—in other words, it’s a game of quintessentially human abilities. Or so it seemed, until Google’s DeepMind AI program, AlphaGo, roundly defeated the world’s top Go champion.

But ask it to read social cues or interpret what another person is thinking and it wouldn’t know where to start. It wouldn’t even understand that it didn’t know where to start. Outside of its game-playing milieu, AlphaGo is as smart as a rock.

“The problem of intelligence is the greatest problem in science,” says Tomaso Poggio, Eugene McDermott Professor of Brain and Cognitive Sciences at the McGovern Institute. One reason why? We still don’t really understand intelligence in ourselves.

Right now, most advanced AI developments are led by industry giants like Facebook, Google, Tesla and Apple, with an emphasis on engineering and computation, and very little work in humans. That has yielded enormous breakthroughs including Siri and Alexa, ever-better autonomous cars and AlphaGo.

But as Poggio points out, the algorithms behind most of these incredible technologies come right out of past neuroscience research: deep learning networks and reinforcement learning. “So it’s a good bet,” Poggio says, “that one of the next breakthroughs will also come from neuroscience.”

Five years ago, Poggio and a host of researchers at MIT and beyond took that bet when they applied for and won a $25 million Science and Technology Center award from the National Science Foundation to form the Center for Brains, Minds and Machines. The goal of the center was to take those computational approaches and blend them with basic, curiosity-driven research in neuroscience and cognition. They would knock down the divisions that traditionally separated these fields and not only unlock the secrets of human intelligence and develop smarter AIs, but found an entire new field—the science and engineering of intelligence.

A collaborative foundation

CBMM is a sprawling research initiative headquartered at the McGovern Institute, encompassing faculty at Harvard, Johns Hopkins, Rockefeller and Stanford; over a dozen industry collaborators including Siemens, Google, Toyota, Microsoft, Schlumberger and IBM; and partner institutions such as Howard University, Wellesley College and the University of Puerto Rico. The effort has already churned out 397 publications and has just been renewed for five more years and another $25 million.

For the first few years, collaboration in such a complex center posed a challenge. Research efforts were still divided into traditional silos—one research thrust for cognitive science, another for computation, and so on. But as the center grew, colleagues found themselves talking more and a new common language emerged. Immersed in each other’s research, the divisions began to fade.

“It became more than just a center in name,” says Matthew Wilson, associate director of CBMM and the Sherman Fairchild Professor of Neuroscience at MIT’s Department of Brain and Cognitive Sciences (BCS). “It really was trying to drive a new way of thinking about research and motivating intellectual curiosity that was motivated by this shared vision that all the participants had.”

New questioning

Today, the center is structured around four interconnected modules grounded around the problem of visual intelligence—vision, because it is the most understood and easily traced of our senses. The first module, co-directed by Poggio himself, unravels the visual operations that begin within the first few milliseconds of visual recognition as the information travels through the eye and to the visual cortex. Gabriel Kreiman, who studies visual comprehension at Harvard Medical School and Children’s Hospital, leads the second module, which takes on the subsequent events as the brain directs the eye where to go next, what it is seeing and what to pay attention to, and then integrates this information into a holistic picture of the world that we experience. His research questions have grown as a result of CBMM’s cross-disciplinary influence.

Leyla Isik, a postdoc in Kreiman’s lab, is now tackling one of his new research initiatives: social intelligence. “So much of what we do and see as humans are social interactions between people. But even the best machines have trouble with it,” she explains.

To reveal the underlying computations of social intelligence, Isik is using data gathered from epilepsy patients as they watch full-length movies. (Some patients with epilepsy spend several weeks before surgery with monitoring electrodes in their brains, providing a rare opportunity for scientists to see inside the brain of a living, thinking human.) Isik hopes to be able to pick out reliable patterns in their neural activity that indicate when the patient is processing certain social cues such as faces. “It’s a pretty big challenge, so to start out we’ve tried to simplify the problem a little bit and just look at basic social visual phenomenon,” she explains.

In true CBMM spirit, Isik is co-advised by another McGovern investigator, Nancy Kanwisher, who helps lead CBMM’s third module with BCS Professor of Computational Cognitive Science, Josh Tenenbaum. That module picks up where the second leaves off, asking still deeper questions about how the brain understands complex scenes, and how infants and children develop the ability to piece together the physics and psychology of new events. In Kanwisher’s lab, instead of a stimulus-heavy movie, Isik shows simple stick figures to subjects in an MRI scanner. She’s looking for specific regions of the brain that engage only when the subjects view the “social interactions” between the figures. “I like the approach of tackling this problem both from very controlled experiments as well as something that’s much more naturalistic in terms of what people and machines would see,” Isik explains.

Built-in teamwork

Such complementary approaches are the norm at CBMM. Postdocs and graduate students are required to have at least two advisors in two different labs. The NSF money is even assigned directly to postdoc and graduate student projects. This ensures that collaborations are baked into the center, Wilson explains. “If the idea is to create a new field in the science of intelligence, you can’t continue to support work the way it was done in the old fields—you have to create a new model.”

In other labs, students and postdocs blend imaging with cognitive science to understand how the brain represents physics—like the mass of an object it sees. Or they’re combining human, primate, mouse and computational experiments to better understand how the living brain represents new objects it encounters, and then building algorithms to test the resulting theories.

Boris Katz’s lab is in the fourth and final module, which focuses on figuring out how the brain’s visual intelligence ties into higher-level thinking, like goal planning, language, and abstract concepts. One project, led by MIT research scientist Andrei Barbu and Yen-Ling Kuo, in collaboration with Harvard cognitive scientist Liz Spelke, is attempting to uncover how humans and machines devise plans to navigate around complex and dangerous environments.

“CBMM gives us the opportunity to close the loop between machine learning, cognitive science, and neuroscience,” says Barbu. “The cognitive science informs better machine learning, which helps us understand how humans behave and that in turn points the way toward understanding the structure of the brain. All of this feeds back into creating more capable machines.”

A new field

Every summer, CBMM heads down to Woods Hole, Massachusetts, to deliver an intensive crash course on the science of intelligence to graduate students from across the country. It’s one of many education initiatives designed to spread CBMM’s approach and key to the goal of establishing a new field. The students who come to learn from these courses often find them as transformative as the CBMM faculty did when the center began.

Candace Ross was an undergraduate at Howard University when she got her first taste of CBMM at a summer course with Kreiman trying to model human memory in machine learning algorithms. “It was the best summer of my life,” she says. “There were so many concepts I didn’t know about and didn’t understand. We’d get back to the dorm at night and just sit around talking about science.”

Ross loved it so much that she spent a second summer at CBMM, and is now a third-year graduate student working with Katz and Barbu, teaching computers how to use vision and language to learn more like children. She’s since gone back to the summer programs, now as a teaching assistant. “CBMM is a research center,” says Ellen Hildreth, a computer scientist at Wellesley College who coordinates CBMM’s education programs. “But it also fosters a strong commitment to education, and that effort is helping to create a community of researchers around this new field.”

Quest for intelligence

CBMM has far to go in its mission to understand the mind, but there is good reason to believe that what CBMM started will continue well beyond the NSF-funded ten years.

This February, MIT announced a new institute-wide initiative called the MIT Intelligence Quest, or MIT IQ. It’s a massive interdisciplinary push to study human intelligence and create new tools based on that knowledge. It is also, says McGovern Institute Director Robert Desimone, a sign of the institute’s faith in what CBMM itself has so far accomplished. “The fact that MIT has made this big commitment in this area is an endorsement of the kind of view we’ve been promoting through CBMM,” he says.

MIT IQ consists of two linked entities: “The Core” and “The Bridge.” CBMM is part of the Core, which will advance the science and engineering of both human and machine intelligence. “This combination is unique to MIT,” explains Poggio, “and is designed to win not only Turing but also Nobel prizes.”

And more than that, points out BCS Department Head Jim DiCarlo, it’s also a return to CBMM’s very first mission. Before CBMM began, Poggio and a few other MIT scientists had tested the waters with a small, Institute-funded collaboration called the Intelligence Initiative (I^2), which welcomed all types of intelligence research, even business and organizational intelligence. MIT IQ re-opens that broader door. “In practice, we want to build a bigger tent now around the science of intelligence,” DiCarlo says.

For his part, Poggio finds the name particularly apt. “Because it is going to be a long-term quest,” he says. “Remember, if I’m right, this is the greatest problem in science. Understanding the mind is understanding the very tool we use to try to solve every other problem.”

The quest to understand intelligence

McGovern investigators study intelligence to answer a practical question for both educators and computer scientists. Can intelligence be improved?

A nine-year-old girl, a contestant on a game show, is standing on stage. On a screen in front of her, there appears a twelve-digit number followed by a six-digit number. Her challenge is to divide the two numbers as fast as possible.

The timer begins. She is racing against three other contestants, two from China and one, like her, from Japan. Whoever answers first wins, but only if the answer is correct.

The show, called “The Brain,” is wildly popular in China, and attracts players who display their memory and concentration skills much the way American athletes demonstrate their physical skills in shows like “American Ninja Warrior.” After a few seconds, the girl slams the timer and gives the correct answer, faster than most people could have entered the numbers on a calculator.

The camera pans to a team of expert judges, including McGovern Director Robert Desimone, who had arrived in Nanjing just a few hours earlier. Desimone shakes his head in disbelief. The task appears to make extraordinary demands on working memory and rapid processing, but the girl explains that she solves it by visualizing an abacus in her mind—something she has practiced intensively.

The show raises an age-old question: What is intelligence, exactly?

The study of intelligence has a long and sometimes contentious history, but recently, neuroscientists have begun to dissect intelligence to understand the neural roots of the distinct cognitive skills that contribute to it. One key question is whether these skills can be improved individually with training and, if so, whether those improvements translate into overall intelligence gains. This research has practical implications for multiple domains, from brain science to education to artificial intelligence.

“The problem of intelligence is one of the great problems in science,” says Tomaso Poggio, a McGovern investigator and an expert on machine learning. “If we make progress in understanding intelligence, and if that helps us make progress in making ourselves smarter or in making machines that help us think better, we can solve all other problems more easily.”

Brain training 101

Many studies have reported positive results from brain training, and there is now a thriving industry devoted to selling tools and games such as Lumosity and BrainHQ. Yet the science behind brain training to improve intelligence remains controversial.

A case in point is the “n-back” working memory task, in which subjects are presented with a rapid sequence of letters or visual patterns and must report whether the current item matches the one shown a set number of steps earlier: the last, the last-but-one, the last-but-two, and so on. The field of brain training received a boost in 2008 when a widely discussed study claimed that a few weeks of training on a challenging version of this task could boost fluid intelligence, the ability to solve novel problems. The report generated excitement and optimism when it first appeared, but several subsequent attempts to reproduce the findings have been unsuccessful.
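For readers who want to see the mechanics of the n-back task, here is a minimal Python sketch. It is an illustration only, not the software used in any of the studies mentioned here: the function names, the eight-letter stimulus set, and the 30 percent repeat rate are all assumptions made for the example.

```python
import random
import string


def make_sequence(length, n, match_rate=0.3, seed=None):
    """Build a random letter sequence for an n-back run.

    With probability `match_rate` (an assumed value), a position is forced
    to repeat the item shown n steps earlier. Chance repeats can also occur.
    """
    rng = random.Random(seed)
    letters = string.ascii_uppercase[:8]  # assumed stimulus set: A through H
    seq = []
    for i in range(length):
        if i >= n and rng.random() < match_rate:
            seq.append(seq[i - n])  # forced repeat of the item n steps back
        else:
            seq.append(rng.choice(letters))
    return seq


def n_back_targets(seq, n):
    """For each position, say whether the item matches the one n steps back."""
    return [i >= n and seq[i] == seq[i - n] for i in range(len(seq))]


def score(responses, targets):
    """Fraction of positions where the yes/no response is correct."""
    correct = sum(r == t for r, t in zip(responses, targets))
    return correct / len(targets)
```

Calling n_back_targets(make_sequence(20, 2), 2) gives the correct yes/no answers for a 20-item, 2-back run. Raising n makes the task harder, because more items must be held in working memory at once.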

Among those unable to confirm the result was McGovern Investigator John Gabrieli, who recruited 60 young adults and trained them 40 minutes a day for four weeks on an n-back task similar to that of the original study.

Six months later, Gabrieli re-evaluated the participants. “They got amazingly better at the difficult task they practiced. We have great imaging data showing changes in brain activation as they performed the task from before to after,” says Gabrieli. “And yet, that didn’t help them do better on any other cognitive abilities we could measure, and we measured a lot of things.”

The results don’t completely rule out the value of n-back training, says Gabrieli. It may be more effective in children, or in populations with a lower average intelligence than the individuals (mostly college students) who were recruited for Gabrieli’s study. The prospect that training might help disadvantaged individuals holds strong appeal. “If you could raise the cognitive abilities of a child with autism, or a child who is struggling in school, the data tells us that their life would be a step better,” says Gabrieli. “It’s something you would wish for people, especially for those where something is holding them back from the expression of their other abilities.”

Music for the brain

The concept of early intervention is now being tested by Desimone, who has teamed with Chinese colleagues at the recently established IDG/McGovern Institute at Beijing Normal University to explore the effect of music training on the cognitive abilities of young children.

The researchers recruited 100 children at a neighborhood kindergarten in Beijing, and provided them with a semester-long intervention, randomly assigning children either to music training or (as a control) to additional reading instruction. Unlike the so-called “Mozart Effect,” a scientifically unsubstantiated claim that passive listening to music increases intelligence, the new study requires active learning through daily practice. Several smaller studies have reported cognitive benefits from music training, and Desimone finds the idea plausible given that musical cognition involves several mental functions that are also implicated in intelligence. The study is nearly complete, and results are expected to emerge within a few months. “We’re also collecting data on brain activity, so if we see improvements in the kids who had music training, we’ll also be able to ask about its neural basis,” says Desimone. The results may also have immediate practical implications, since the study design reflects decisions that schools must make in determining how children spend their time. “Many schools are deciding to cut their arts and music programs to make room for more instruction in academic core subjects, so our study is relevant to real questions schools are facing.”

Intelligent classrooms

In another school-based study, Gabrieli’s group recently raised questions about the benefits of “teaching to the test.” In this study, postdoc Amy Finn evaluated more than 1,300 eighth-graders in the Boston public schools, some enrolled at traditional schools and others at charter schools that emphasize standardized test score improvements. The researchers wanted to find out whether raised test scores were accompanied by improvement of cognitive skills that are linked to intelligence. (Charter school students are selected by lottery, meaning that any results are unlikely to reflect preexisting differences between the two groups of students.) As expected, charter school students showed larger improvements in test scores (relative to their scores from four years earlier). But when Finn and her colleagues measured key aspects of intelligence, such as working memory, processing speed, and reasoning, they found no difference between the students who enrolled in charter schools and those who did not. “You can look at these skills as the building blocks of cognition. They are useful for reasoning in a novel situation, an ability that is really important for learning,” says Finn. “It’s surprising that school practices that increase achievement don’t also increase these building blocks.”

Gabrieli remains optimistic that it will eventually be possible to design scientifically based interventions that can raise children’s abilities. Allyson Mackey, a postdoc in his lab, is studying the use of games to exercise these cognitive skills in a classroom setting. As a graduate student at the University of California, Berkeley, Mackey had studied the effects of games such as “Chocolate Fix,” in which players match shapes and flavors, represented by color, to positions in a grid based on hints, such as, “the upper left position is strawberry.”

These games gave children practice at thinking through and solving novel problems, and at the end of Mackey’s study, the students—from second through fourth grades—showed improved measures of skills associated with intelligence. “Our results suggest that these cognitive skills are specifically malleable, although we don’t yet know what the active ingredients were in this program,” says Mackey, who speaks of the interventions as if they were drugs, with dosages, efficacies and potentially synergistic combinations to be explored. Mackey is now working to identify the most promising interventions—those that boost cognitive abilities, work well in the classroom, and are engaging for kids—to try in Boston charter schools. “It’s just the beginning of a three-year process to methodically test interventions to see if they work,” she says.

Brain training…for machines

While Desimone, Gabrieli and their colleagues look for ways to raise human intelligence, Poggio, who directs the MIT-based Center for Brains, Minds and Machines, is trying to endow computers with more human-like intelligence. Computers can already match human performance on some specific tasks such as chess. Programs such as Apple’s “Siri” can mimic human speech interpretation, not perfectly but well enough to be useful. Computer vision programs are approaching human performance at rapid object recognition, and one such system, developed by one of Poggio’s former postdocs, is now being used to assist car drivers. “The last decade has been pretty magical for intelligent computer systems,” says Poggio.

Like children, these intelligent systems learn from past experience. But compared to humans or other animals, machines tend to be very slow learners. For example, the visual system for automobiles was trained by presenting it with millions of images—traffic light, pedestrian, and so on—that had already been labeled by humans. “You would never present so many examples to a child,” says Poggio. “One of our big challenges is to understand how to make algorithms in computers learn with many fewer examples, to make them learn more like children do.”

To accomplish this and other goals of machine intelligence, Poggio suspects that the work being done by Desimone, Gabrieli and others to understand the neural basis of intelligence will be critical. But he is not expecting any single breakthrough that will make everything fall into place. “A century ago,” he says, “scientists pondered the problem of life, as if ‘life’—what we now call biology—were just one problem. The science of intelligence is like biology. It’s a lot of problems, and a lot of breakthroughs will have to come before a machine appears that is as intelligent as we are.”