Learning by Doing: How to Adapt to Changing Environments
Name of grant holder
Jonas Peters
Institution
University of Copenhagen
Amount
DKK 2,708,000
Year
2017
Grant type
Semper Ardens: Accelerate
What?
Humans mostly learn while embedded in an environment. We learn how to induce a certain reaction from our parents, depending on their emotional state. We learn how to beat our neighbor at table tennis and how to cycle safely, even in heavy traffic. A truly astonishing human ability is how easily we adapt our behavior when the "rules" of the environment change. Changing the table tennis bat requires only slight modifications to the execution of our strokes. And it is not too difficult to learn how to ride a tandem if we can already ride a bicycle. In this project, we want to study the mathematical principles under which such a transfer of knowledge and skills is possible.
Why?
In recent decades, major progress has been made in formalizing the problem of learning in an interactive environment in the language of data science. The success of AlphaGo is one such example. Less progress has been made in developing ideas for how a learned strategy can be adapted to changing environments. In reality, however, we face such changes in the environment almost all the time. They require us to adapt our policies, even if those policies have been successful for many years. We hope that our research sheds light on how to detect when and how a policy must be adapted.
How?
The task of learning in an interactive environment is studied in different scientific fields: control theory, reinforcement learning, and causal inference. In a first step, we will try to bridge the gap between these mostly distinct communities. This will be done by studying what answers the different fields provide to the same research problem. Only if we speak the same language can we benefit from developments in other research fields. In a second step, we suppose that we are required to adapt to a changing environment. We will study how we can exploit invariances between the different environments, and how these invariances can be learned from data.