Imaginary world models based on neural stochastic processes to overcome reinforcement learning barriers

Name of grant holder

Melih Kandemir

Institution

University of Southern Denmark

Amount

DKK 4,999,000

Year

2021

Grant type

Semper Ardens: Accelerate

What?

Intelligent autonomous agents that interact with their environment, such as drones, self-driving cars, and robot arms, are shaping our technological future. The state-of-the-art artificial intelligence framework for training autonomous agents on tasks, so-called reinforcement learning, can surpass human performance in board games. Yet it remains a long-standing puzzle for the machine learning community why the same framework cannot find robust solutions to even modest optimal control problems, such as balancing a double pendulum, when excessively many failed trials are not affordable. The goal of my project is to identify the reasons behind the high data hunger of the reinforcement learning setup and to develop algorithms that overcome it.

Why?

Reinforcement learning can deliver successful control policies, functions that decide which action an autonomous agent should take in a given situation, only after the agent has interacted with its environment many times. Learning from exhaustive trial and error is an acceptable assumption for synthetic environments such as game playing, the prime use case of reinforcement learning thus far. However, the same assumption sets a hard bottleneck on the applicability of reinforcement learning algorithms to many real-world use cases where trials are expensive and errors are dangerous, such as robotics, autonomous driving, and human-machine interaction. Understanding the reasons behind the data hunger of reinforcement learning would provide new machine learning tools that help overcome this bottleneck.

How?

The biggest barrier to data-efficient reinforcement learning is the lack of an effective mechanism for the autonomous agent to build a mental map of its environment. Such a map would allow the agent to simulate possible futures and rule out actions that may lead to task failure. Recently invented neural stochastic processes can encode the behaviour of complex and random dynamical environments with unprecedented precision by incorporating neural networks into systems of stochastic differential equations. I will develop reinforcement learning algorithms powered by neural stochastic processes that enable autonomous agents to learn accurate mental maps from significantly fewer interactions with the environment than present methods require.
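As a minimal sketch of the modelling idea (the notation here is illustrative and not taken from the project description), a neural stochastic differential equation describes how the environment state x_t evolves under an action a_t by letting neural networks f_θ and g_θ play the roles of drift and diffusion, with W_t a Wiener process:

    dx_t = f_θ(x_t, a_t) dt + g_θ(x_t, a_t) dW_t

Once f_θ and g_θ are fitted to observed transitions, the agent can "imagine" futures by integrating this system forward, for instance with the Euler-Maruyama scheme, and score candidate action sequences before risking them in the real world. The toy Python sketch below illustrates such an imagined-rollout loop; the stand-in networks, dimensions, and scoring rule are assumptions made for illustration, not components of the actual project.

    import numpy as np

    rng = np.random.default_rng(0)
    state_dim = 4

    # Stand-in "learned" parameters; in practice these would come from
    # training a neural SDE on real agent-environment interactions.
    W_f = 0.1 * rng.normal(size=(state_dim, state_dim))
    b_f = 0.1 * rng.normal(size=state_dim)
    W_g = 0.1 * rng.normal(size=(state_dim, state_dim))

    def f_theta(x, a):
        # Hypothetical drift network: a single tanh layer for illustration.
        return np.tanh(x @ W_f + a * b_f)

    def g_theta(x, a):
        # Hypothetical diffusion network: small state-dependent noise scale.
        return 0.1 * (1.0 + np.tanh(x @ W_g))

    def imagine_rollout(x0, actions, dt=0.01):
        # Euler-Maruyama integration of dx = f dt + g dW: one imagined future.
        x, trajectory = x0.copy(), [x0.copy()]
        for a in actions:
            dW = rng.normal(size=x.shape) * np.sqrt(dt)
            x = x + f_theta(x, a) * dt + g_theta(x, a) * dW
            trajectory.append(x.copy())
        return np.stack(trajectory)

    # Rule out risky behaviour in imagination: keep the candidate action
    # sequence whose simulated final state stays closest to the origin.
    x0 = np.zeros(state_dim)
    candidates = [rng.normal(size=50) for _ in range(8)]
    best = min(candidates,
               key=lambda acts: np.abs(imagine_rollout(x0, acts)[-1]).sum())

Because the diffusion term g_θ keeps the model honest about its own uncertainty, imagined rollouts spread out where the dynamics are poorly known, which is precisely what lets the agent avoid overconfident plans while learning from few real interactions.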
