MIMIc: Multimodal Imitation Learning in MultI-Agent Environments

The aim of this project is to develop data-driven policy learning algorithms for multi-agent environments utilising multimodal data sources.

In the UK, we are not allowed to drive a vehicle until we are 17 years old. This is because driving is a complex and safety-critical activity that requires many advanced cognitive skills, such as recognising possible threats, anticipating the behaviour of other road users and reacting swiftly to emerging situations. Think of a football player making decisions on the field: a good player senses opportunities by anticipating what other players will do and selects the action that increases the odds of scoring. It takes humans a long time to develop these advanced cognitive skills and to become expert at such complex real-world tasks. Artificial Intelligence (AI) has made significant progress during the last decade, demonstrated by breakthroughs in cancer detection, computers beating 'Go' masters and intelligent robotics. However, if AI is to live up to its science-fiction promise of assisting humanity, or even superseding human intelligence, it should at least be equipped with cognitive skills such as those possessed by humans. This project aims to develop ground-breaking algorithms that equip autonomous systems with the human-like cognitive skills required to thrive in real-world environments.

We focus on applications that require an autonomous agent (e.g., a robot or a driverless car) to interact with multiple intelligent agents in its environment to accomplish a task; such settings are known as Multi-Agent Environments (MAEs). These applications require an agent to anticipate the behaviour of other agents and to select the most appropriate course of action. Equipping agents with such autonomous decision-making capability is known as policy learning. Compared to policy learning in single-agent domains (teaching a robot to walk, or a computer to play a video game), recent progress in policy learning for MAEs has been modest. This is for several reasons: 1) the environment is dynamic, because it changes with the actions of the other agents; 2) multi-agent policy learning suffers from a theoretical limitation known as the Curse of Dimensionality (CoD), as sketched below; 3) utility functions that capture agent objectives are difficult to define; and 4) there is a significant lack of adequate multi-agent datasets to enable meaningful research. This project proposes to undertake research into policy learning in MAEs by addressing the above limitations.
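To make the CoD concrete: when agents choose discrete actions independently, the joint action space of an MAE grows exponentially with the number of agents. The short Python sketch below illustrates this; the agent and action counts are illustrative assumptions, not project parameters.

    # Illustrative only: the joint action space for N agents, each choosing
    # one of K discrete actions, has size K**N -- exponential in agent count.
    def joint_action_space_size(num_agents: int, actions_per_agent: int) -> int:
        return actions_per_agent ** num_agents

    # A football-like setting: up to 22 players, each with (say) 10 actions.
    for n in (2, 5, 11, 22):
        print(f"{n:2d} agents -> {joint_action_space_size(n, 10):.2e} joint actions")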

Our unique approach to policy learning in MAEs is motivated by how humans thrive in similar settings. Firstly, we perceive the world through multiple senses (i.e., vision, audition, touch), enabling a rich perception of the world. Secondly, when acting in an MAE, humans do not pay attention to all stimuli but only to the key ones; for example, when attacking the ball, a football player attends only to the teammates in a position to create a goal and to the key defenders. Finally, the learning paradigm we employ, known as imitation learning, is an emerging methodology for learning by observing experts, mirroring a productive strategy humans use to acquire new skills. Accordingly, we propose to learn realistic policies in MAEs through imitation learning, leveraging multimodal data fusion and selective-attention modelling. Multimodal data fusion captures the high-dimensional context of the real world, while selective attention alleviates the CoD, as sketched below. Chelsea Football Club has provided us with a unique multimodal multi-agent dataset and access to state-of-the-art data-capture facilities, facilitating this ambitious research project.
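A minimal sketch of the selective-attention idea is given below, assuming per-agent state vectors and a PyTorch implementation; the module names and dimensions are our illustrative assumptions, not the project's published architecture. The policy embeds each agent's state, computes a softmax attention weight per agent, and acts on the attended summary, so the policy's effective input does not scale with the full joint state.

    import torch
    import torch.nn as nn

    class SelectiveAttentionPolicy(nn.Module):
        """Attend over agents, then map the attended context to action logits."""
        def __init__(self, state_dim: int, embed_dim: int, num_actions: int):
            super().__init__()
            self.embed = nn.Linear(state_dim, embed_dim)   # per-agent embedding
            self.score = nn.Linear(embed_dim, 1)           # scalar relevance per agent
            self.head = nn.Linear(embed_dim, num_actions)  # action logits

        def forward(self, agent_states: torch.Tensor) -> torch.Tensor:
            # agent_states: (batch, num_agents, state_dim)
            h = torch.tanh(self.embed(agent_states))       # (B, N, E)
            w = torch.softmax(self.score(h), dim=1)        # attention over agents
            context = (w * h).sum(dim=1)                   # (B, E) attended summary
            return self.head(context)                      # (B, num_actions)

In such a design, the softmax weights play the role of the footballer's selective attention: agents with low relevance scores contribute little to the decision, regardless of how many agents are on the pitch.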

The project outputs will be subjectively validated as a tool for answering "what-if" questions about game play in football, helping coaching staff visualise speculative game strategies, and as a computational benchmark for quantifying the cognitive skills of football players. The planned impact activities will ensure the project leaves a legacy in AI development, benefiting UK PLC through significant contributions to multiple high-growth areas such as driverless vehicles, video gaming and assistive robots.

This is a two-year research project, completing at the end of 2021, supported by the UKRI EPSRC with £258,875 of funding. The project is aligned with the Artificial Intelligence, Machine Learning and Data Analytics research theme of the IDT.

Among the notable outcomes of the project to date, we have developed imitation models of players from tracking data, employing imitation learning and reinforcement learning. We have also made progress on 3D pose detection using advanced convolutional neural networks, and on quantitative metrics such as goal-probability and pass-probability models, of which a sketch follows. We are currently integrating these results for simulation and subsequent subjective validation.
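To indicate the flavour of such metrics, below is a hypothetical pass-probability sketch: a logistic-regression classifier over simple geometric features of an attempted pass. The feature choices and the toy training rows are illustrative assumptions for exposition only; they are not the project's actual model or data.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Toy features per attempted pass (illustrative, not project data):
    # [pass distance (m), angle to nearest defender (rad),
    #  nearest defender's distance to the passing lane (m)]
    X = np.array([[ 8.0, 1.2, 4.0],
                  [25.0, 0.2, 0.5],
                  [12.0, 0.8, 2.5],
                  [30.0, 0.1, 0.3]])
    y = np.array([1, 0, 1, 0])  # 1 = pass completed, 0 = intercepted

    model = LogisticRegression().fit(X, y)
    print(model.predict_proba([[15.0, 0.5, 1.5]])[0, 1])  # P(completion)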

For further information, please visit https://gow.epsrc.ukri.org/NGBOViewGrant.aspx?GrantRef=EP/T000783/1

Or contact Dr Varuna De Silva.