Driverless Vehicle Control Algorithm Development

The ultimate goal of the proponents of driverless vehicles is to equip vehicles with human-like intelligence so that they can navigate autonomously. Several automobile manufacturers and tech giants aim to launch commercially available fully autonomous vehicles (AVs) by the dawn of the next decade. Automobile giants such as Tesla Motors Inc., BMW, Ford Motor Co. and Volvo have all pledged to have fully autonomous vehicles on the road within the next few years. Waymo, formerly Google's autonomous vehicle project, has for the first time let its autonomous cars drive themselves without anyone in the front seat. Furthermore, the Chinese internet giant Baidu has ambitious plans for the driverless vehicle industry. For example, China has set a goal for 10-20% of vehicles to be highly autonomous by 2025, and for 10% of cars to be fully self-driving by 2030. Despite the promises from the tech and automobile industries, vehicles that can roam extensively without human involvement remain a distant reality that warrants extensive research effort. Current research at IDT, led by Dr Varuna De Silva, is focused on addressing some of the challenges posed by scaling driverless vehicle technology beyond experimental setups towards real-world environments. The challenges addressed include: 1) data collection in unstructured environments, 2) multimodal signal processing and perception, and 3) data-driven decision-making policy learning. Selected research results can be found below, with associated literature where appropriate.

A Multimodal Perception Driven Self-Evolving Autonomous Vehicle for Open Experimentation

Although the majority of autonomous vehicles perform well in structured surroundings, when presented with niche environments (indoor mobility, mobility on pavements, in parks, woods or town centres) the need for human intervention rises significantly. While this can in part be attributed to the data-dependent agents (deep networks) used for robotic perception, the fault by and large lies with the dataset. To that end, the importance of developing an open-source dataset benchmarked against perception, localisation and driver policy becomes abundantly clear. To assist in the task of collecting realistic multimodal data streams, we have developed a self-evolving autonomous vehicle for seamless indoor and outdoor navigation. The initial data collection spanned the period from May 2018 to October 2018. The dataset covers 1.2 km of recorded driving in Queen Elizabeth Olympic Park, London, and the test bed was driven autonomously throughout the period of data collection. Since the original data collection, the team has collected many kilometres of data under various conditions in the Olympic Park and in indoor environments. The collected data were used for free space detection, human activity recognition, path planning and localisation. Additionally, we have collected further data in various unstructured driving environments. For example, driving data in urban and rural areas were captured in Sri Lanka, where the driving conditions differ from those in the UK, and further data were collected at the Toyota manufacturing plant in Derby, UK, to capture variations in indoor movements.

Relevant paper: https://arxiv.org/abs/1901.02858

Data Fusion in Autonomous Mobile Robots for Free Space Detection

Autonomous robots that assist humans in day-to-day tasks are becoming increasingly popular. Autonomous mobile robots operate by sensing and perceiving their surrounding environment in order to make accurate driving decisions. A combination of different sensors, such as LiDAR, radar, ultrasound sensors and cameras, is utilised to sense the surrounding environment of autonomous vehicles. These heterogeneous sensors simultaneously capture various physical attributes of the environment. Such multimodality and redundancy of sensing need to be positively utilised for reliable and consistent perception of the environment through sensor data fusion. However, these multimodal sensor data streams differ from each other in many ways, such as temporal and spatial resolution, data format, and geometric alignment. For the subsequent perception algorithms to exploit the diversity offered by multimodal sensing, the data streams need to be spatially, geometrically, and temporally aligned with each other. In this work, we address the problem of fusing the outputs of a Light Detection and Ranging (LiDAR) scanner and a wide-angle monocular image sensor for free space detection.
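To illustrate the alignment step, the sketch below projects LiDAR points into a camera image using a simple pinhole model. The intrinsic matrix, the extrinsic rotation and translation, and the sample points are hypothetical placeholders rather than values from our platform, and a wide-angle lens would in practice also require a distortion model.

import numpy as np

def project_lidar_to_image(points_xyz, K, R, t):
    # Transform Nx3 LiDAR points (sensor frame) into the camera frame.
    pts_cam = (R @ points_xyz.T).T + t
    # Keep only points in front of the camera (camera z-axis forward).
    pts_cam = pts_cam[pts_cam[:, 2] > 0.1]
    # Perspective projection with the pinhole model.
    pix = (K @ pts_cam.T).T
    pix = pix[:, :2] / pix[:, 2:3]
    return pix, pts_cam[:, 2]  # pixel coordinates and depths

# Example with placeholder calibration values (not real calibration data).
K = np.array([[700.0, 0.0, 640.0],
              [0.0, 700.0, 360.0],
              [0.0, 0.0, 1.0]])
R, t = np.eye(3), np.array([0.0, -0.1, 0.0])
points = np.random.uniform([-5.0, -1.0, 1.0], [5.0, 1.0, 20.0], size=(1000, 3))
pixels, depths = project_lidar_to_image(points, K, R, t)

Once the points are projected, each image region (for example a free-space candidate) can be associated with the LiDAR ranges that fall inside it, which is the spatial and geometric alignment the fusion step relies on.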

We further improved this work on Free Space Detection (FSD) to cater for ever-evolving environments. If the dataset used to train the network lacks diversity or sufficient data quantities, the driver policy used to control the vehicle may introduce safety risks. Furthermore, when a dataset lacks multiple modalities, the algorithm is forced to rely on a single modality, reducing its reliability in a changing environment. This research uses a self-evolving FSD framework that leverages online active machine learning paradigms and sensor data fusion, as a proof of concept for a benchmark dataset and an autonomous platform. In essence, the self-evolving FSD framework queries image data against a more reliable data stream, ultrasound, before fusing the sensor data.
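The sketch below illustrates the general idea of such an online active-learning loop under stated assumptions: the model interface (predict_free_space), the confidence threshold and the ultrasound free-space range are hypothetical, and the actual framework fuses the sensor data rather than simply overriding the image prediction.

from collections import deque

UNCERTAINTY_THRESHOLD = 0.6   # hypothetical confidence cut-off
FREE_SPACE_RANGE_M = 2.0      # hypothetical "free if nothing within 2 m" rule
retraining_buffer = deque(maxlen=10_000)

def self_evolving_fsd_step(image_model, image, ultrasound_range_m):
    # One step of the self-evolving loop for a single region of interest.
    free_prob = image_model.predict_free_space(image)  # assumed model API
    if free_prob > UNCERTAINTY_THRESHOLD or free_prob < 1 - UNCERTAINTY_THRESHOLD:
        # The image model is confident; trust its prediction directly.
        return free_prob > 0.5
    # Low confidence: query the more reliable ultrasound stream and keep the
    # ultrasound-derived label for later retraining of the image model.
    label_from_ultrasound = ultrasound_range_m > FREE_SPACE_RANGE_M
    retraining_buffer.append((image, label_from_ultrasound))
    return label_from_ultrasound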

Relevant paper: https://www.mdpi.com/1424-8220/18/8/2730

LiDAR Data Processing and Scene Perception

Cameras are strongly affected by lighting conditions and wearable sensors provide only sparse information, so neither modality on its own can capture the data required to perform the task confidently. That being the case, range sensors such as light detection and ranging (LiDAR) can complement the process to perceive the environment more robustly. LiDAR is a powerful laser scanner that provides a detailed map of the environment. However, efficient and accurate mapping of the environment is yet to be obtained due to signal detection problems that are unique to LiDAR; for example, glass walls are invisible to LiDAR. Signal processing methods need to be adopted to overcome these issues. For example, in one of our works we developed a signal processing algorithm for glass detection in LiDAR data.
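As a rough illustration only (not the algorithm in the linked paper), the sketch below encodes one commonly used cue: glass tends to return a strong specular echo only when the laser hits it nearly head-on, while neighbouring beams pass through and report weaker intensities at longer ranges. The thresholds, window size and function name are hypothetical.

import numpy as np

def candidate_glass_beams(ranges, intensities, intensity_peak=0.8,
                          neighbour_drop=0.3, range_ratio=1.5, window=3):
    # Flag beam indices whose return intensity peaks sharply while the
    # surrounding beams report weak returns at much longer ranges, a
    # pattern consistent with a specular reflection off a glass surface.
    candidates = []
    for i in range(window, len(intensities) - window):
        nbr_int = np.r_[intensities[i - window:i], intensities[i + 1:i + 1 + window]]
        nbr_rng = np.r_[ranges[i - window:i], ranges[i + 1:i + 1 + window]]
        if (intensities[i] >= intensity_peak
                and nbr_int.max() <= neighbour_drop
                and nbr_rng.min() >= range_ratio * ranges[i]):
            candidates.append(i)
    return np.array(candidates, dtype=int)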

Relevant paper: https://www.mdpi.com/1424-8220/21/7/2263

Once the LiDAR streams are cleaned, they can be processed for scene perception tasks in driverless vehicles, e.g., human activity recognition. Increasingly, the task of detecting and recognising human actions has been delegated to some form of neural network processing camera or wearable sensor data. Most recently, researchers have been exploring ways to apply convolutional neural networks to 3-D data. These methods typically rely on a single modality and cannot draw on information from complementary sensor streams to improve accuracy. This work focuses on developing a framework to address human activity recognition by leveraging the benefits of sensor fusion and multimodal machine learning. Given both RGB and point cloud data, our method describes the activities being performed by subjects using regions with a convolutional neural network (R-CNN) and a 3-D modified Fisher vector network. Evaluation on a custom-captured multimodal dataset demonstrates that the model produces highly accurate human activity classification (90%). Furthermore, this framework can be used for sports analytics, understanding social behaviour, surveillance, and, perhaps most notably, by autonomous vehicles (AVs) to inform data-driven decision-making policies in urban areas and indoor environments.
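As a simplified illustration of the decision-level fusion idea (not the exact R-CNN and 3-D modified Fisher vector pipeline), the sketch below averages per-class scores from a hypothetical RGB branch and a hypothetical point-cloud branch; the activity classes and logits are placeholders.

import numpy as np

def softmax(logits):
    e = np.exp(logits - logits.max())
    return e / e.sum()

def fuse_activity_scores(rgb_logits, pointcloud_logits, rgb_weight=0.5):
    # Weighted average of per-class probabilities from both modalities.
    p_rgb = softmax(rgb_logits)
    p_pc = softmax(pointcloud_logits)
    fused = rgb_weight * p_rgb + (1.0 - rgb_weight) * p_pc
    return int(np.argmax(fused)), fused

# Hypothetical logits for four activity classes, e.g. walk / sit / wave / run.
activity, scores = fuse_activity_scores(
    np.array([2.1, 0.3, -0.5, 0.8]),
    np.array([1.7, 0.1, 0.2, 1.5]))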

Relevant paper: https://ieeexplore.ieee.org/document/9464313

Learning Control Policies of Driverless Vehicles in Complex Urban Environments - Imitation Learning for Control

The way we drive and the transport systems of today are going through radical changes. Intelligent mobility envisions improving the efficiency of traditional transportation through advanced digital technologies, such as robotics, artificial intelligence and the Internet of Things. Central to the development of intelligent mobility technology is the emergence of connected autonomous vehicles (CAVs), where vehicles are capable of navigating environments autonomously. For this to be achieved, autonomous vehicles must be safe and trusted by passengers and other drivers. However, it is practically impossible to train autonomous vehicles on all the possible traffic conditions that they may encounter. This research work proposes an alternative solution of using infrastructure to aid CAVs in learning driving policies, specifically for complex junctions, which require local experience and knowledge to handle. The proposal is to learn safe driving policies through data-driven imitation learning of human-driven vehicles at a junction, utilising data captured from surveillance devices about vehicle movements at the junction. The proposed framework is demonstrated by processing video datasets, containing vehicle trajectories, captured from uncrewed aerial vehicles (UAVs) at three intersections around Europe. An imitation learning algorithm based on a long short-term memory (LSTM) neural network is proposed to learn and predict safe trajectories of vehicles. The proposed framework can be used for many purposes in intelligent mobility, such as augmenting the intelligent control algorithms in driverless vehicles, benchmarking driver behaviour for insurance purposes, and providing insights for city planning.
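A minimal sketch of this kind of LSTM trajectory predictor is given below in PyTorch; the layer sizes, history length and the random stand-in data are illustrative assumptions rather than the configuration used in the paper.

import torch
import torch.nn as nn

class TrajectoryLSTM(nn.Module):
    def __init__(self, hidden_size=64):
        super().__init__()
        self.lstm = nn.LSTM(input_size=2, hidden_size=hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, 2)   # predict the next (x, y) position

    def forward(self, history):
        # history: (batch, timesteps, 2) past positions from UAV-tracked trajectories
        out, _ = self.lstm(history)
        return self.head(out[:, -1, :])         # regress from the last hidden state

model = TrajectoryLSTM()
optimiser = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# One illustrative imitation-learning step on random stand-in data
# (a batch of 32 trajectories with 10 past timesteps each).
history = torch.randn(32, 10, 2)
next_position = torch.randn(32, 2)
loss = loss_fn(model(history), next_position)
optimiser.zero_grad()
loss.backward()
optimiser.step()

Trained on observed human-driven trajectories, such a model imitates how local drivers negotiate the junction, and its predictions can then be used as reference behaviour for a CAV approaching the same junction.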

Relevant paper: https://www.mdpi.com/2072-4292/11/23/2723

Learning Data-Driven Decision-Making Policies in Multi-Agent Environments for Autonomous Systems

Autonomous systems such as Connected Autonomous Vehicles (CAVs) and assistive robots are set to improve the way we live. Autonomous systems need to be equipped with capabilities for Reinforcement Learning (RL), a type of machine learning in which an agent learns by interacting with its environment through trial and error, and which has gained significant interest from the research community for its promise to efficiently learn decision making through the abstraction of experiences. However, most current autonomous systems, such as driverless vehicle prototypes or mobile robots, are controlled through supervised learning methods or manually designed rule-based policies. Additionally, many emerging autonomous systems, such as driverless cars, operate in multi-agent environments, often with partial observability. Learning decision-making policies in multi-agent environments is a challenging problem, because the environment is not stationary from the perspective of a learning agent, and hence the Markov properties assumed in single-agent RL do not hold. This work focuses on learning decision-making policies in multi-agent environments, both in cooperative settings with full observability and in dynamic environments with partial observability. The results illustrate how agents learn to cooperate to achieve their objectives successfully.
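As a toy illustration of the multi-agent setting (not the method studied in this work), the sketch below implements independent Q-learning for two agents. Because each agent treats the other as part of the environment, the environment appears non-stationary from its point of view, which is precisely the difficulty highlighted above. The grid-world sizes and hyperparameters are arbitrary.

import numpy as np

N_STATES, N_ACTIONS = 16, 4          # hypothetical toy grid-world dimensions
ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1

q_tables = [np.zeros((N_STATES, N_ACTIONS)) for _ in range(2)]
rng = np.random.default_rng(0)

def select_action(agent_id, state):
    # Epsilon-greedy action selection from the agent's own Q-table.
    if rng.random() < EPSILON:
        return int(rng.integers(N_ACTIONS))
    return int(np.argmax(q_tables[agent_id][state]))

def q_update(agent_id, state, action, reward, next_state):
    # Standard Q-learning update applied independently per agent; the other
    # agent's changing policy is what makes the environment non-stationary.
    q = q_tables[agent_id]
    td_target = reward + GAMMA * np.max(q[next_state])
    q[state, action] += ALPHA * (td_target - q[state, action])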

Relevant paper: https://www.sciencedirect.com/science/article/abs/pii/S1389041720300693