Deep Learning for Autonomous Driving
|
Project 1: Understanding Multimodal Driving Data
Overview:
In this exercise, we develop an understanding of multimodal driving data.
First, we create visualization tools that will help us understand and debug our future
models (such as in Project 3). Specifically, given a LiDAR point cloud, the corresponding
RGB camera image, ground-truth semantic labels, and the bounding-box predictions of a
network, we visualize the outputs of common tasks such as 3D object detection and point
cloud semantic segmentation. Finally, we deepen our understanding of the LiDAR sensor
itself by identifying the laser ID of each point directly from the point cloud, and by
correcting the distortion caused by the vehicle's motion with the aid of GPS/IMU data.
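To make the visualization pipeline concrete, the following is a minimal NumPy sketch of the core projection step that maps LiDAR points into the image plane. It assumes a KITTI-style calibration; the names T_cam_lidar and P are illustrative placeholders, not the exact variables used in the project.

    import numpy as np

    def project_lidar_to_image(points_lidar, T_cam_lidar, P):
        """Project LiDAR points into the image plane.

        points_lidar: (N, 3) xyz coordinates in the LiDAR frame.
        T_cam_lidar:  (4, 4) rigid transform from the LiDAR to the camera frame.
        P:            (3, 4) camera projection matrix.
        Returns (M, 2) pixel coordinates and the mask of points in front of the camera.
        """
        n = points_lidar.shape[0]
        pts_h = np.hstack([points_lidar, np.ones((n, 1))])  # homogeneous (N, 4)
        pts_cam = (T_cam_lidar @ pts_h.T).T                 # camera frame (N, 4)
        in_front = pts_cam[:, 2] > 0                        # keep points with positive depth
        proj = (P @ pts_cam[in_front].T).T                  # image plane (M, 3)
        uv = proj[:, :2] / proj[:, 2:3]                     # perspective divide
        return uv, in_front

The projected pixels can then be colored by semantic label or by depth, and 3D bounding boxes can be drawn by projecting their eight corners with the same function.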
Report: [PDF]
|
Project 2: Multitask Learning for Semantics and Depth
Overview:
In this exercise, we delve into Multi-Task Learning (MTL) architectures for dense prediction
tasks, in particular semantic segmentation (i.e., the task of associating each pixel of an
image with a class label, e.g., person, road, car) and monocular depth estimation (i.e., the
task of estimating the per-pixel depth of a scene from a single image). We first implement
and examine the structure of DeepLabv3+ and test the influence of its hyper-parameters.
Next, we implement a branched architecture based on the previous joint architecture and
compare their performance. We then add a task-distillation module to the branched
architecture and compare the results. Finally, we further improve the model with a series
of techniques: changing the unit of depth measurement, substituting upsampling with
up-convolutions, adding a skip connection from feature2x, including a Squeeze-and-Excitation
layer, and additional hyper-parameter tuning.
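To illustrate the branched design and the Squeeze-and-Excitation (SE) layer, below is a simplified PyTorch sketch of a shared encoder feeding two task-specific heads. The real model uses DeepLabv3+ encoder/decoder modules with skip connections, so encoder, feat_ch, and the single-convolution heads here are illustrative placeholders.

    import torch
    import torch.nn as nn

    class SEBlock(nn.Module):
        """Squeeze-and-Excitation: reweight channels using global context."""
        def __init__(self, channels, reduction=16):
            super().__init__()
            self.gate = nn.Sequential(
                nn.AdaptiveAvgPool2d(1),                       # squeeze: global average pool
                nn.Conv2d(channels, channels // reduction, 1),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels // reduction, channels, 1),
                nn.Sigmoid(),                                  # excitation: per-channel gates
            )

        def forward(self, x):
            return x * self.gate(x)

    class BranchedMTL(nn.Module):
        """Shared encoder with one decoder branch per task."""
        def __init__(self, encoder, feat_ch, num_classes):
            super().__init__()
            self.encoder = encoder                             # e.g., a DeepLabv3+ backbone
            self.se = SEBlock(feat_ch)
            self.seg_head = nn.Conv2d(feat_ch, num_classes, 1) # semantic logits
            self.depth_head = nn.Conv2d(feat_ch, 1, 1)         # per-pixel depth

        def forward(self, x):
            feats = self.se(self.encoder(x))
            size = x.shape[-2:]
            # bilinear upsampling back to input resolution; one of the report's
            # improvements replaces this with learned up-convolutions
            seg = nn.functional.interpolate(self.seg_head(feats), size=size,
                                            mode='bilinear', align_corners=False)
            depth = nn.functional.interpolate(self.depth_head(feats), size=size,
                                              mode='bilinear', align_corners=False)
            return seg, depth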
|
Report: [PDF]
|
Project 3: 3D Object Detection
Overview:
In this exercise, there are two sub-problems.
In the first problem, we build a two-stage 3D object detector.
More specifically, we refine the proposals provided by the first-stage
Region Proposal Network (RPN) through a detection pipeline consisting of
recall computation, ROI pooling, proposal sampling, loss computation, and
Non-Maximum Suppression (NMS); a minimal sketch of the NMS step is given
below. We then train the complete network, which serves as the baseline
for comparison in Problem 2.
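For reference, here is a minimal PyTorch sketch of greedy NMS on axis-aligned 2D boxes. The actual detector suppresses rotated 3D boxes, so this only illustrates the principle: keep the highest-scoring box, discard candidates that overlap it too much, and repeat.

    import torch

    def nms(boxes, scores, iou_thresh=0.5):
        """Greedy NMS on axis-aligned boxes given as (x1, y1, x2, y2).

        Returns the indices of the kept boxes, in descending score order.
        """
        order = scores.argsort(descending=True)
        keep = []
        while order.numel() > 0:
            i = order[0].item()
            keep.append(i)
            if order.numel() == 1:
                break
            rest = order[1:]
            # IoU of the top box with the remaining candidates
            lt = torch.max(boxes[i, :2], boxes[rest, :2])
            rb = torch.min(boxes[i, 2:], boxes[rest, 2:])
            wh = (rb - lt).clamp(min=0)
            inter = wh[:, 0] * wh[:, 1]
            area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
            area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
            iou = inter / (area_i + area_r - inter).clamp(min=1e-7)
            order = rest[iou <= iou_thresh]                   # drop overlapping candidates
        return keep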
In the second problem, we focus on improving the refinement network given
the first-stage RPN output. Our proposed refinement network improves
performance in three respects: 1. model structure: learning global and
local spatial features by introducing an MLP and a canonical transformation;
2. loss function: adding a GIoU loss for the regression task (sketched
below); 3. training scheme: changing the optimizer from SGD to Adam.
As the evaluation metric, we use mean average precision (mAP) at three
difficulty levels: easy, moderate, and hard. An ablation study confirms
the effectiveness of all three techniques, and their combination performs
best across all difficulty levels.
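For two boxes A and B with smallest enclosing box C, GIoU = IoU(A, B) - |C \ (A ∪ B)| / |C|, so the corresponding loss 1 - GIoU still penalizes non-overlapping boxes. Below is a simplified PyTorch sketch on axis-aligned 2D boxes; the project applies the loss to 3D proposals, so treat this as an illustration of the formulation rather than the exact implementation.

    import torch

    def giou_loss(boxes_a, boxes_b):
        """Mean GIoU loss for axis-aligned boxes given as (x1, y1, x2, y2)."""
        # intersection
        lt = torch.max(boxes_a[:, :2], boxes_b[:, :2])
        rb = torch.min(boxes_a[:, 2:], boxes_b[:, 2:])
        wh = (rb - lt).clamp(min=0)
        inter = wh[:, 0] * wh[:, 1]

        area_a = (boxes_a[:, 2] - boxes_a[:, 0]) * (boxes_a[:, 3] - boxes_a[:, 1])
        area_b = (boxes_b[:, 2] - boxes_b[:, 0]) * (boxes_b[:, 3] - boxes_b[:, 1])
        union = area_a + area_b - inter
        iou = inter / union.clamp(min=1e-7)

        # smallest box enclosing both inputs
        wh_c = (torch.max(boxes_a[:, 2:], boxes_b[:, 2:]) -
                torch.min(boxes_a[:, :2], boxes_b[:, :2])).clamp(min=0)
        area_c = (wh_c[:, 0] * wh_c[:, 1]).clamp(min=1e-7)

        giou = iou - (area_c - union) / area_c
        return (1.0 - giou).mean()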
|
Report: [PDF] [PDF]
|