Zeng, W., Luo, W., Suo, S., Sadat, A., Yang, B., Casas, S., & Urtasun, R. (2019). End-To-End Interpretable Neural Motion Planner. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA. https://doi.org/10.1109/cvpr.2019.00886
This paper implements a learning-based planning approach through trajectory sampling plus cost-map prediction.
The network takes a lidar point cloud and an HD map as input, extracts features with a CNN, and makes predictions with MLP heads. The predictions fall into two parts, which can be understood as a perception task and a planning task. The perception task covers 3D detection and future motion forecasting; the planning task predicts a dense cost volume. The input feature space also carries temporal information: perception inputs from multiple past frames are fused and concatenated into a feature stack that preserves the time dimension. For the planning task, supervision comes mainly from the ground-truth (human-driven) trajectory: the region around the GT trajectory should have low cost. But this signal is very sparse, so learning purely from GT trajectories is hard; the two perception tasks are therefore added as auxiliary supervision to shape the shared backbone features, which also improves the planning result. In the paper's words: "we introduce an another perception loss that encourages the intermediate representations to produce accurate 3D detections and motion forecasting. This ensures the interpretability of the intermediate representations and enables much faster learning."
In addition, the HD map stores semantic information about the road environment: "we exploit HD maps that contain information about the semantics of the scene such as the location of lanes, their boundary type (e.g., solid, dashed) and the location of stop signs." Static road elements such as roads, intersections, lane markings and traffic lights are extracted as the static part of the cost map: each element type is rasterized into its own layer, yielding M channels, which are then combined with the T temporal slices extracted from the lidar point cloud and handed to the downstream planning.
A trajectory is costed by indexing voxel-wise costs from the cost map. The perception input is the lidar point cloud voxelized into an H, W, Z grid; to capture the dynamics of other agents over time, T past sweeps are fused by stacking along the Z axis, giving H, W, ZT. On top of that, to account for road-environment elements, each map element gets its own channel, covering road, intersections, lanes, lane boundaries, and traffic lights: "Similar to [5], we rasterize the map to form an M channels tensor, where each channel represents a different map element, including road, intersections, lanes, lane boundaries, traffic lights, etc." The input therefore ends up with shape H, W, (ZT+M).
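To make the tensor shapes concrete, here is a minimal NumPy sketch of the input stacking described above; all sizes (H, W, Z, T, M) are made-up examples, not the paper's actual resolutions.

```python
import numpy as np

# Hypothetical bird's-eye-view grid sizes (illustrative, not from the paper).
H, W, Z, T, M = 100, 100, 8, 5, 6

# T past lidar sweeps, each voxelized into an H x W x Z occupancy grid.
lidar_sweeps = [(np.random.rand(H, W, Z) > 0.95).astype(np.float32)
                for _ in range(T)]

# Stack the T sweeps along the height axis: H x W x (Z*T).
lidar_feat = np.concatenate(lidar_sweeps, axis=2)

# M rasterized HD-map layers (road, intersections, lanes, boundaries, lights...).
map_layers = np.zeros((H, W, M), dtype=np.float32)

# Final backbone input: H x W x (Z*T + M).
net_input = np.concatenate([lidar_feat, map_layers], axis=2)
print(net_input.shape)  # (100, 100, 46)
```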
The perception backbone is a CNN feeding two heads: a perception head that predicts bounding boxes and motion forecasting, and a cost-volume head that predicts the cost volume; the latter is the focus here. It is trained with a max-margin loss, with the human-driven trajectory as ground truth. The loss separates the region covered by the human trajectory from everything else: where the human drove, the cost should be low. "The intuition behind is to encourage the ground-truth trajectory to have the minimal cost, and others to have higher costs."
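A rough NumPy sketch of the costing and the max-margin idea: a trajectory's cost is the sum of the cost-volume entries it indexes, and the ground truth must beat each negative by a margin. The grid sizes, the random trajectories, and the fixed margin of 1.0 (standing in for the d and γ terms discussed below) are all illustrative assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
H, W, T = 100, 100, 5                       # hypothetical grid size and horizon
cost_volume = rng.random((T, H, W)).astype(np.float32)

def traj_cost(cost_volume, traj_xy):
    """Cost of a trajectory = sum of cost-volume cells it passes through,
    one (x, y) cell index per future timestep."""
    t = np.arange(len(traj_xy))
    return cost_volume[t, traj_xy[:, 1], traj_xy[:, 0]].sum()

# Hypothetical ground-truth and negative (sampled) trajectories as cell indices.
gt = np.stack([np.arange(T) + 50, np.full(T, 50)], axis=1)
negatives = [np.clip(gt + rng.integers(-5, 6, size=gt.shape), 0, H - 1)
             for _ in range(10)]

# Max-margin: the GT trajectory should be cheaper than every negative by a
# margin (here a constant 1.0 for simplicity).
c_gt = traj_cost(cost_volume, gt)
margins = [max(0.0, c_gt - traj_cost(cost_volume, n) + 1.0) for n in negatives]
loss = max(margins)   # the hardest (worst-case) negative drives the loss
print(loss)
```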
In the loss, c denotes the cost read from the cost volume, d the distance between a sampled trajectory and the ground-truth trajectory, and γ the traffic-rule violation penalty.
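The note does not reproduce the formula itself; a hedged reconstruction of a max-margin planning loss consistent with these symbols (a sketch, not a verbatim copy of the paper's equation) would be:

$$
\mathcal{L} \;=\; \sum_{t} \max_{\tau \in \mathcal{N}} \Big[\, c_t(\tau^{gt}) - c_t(\tau) + d_t(\tau) + \gamma_t(\tau) \,\Big]_{+}
$$

where \(\mathcal{N}\) is the set of sampled negative trajectories and \([\cdot]_+ = \max(0, \cdot)\). The margin grows with the negative's distance d from the GT and its rule violation γ, so trajectories that are far from the human one or illegal must receive an even higher cost before the hinge term vanishes.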
For negative samples, a large number of trajectories deviating from the human-driven one must be drawn. Besides the planning-anchor sampling logic below, the initial state is also slightly perturbed: "except there is 0.8 probability that the negative sample doesn't obey SDV's initial states, e.g. we randomly sample a velocity to replace SDV's initial velocity."
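A tiny sketch of that perturbation; the 0.8 probability is from the paper, while the velocity range `v_max` and the uniform distribution are my assumptions.

```python
import random

def perturb_initial_velocity(v0, v_max=15.0, p=0.8, rng=random):
    """With probability p, ignore the SDV's true initial velocity and draw a
    random one, so most negative samples need not obey the SDV's initial
    state. v_max and the uniform range are assumed values."""
    if rng.random() < p:
        return rng.uniform(0.0, v_max)
    return v0

random.seed(0)
samples = [perturb_initial_velocity(5.0) for _ in range(1000)]
replaced = sum(1 for v in samples if v != 5.0)
print(replaced / 1000)  # roughly 0.8
```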
planning anchors
Laterally, trajectories are sampled along Clothoid (spiral) curves.
Longitudinally, constant acceleration is assumed and the acceleration is sampled directly, which is quite coarse.
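As a reminder of what these two families look like (standard textbook forms, not copied from the paper): a Clothoid's curvature varies linearly with arc length, and the constant-acceleration profile is just uniformly accelerated motion:

$$
\kappa(s) = \kappa_0 + \kappa' s, \qquad s(t) = s_0 + v_0 t + \tfrac{1}{2} a t^2
$$

Sampling \(\kappa_0, \kappa'\) gives the lateral shape, and sampling \(a\) gives the longitudinal speed profile along it.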
The paper also notes: "Note that Clothoid curves cannot handle circle and straight line trajectories well, thus we sample them separately." Since a Clothoid cannot represent straight lines or circles, straight driving and constant-curvature turns (e.g., U-turns) would be problematic, so these are sampled separately, with the mix: "The probability of using straightline, circle and Clothoid curves are 0.5, 0.25, 0.25 respectively."
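The stated mixture is easy to sketch; the probabilities are from the paper's quote, the function itself is an illustrative stand-in for the actual sampler.

```python
import random

def sample_curve_type(rng=random):
    """Pick the lateral curve family with probabilities straight 0.5,
    circle 0.25, Clothoid 0.25 (as quoted from the paper)."""
    u = rng.random()
    if u < 0.5:
        return "straight"
    elif u < 0.75:
        return "circle"
    return "clothoid"

# Empirically the mix matches 0.5 / 0.25 / 0.25.
random.seed(0)
counts = {"straight": 0, "circle": 0, "clothoid": 0}
for _ in range(10000):
    counts[sample_curve_type()] += 1
print(counts)
```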
experiments
The experiments look at L2 distance, collision rate, and lane violation rate, with several baselines for comparison:
Imitation Learning (IL): pure imitation learning ("imitation is all you need").
Adaptive Cruise Control (ACC): not described in detail, but judging from the later experimental analysis it presumably adds a lane-violation loss.
Plan w/ Manual Cost (Manual): hand-designed cost.
The comparison results are as follows:
The takeaway: "Egomotion and IL baselines give lower L2 numbers as they optimize directly for this metric, however they are not good from planning perspective as they have difficulty reasoning about other actors and collide frequently with them."