[Paopao One Minute] Robust Object Detection in Adverse Conditions by Fusing Event-based and RGB Cameras

泡泡机器人SLAM · WeChat Official Account · 2024-01-09 06:30


One minute a day to read through papers from the top robotics conferences

Title: Fusing Event-based and RGB camera for Robust Object Detection in Adverse Conditions

Authors: Abhishek Tomy, Anshul Paigwar, Khushdeep S. Mann, Alessandro Renzaglia, Christian Laugier

Source: 2022 IEEE International Conference on Robotics and Automation (ICRA)

Compiled by: pandaman

Reviewed by: Zoe

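This is the 1056th article published by Paopao One Minute. Individuals are welcome to share it on WeChat Moments; other organizations or media outlets that wish to repost it should leave a message in the account backend to request authorization.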

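Summary

The ability to detect objects under image corruptions and varying weather conditions is essential for deep learning models, especially in real-world applications such as autonomous driving. Conventional RGB-based detection fails under these conditions, so it is important to design a sensor suite that provides redundancy when the primary frame-based detection fails. Event-based cameras can complement frame-based cameras in the low-light and high-dynamic-range scenes that an autonomous vehicle encounters while navigating. The authors therefore propose a redundant sensor fusion model that combines event-based and frame-based cameras and is robust to common image corruptions. The method takes a voxel grid representation of the events as input and uses two parallel feature extractor networks, one for frames and one for events. Compared with frame-only detection, the sensor fusion approach improves robustness to corruptions by more than 30%, and it also outperforms event-only detection. The model is trained and evaluated on the publicly released DSEC dataset.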
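To make the voxel grid input more concrete, here is a minimal NumPy sketch of one common way to build such a grid from raw events. The function name `events_to_voxel_grid`, the `(x, y, t, p)` event layout, and the linear interpolation between neighboring temporal bins are all assumptions for illustration; the post does not say which exact variant the paper uses.

```python
import numpy as np


def events_to_voxel_grid(x, y, t, p, num_bins, height, width):
    """Accumulate events into a (num_bins, height, width) voxel grid.

    x, y: integer pixel coordinates; t: timestamps; p: polarity in {-1, +1}.
    Assumed scheme: each event's polarity is split between its two nearest
    temporal bins using linear interpolation in time.
    """
    x = np.asarray(x, dtype=np.int64)
    y = np.asarray(y, dtype=np.int64)
    t = np.asarray(t, dtype=np.float64)
    p = np.asarray(p, dtype=np.float64)

    grid = np.zeros((num_bins, height, width), dtype=np.float64)
    if t.size == 0:
        return grid

    # Normalize timestamps to the range [0, num_bins - 1].
    span = t.max() - t.min()
    if span > 0:
        t_norm = (t - t.min()) / span * (num_bins - 1)
    else:
        t_norm = np.zeros_like(t)

    lower = np.floor(t_norm).astype(np.int64)
    upper = np.minimum(lower + 1, num_bins - 1)
    w_upper = t_norm - lower
    w_lower = 1.0 - w_upper

    # np.add.at accumulates correctly when several events hit the same cell.
    np.add.at(grid, (lower, y, x), p * w_lower)
    np.add.at(grid, (upper, y, x), p * w_upper)
    return grid


# Toy example with three made-up events on a 2x2 sensor.
grid = events_to_voxel_grid(
    x=[0, 1, 1], y=[0, 0, 1], t=[0.0, 0.25, 1.0], p=[1, -1, 1],
    num_bins=3, height=2, width=2,
)
```

In this toy example the second event falls halfway between the first two temporal bins, so its negative polarity is shared equally between them. The resulting tensor has one channel per temporal bin, which is what lets an image backbone consume it like a multi-channel image.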

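Figure 1: Visualization of the 15 common corruption types applied to a sample image from the DSEC dataset. The images correspond to severity level 3 of each corruption type.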

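Figure 2: Network architecture of the proposed feature pyramid sensor fusion model. Event frames and RGB images pass through backbone networks (ResNet-50) for feature extraction. Pyramid event and RGB features of the same scale are concatenated before being fed into the feature pyramid network of RetinaNet-50.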
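The concatenation step in this caption can be sketched with plain NumPy arrays standing in for backbone outputs. The helper name `fuse_pyramid_features`, the channel-first layout, and the toy channel and spatial sizes are assumptions chosen to keep the example small; they are not taken from the paper.

```python
import numpy as np


def fuse_pyramid_features(rgb_feats, event_feats):
    """Concatenate same-scale RGB and event feature maps along the channel axis.

    Each argument is a list with one array per pyramid level, shaped
    (channels, height, width). Returns a list of fused arrays.
    """
    if len(rgb_feats) != len(event_feats):
        raise ValueError("both branches must provide the same number of levels")

    fused = []
    for rgb, event in zip(rgb_feats, event_feats):
        if rgb.shape[1:] != event.shape[1:]:
            raise ValueError("spatial sizes must match at each pyramid level")
        fused.append(np.concatenate([rgb, event], axis=0))
    return fused


# Toy stand-ins for three pyramid levels from two parallel backbones.
toy_shapes = [(8, 8, 8), (16, 4, 4), (32, 2, 2)]
rgb_feats = [np.ones(shape) for shape in toy_shapes]
event_feats = [np.zeros(shape) for shape in toy_shapes]

fused = fuse_pyramid_features(rgb_feats, event_feats)
fused_shapes = [f.shape for f in fused]
```

Each fused level has twice the channel count of a single branch while the spatial size is unchanged, so whatever consumes the fused maps must be sized for the doubled channel dimension.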

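Figure 3: Model performance, mAP (%), under 15 corruption types and 5 severity levels. Severity level 0 denotes clean data. The FPN fusion model is more robust than RetinaNet-50 (RGB) and Early fusion (Event-Gray+RGB), particularly under snow, frost, fog, and contrast corruptions.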

Table 1: Model performance under different corruption types. For each corruption type, the RPC (%) is computed over all severity levels.
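The post does not define RPC. The sketch below assumes it stands for relative performance under corruption as used in robustness benchmarks for object detection, that is, the mean mAP over the corrupted severity levels divided by the mAP on clean data. The function name and the numbers are made up for illustration and are not results from the paper.

```python
def relative_performance_under_corruption(clean_map, corrupted_maps):
    """Return RPC in percent for one corruption type.

    clean_map: mAP on clean data (severity level 0).
    corrupted_maps: mAP values at each corrupted severity level.
    Assumed definition: 100 * mean(corrupted mAP) / clean mAP.
    """
    if clean_map <= 0:
        raise ValueError("clean_map must be positive")
    if not corrupted_maps:
        raise ValueError("corrupted_maps must not be empty")
    mean_corrupted = sum(corrupted_maps) / len(corrupted_maps)
    return 100.0 * mean_corrupted / clean_map


# Made-up mAP values for five severity levels of a single corruption type.
rpc = relative_performance_under_corruption(40.0, [36.0, 32.0, 28.0, 24.0, 20.0])
```

Under this definition, an RPC of 100% would mean the corruption costs the detector nothing on average, and lower values indicate a larger drop relative to clean data.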

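Table 2: Object annotations in the DSEC training dataset.

Table 3: Ablation study: mAP without the homographic transformation of RGB images to the event frame.

Figure 4: Early fusion. In this network, RGB images and event voxels are concatenated before being fed into RetinaNet.

Figure 5: Relative performance at different severity levels.

Table 4: Comparison of the proposed models on the mAP and RPC metrics.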

Abstract

The ability to detect objects under image corruptions and different weather conditions is vital for deep learning models especially when applied to real-world applications such as autonomous driving. Traditional RGB-based detection fails under these conditions and it is thus important to design a sensor suite that is redundant to failures of the primary frame-based detection. Event-based cameras can complement frame-based cameras in low-light conditions and high dynamic range scenarios that an autonomous vehicle can encounter during navigation. Accordingly, we propose a redundant sensor fusion model of event-based and frame-based cameras that is robust to common image corruptions. The method utilizes a voxel grid representation for events as input and proposes a two-parallel feature extractor network for frames and events. Our sensor fusion approach is more robust to corruptions by over 30% compared to only frame-based detections and outperforms the only event-based detection. The model is trained and evaluated on the publicly released DSEC dataset.

If you are interested in this paper, click "Read the original" to download the full text. For more articles, follow the 【泡泡机器人SLAM】 official account (paopaorobot_slam).