OKAMI: Teaching Humanoid Robots Manipulation Skills through Single Video Imitation, https://arxiv.org/abs/2410.11792
Humanoid Parkour Learning, https://arxiv.org/abs/2406.10759
Adapting Humanoid Locomotion over Challenging Terrain via Two-Phase Training, https://openreview.net/attachment?id=O0oK2bVist&name=pdf
Robot Learning and Planning
Theia: Distilling Diverse Vision Foundation Models for Robot Learning, https://arxiv.org/pdf/2407.20179
Body Transformer: Leveraging Robot Embodiment for Policy Learning, https://openreview.net/pdf?id=Oce2215aJE
Gameplay Filters: Robust Zero-Shot Safety through Adversarial Imagination, https://openreview.net/pdf?id=Ke5xrnBFAR
Learning to Walk from Three Minutes of Real-World Data with Semi-structured Dynamics Models, https://openreview.net/pdf?id=evCXwlCMIi
Towards Open-World Grasping with Large Vision-Language Models, https://openreview.net/pdf?id=QUzwHYJ9Hf
Safe Bayesian Optimization for the Control of High-Dimensional Embodied Systems, https://openreview.net/pdf?id=8PcRynpd1m
LeLaN: Learning A Language-Conditioned Navigation Policy from In-the-Wild Videos, https://openreview.net/pdf?id=zIWu9Kmlqk
Trajectory Improvement and Reward Learning from Comparative Language Feedback, https://openreview.net/pdf?id=1tCteNSbFH
Policy Adaptation via Language Optimization: Decomposing Tasks for Few-Shot Imitation, https://openreview.net/forum?id=qUSa3F79am
Learning Transparent Reward Models via Unsupervised Feature Selection, https://openreview.net/pdf?id=2sg4PY1W9d
MaIL: Improving Imitation Learning with Selective State Space Models, https://openreview.net/pdf?id=IssXUYvVTg
Bootstrapping Reinforcement Learning with Imitation for Vision-Based Agile Flight, https://openreview.net/forum?id=bt0PX0e4rE
Autonomous Improvement of Instruction Following Skills via Foundation Models, https://openreview.net/attachment?id=8Ar8b00GJC&name=pdf
Robotic Control via Embodied Chain-of-Thought Reasoning, https://openreview.net/pdf?id=S70MgnIA0v
Scaling Cross-Embodied Learning: One Policy for Manipulation, Navigation, Locomotion and Aviation, https://openreview.net/attachment?id=AuJnXGq3AL&name=pdf
Robotic Arms
DexCatch: Learning to Catch Arbitrary Objects with Dexterous Hands, https://arxiv.org/abs/2310.08809
General Flow as Foundation Affordance for Scalable Robot Learning, Chengbo Yuan, Chuan Wen, Tong Zhang, Yang Gao†, https://general-flow.github.io/, CoRL 2024.
Leveraging Locality to Boost Sample Efficiency in Robotic Manipulation, Tong Zhang, Yingdong Hu, Jiacheng You, Yang Gao†, https://sgrv2-robot.github.io/, CoRL 2024.
HiRT: Enhancing Robotic Control with Hierarchical Robot Transformers, Jianke Zhang∗, Yanjiang Guo∗, Xiaoyu Chen,Yen-Jen Wang, Yucheng Hu, Chengming Shi, Jianyu Chen†, https://arxiv.org/abs/2410.05273, CoRL 2024.
Learning to Manipulate Anywhere: A Visual Generalizable Framework For Reinforcement Learning, Zhecheng Yuan*, Tianming Wei*, Shuiqi Cheng, Gu Zhang, Yuanpei Chen, Huazhe Xu†, https://gemcollector.github.io/maniwhere/, CoRL 2024.
RiEMann: Near Real-Time SE(3)-Equivariant Robot Manipulation without Point Cloud Segmentation, Chongkai Gao, Zhengrong Xue, Shuying Deng, Tianhai Liang, Siqi Yang, Lin Shao, Huazhe Xu†, https://riemann-web.github.io/, CoRL 2024.
ThinkGrasp: A Vision-Language System for Strategic Part Grasping in Clutter, https://arxiv.org/abs/2407.11298
ALOHA Unleashed: A Simple Recipe for Robot Dexterity, https://aloha-unleashed.github.io/assets/aloha_unleashed.pdf. Bimanual manipulation.
Mobile ALOHA: Learning Bimanual Mobile Manipulation using Low-Cost Whole-Body Teleoperation, https://openreview.net/forum?id=FO6tePGRZj. Bimanual manipulation.
RP1M: A Large-Scale Motion Dataset for Piano Playing with Bi-Manual Dexterous Robot Hands, https://openreview.net/attachment?id=4Of4UWyBXE&name=pdf. Bimanual manipulation.
DexGraspNet 2.0: Learning Generative Dexterous Grasping in Large-scale Synthetic Cluttered Scenes, https://openreview.net/attachment?id=5W0iZR9J7h&name=pdf
Navigation
Uncertainty-Aware Decision Transformer for Stochastic Driving Environments, https://arxiv.org/abs/2309.16397
InstructNav: Zero-shot System for Generic Instruction Navigation in Unexplored Environment, https://arxiv.org/pdf/2406.04882
Context-Aware Replanning with Pre-explored Semantic Map for Object Navigation, https://openreview.net/attachment?id=Dftu4r5jHe&name=pdf
Lifelong Autonomous Fine-Tuning of Navigation Foundation Models in the Wild, https://openreview.net/attachment?id=vBj5oC60Lk&name=pdf
Embodied Perception
VLM-Grounder: A VLM Agent for Zero-Shot 3D Visual Grounding, https://arxiv.org/pdf/2410.13860. 3D scene understanding and 3D visual grounding.
GraspSplats: Efficient Manipulation with 3D Feature Splatting, https://arxiv.org/html/2409.02084
Transferable Tactile Transformers for Representation Learning Across Diverse Sensors and Tasks, https://arxiv.org/abs/2406.13640
D3RoMa: Disparity Diffusion-based Depth Sensing for Material-Agnostic Robotic Manipulation, https://openreview.net/attachment?id=7E3JAys1xO&name=pdf
LiDARGrid: Self-supervised 3D Opacity Grid from LiDAR for Scene Forecasting, https://openreview.net/attachment?id=MfuzopqVOX&name=pdf
Autonomous Driving Motion Planning
DriveVLM: The Convergence of Autonomous Driving and Large Vision-Language Models, https://arxiv.org/abs/2402.12289. Used for scene description, scene analysis, and hierarchical planning.
Uncertainty-Aware Decision Transformer for Stochastic Driving Environments, https://arxiv.org/abs/2309.16397. Proposes UNREST, a planning method for stochastic driving environments.
Hint-AD: Holistically Aligned Interpretability in End-to-End Autonomous Driving , https://arxiv.org/pdf/2409.06702
Robot Manipulation
OKAMI: Teaching Humanoid Robots Manipulation Skills through Single Video Imitation, https://arxiv.org/abs/2410.11792
Reinforcement Learning with Foundation Priors: Let the Embodied Agent Efficiently Learn on Its Own, https://arxiv.org/abs/2310.02635
General Flow as Foundation Affordance for Scalable Robot Learning, https://arxiv.org/abs/2401.11439
A Universal Semantic-Geometric Representation for Robotic Manipulation, https://arxiv.org/abs/2306.10474
Learning to Manipulate Anywhere: A Visual Generalizable Framework For Reinforcement Learning, https://arxiv.org/abs/2407.15815
GenSim2: Scaling Robot Data Generation with Multi-modal and Reasoning LLMs, https://arxiv.org/abs/2410.03645
RiEMann: Near Real-Time SE(3)-Equivariant Robot Manipulation without Point Cloud Segmentation, https://arxiv.org/abs/2403.19460
RoboGolf: Mastering Real-World Minigolf with a Reflective Multi-Modality Vision-Language Model, https://arxiv.org/abs/2406.10157