Figure: Experimental results from the paper "Observation-based Optimal Control Law Learning with LQR Reconstruction"
Figure: From the paper "Unidentifiability of System Dynamics: Conditions and Controller Design"
Representative papers:
Unidentifiability of System Dynamics: Conditions and Controller Design https://arxiv.org/abs/2308.15493
Observation-based Optimal Control Law Learning with LQR Reconstruction https://arxiv.org/abs/2312.16572
Multi-Robot Stochastic Patrolling via Graph Partitioning, Weizhen Wang https://ieeexplore.ieee.org/document/10683971
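Institute for AI Industry Research (AIR), Tsinghua University
Homepage: https://air.tsinghua.edu.cn/
Advisors: Ya-Qin Zhang (张亚勤), Wei-Ying Ma (马维英), Feng Zhao (赵峰), and others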
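Overview: The Institute for AI Industry Research, Tsinghua University (AIR, THU) is an international, intelligence-focused, industry-oriented research institute geared toward the Fourth Industrial Revolution. It has established five research teams, covering smart transportation, smart IoT, smart healthcare, big-data intelligence, and intelligent robotics. These teams conduct frontier research aimed at the global frontiers of science and technology, the main economic battlefield, major national needs, and people's life and health, and work to bring the resulting technology into deployment.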
Research areas: smart IoT, smart transportation, smart healthcare, big-data intelligence, intelligent robotics
Research highlights:
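Figure: The research group of Professor Yang Liu (刘洋), holder of the GDS (万国数据) professorship at Tsinghua University and Executive Dean of the Institute for AI Industry Research (AIR), has made new progress in incremental learning based on knowledge transfer. The resulting paper, "Knowledge Transfer in Incremental Learning for Multilingual Neural Machine Translation," received an Outstanding Paper Award at ACL 2023, a major international conference in artificial intelligence, on July 11, 2023 (Beijing time).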
Representative papers:
DecisionNCE: Embodied Multimodal Representations via Implicit Preference https://arxiv.org/abs/2402.18137
Evolution of Future Medical AI Models — From Task-Specific, Disease-Centric to Universal Health https://ai.nejm.org/doi/full/10.1056/AIp2400289
ESM All-Atom: Multi-Scale Protein Language Model for Unified Molecular Modeling https://arxiv.org/abs/2403.12995
Intelligent Systems and Robotics Lab (ISR Lab), Tsinghua University
Homepage: https://group.iiis.tsinghua.edu.cn/~isrlab/
Advisor: Jianyu Chen (陈建宇)
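Overview: The Intelligent Systems and Robotics Lab (ISR Lab) is a research group founded by Professor Jianyu Chen. It is affiliated with the Institute for Interdisciplinary Information Sciences (IIIS) at Tsinghua University and with the Shanghai Qi Zhi Institute. Its core goal is to develop high-performance, highly intelligent advanced robotic systems. The lab conducts in-depth research on robot hardware design, motion control, perception and recognition, and human-robot interaction, aiming to improve robots' adaptability to their environment, task efficiency, and level of intelligence. It also works on reinforcement learning algorithms, policy optimization, and simulation environments, so that robots can improve their behavior through trial and error and carry out more complex tasks. More recently, the lab has begun to study large language models in robotics: with stronger language understanding, robots can better follow human instructions, hold conversations, and, to some extent, perform semantic reasoning and decision making.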
Research areas: robotics, reinforcement learning, large language models
Research highlights:
Figure: Results from the paper "DoReMi: Grounding Language Model by Detecting and Recovering from Plan-Execution Misalignment"
Figure: Results from the paper "Decentralized Motor Skill Learning for Complex Robotic Systems"
Figure: Results from the paper "Asking Before Acting: Gather Information in Embodied Decision Making with Language Models"
Representative papers:
DoReMi: Grounding Language Model by Detecting and Recovering from Plan-Execution Misalignment https://arxiv.org/abs/2307.00329
Decentralized Motor Skill Learning for Complex Robotic Systems https://arxiv.org/abs/2306.17411
Asking Before Acting: Gather Information in Embodied Decision Making with Language Models https://arxiv.org/abs/2305.15695
Learning Robust, Agile, Natural Legged Locomotion Skills in the Wild https://arxiv.org/abs/2304.10888
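Embodied Perception and InteraCtion (EPIC) Lab
Figure: Planning-based visual servoing. The work designs a local planner (controller) based on quadratic programming that handles field-of-view limits and joint limits (position, velocity/torque) during visual servoing and mitigates singularities. Combined with a sampling-based global planning framework, it handles most constrained visual servoing problems efficiently (T-MECH 2024, under review).
Figure: Unmanned delivery robot. In cooperation with Vipshop (唯品会), the work proposes an unmanned system based on multi-sensor fusion, with autonomous navigation, localization, planning, and control algorithms, to address last-mile delivery in logistics parks.
Representative papers: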
Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications https://ieeexplore.ieee.org/document/10657322
DifFlow3D: Toward Robust Uncertainty-Aware Scene Flow Estimation with Iterative Diffusion-Based Refinement https://ieeexplore.ieee.org/document/10655787
3DSFLabelling: Boosting 3D Scene Flow Estimation by Pseudo Auto-Labelling https://ieeexplore.ieee.org/document/10656074
Cognitive Navigation for Intelligent Mobile Robots: A Learning-Based Approach with Topological Memory Configuration https://ieeexplore.ieee.org/document/10551318
Figure: Results from the paper "TransVOD: End-to-End Video Object Detection with Spatial-Temporal Transformers"
Figure: Results from the paper "Understanding Pixel-Level 2D Image Semantics With 3D Keypoint Knowledge Engine"
Representative papers:
Qianyu Zhou, Xiangtai Li, Lu He, Yibo Yang, Guangliang Cheng, Yunhai Tong, Lizhuang Ma, Dacheng Tao. TransVOD: End-to-End Video Object Detection with Spatial-Temporal Transformers[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence (IEEE TPAMI), 2022. https://arxiv.org/abs/2201.05047
Xin Tan, Jiaying Lin, Ke Xu, Pan Chen, Lizhuang Ma, and Rynson Lau. Mirror Detection with the Visual Chirality Cue[J]. IEEE Trans. on Pattern Analysis and Machine Intelligence (IEEE TPAMI), 2022. https://ieeexplore.ieee.org/document/9793716
Yang You, Chengkun Li, Yujing Lou, Zhoujun Cheng, Liangwei Li, Lizhuang Ma, Weiming Wang, Cewu Lu. Understanding Pixel-Level 2D Image Semantics With 3D Keypoint Knowledge Engine[J]. IEEE Trans. on Pattern Analysis and Machine Intelligence (IEEE TPAMI), 2022. https://arxiv.org/abs/2111.10817
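Figure: Bunny-3B/4B/8B, a family of lightweight multimodal large language models developed by Bo Zhao (赵波) and his team. Source: https://github.com/BAAI-DCAI/Bunny
Figure: Results from the paper "Omni6DPose: A Benchmark and Model for Universal 6D Object Pose Estimation and Tracking". Source: https://jiyao06.github.io/Omni6DPose/
Representative papers: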
Efficient Multimodal Learning from Data-centric Perspective https://arxiv.org/abs/2402.11530
Omni6DPose: A Benchmark and Model for Universal 6D Object Pose Estimation and Tracking. https://arxiv.org/pdf/2406.04316
VISTA: Visualized Text Embedding For Universal Multi-Modal Retrieval. https://arxiv.org/pdf/2406.04292
Zhenyu Xu, Hailin Xu, Zhouyang Lu, Yingying Zhao, Rui Zhu, Yujiang Wang, Mingzhi Dong, Yuhu Chang, Qin Lv, Robert P. Dick, Fan Yang, Tun Lu, Ning Gu, and Li Shang, “Can Large Language Models Be Good Companions? An LLM-Based Eyewear System with Conversational Common Ground,” ACM Interactive Mobile Wearable & Ubiquitous Technologies (IMWUT), Vol. 8, No. 2, pp. 1-41, June 2024. (Accepted)
Yubin Shi, Yixuan Chen, Mingzhi Dong, Xiaochen Yang, Dongsheng Li, Yujiang Wang, Robert Dick, Qin Lv, Yingying Zhao, Fan Yang, Tun Lu, Ning Gu, and Li Shang, “Train Faster, Perform Better: Modular Adaptive Training in Over-Parameterized Models,” in Proceedings Conference on Neural Information Processing Systems (NeurIPS), December 2023.
Maya Okawa, Ekdeep S. Lubana, Robert P. Dick, and Hidenori Tanaka, “Compositional abilities emerge multiplicatively: Exploring diffusion models on a synthetic task,” in Proceedings Conference on Neural Information Processing Systems (NeurIPS), December 2023.
Figure: Results from the paper "Affordances-Oriented Planning using Foundation Models for Continuous Vision-Language Navigation"
Figure: Results from the paper "NavCoT: Boosting LLM-Based Vision-and-Language Navigation via Learning Disentangled Reasoning"
Representative papers:
Affordances-Oriented Planning using Foundation Models for Continuous Vision-Language Navigation https://arxiv.org/abs/2407.05890
NavCoT: Boosting LLM-Based Vision-and-Language Navigation via Learning Disentangled Reasoning https://arxiv.org/abs/2403.07376
MapGPT: Map-Guided Prompting with Adaptive Path Planning for Vision-and-Language Navigation https://arxiv.org/abs/2401.07314
Surfer: Progressive Reasoning with World Models for Robotic Manipulation https://arxiv.org/abs/2306.11335
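Figure: From the paper "CorNav: Autonomous Agent with Self-Corrected Planning for Zero-Shot Vision-and-Language Navigation". For vision-and-language navigation, the paper proposes the CorNav framework, which uses a large language model for decision making and has two key components: first, it incorporates environmental feedback to refine future plans and adjust actions; second, it uses multiple domain experts to parse instructions, understand the scene, and refine predicted actions.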
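Figure: Results from the paper "TIP-Editor: An Accurate 3D Editor Following Both Text-Prompts And Image-Prompts". The TIP-Editor framework accepts text and image prompts together with a 3D bounding box that specifies the editing region. The image prompt lets users conveniently specify the detailed appearance and style of the target content, complementing the text description and enabling accurate control over appearance.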
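Figure: The book 《多模态大模型:新一代人工智能技术范式》 (Multimodal Large Models: A New-Generation Artificial Intelligence Technology Paradigm). The book gives an accessible yet comprehensive introduction to the core technologies and typical applications of multimodal large models and, framed around the new-generation AI technology paradigm, discusses in detail frontier topics such as causal reasoning, world models, super agents, and embodied intelligence. It aims to offer academia and industry a clear perspective that helps AI researchers understand multimodal large models and the direction of next-generation AI more fully.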