Putting large models into production vehicles has undoubtedly been one of the main themes of this year's autonomous driving mass-production and pre-research work, and in December 2024 Xiaomi also announced that its large model had been rolled out via OTA. So where do multimodal large models for autonomous driving stand today, and which work is worth learning from? Today, 自动驾驶之心 (Autonomous Driving Heart) takes stock of the surveys and related work. All materials have been compiled in the '自动驾驶之心知识星球' (Autonomous Driving Heart Knowledge Planet); you are welcome to join to get them.
I. Awesome Lists and Surveys
1. LLMs in intelligent transportation systems and autonomous driving
https://github.com/ge25nab/Awesome-VLM-AD-ITS
2. AIGC and LLMs
https://github.com/coderonion/awesome-llm-and-aigc
3. A survey of vision-language models
https://github.com/jingyi0000/VLM_survey
4. Prompt/adapter learning methods for vision-language models such as CLIP (a zero-shot CLIP baseline sketch follows this list)
https://github.com/zhengli97/Awesome-Prompt-Adapter-Learning-for-VLMs
5. A list of LLM/VLM inference papers, with accompanying code
https://github.com/DefTruth/Awesome-LLM-Inference
6. A reading list on large-model safety, security, and privacy (including awesome LLM security, safety, etc.)
https://github.com/ThuCCSLab/Awesome-LM-SSP
7. A knowledge base covering single/multi-agent systems, robotics, LLM/VLM/VLA, scientific discovery, and more
https://github.com/weleen/awesome-agent
8. A curated paper list on Embodied AI and related research/industry-driven resources
https://github.com/haoranD/Awesome-Embodied-AI
9. A curated list of inference strategies and algorithms for improving vision-language model (VLM) performance
https://github.com/Patchwork53/awesome-vlm-inference-strategies
10. Notable vision-language models and their architectures
https://github.com/gokayfem/awesome-vlm-architectures
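For readers new to item 4 above: the prompt/adapter learning methods collected there build on CLIP's zero-shot classification interface, where class names are wrapped in hand-written text prompts and scored against the image. The sketch below shows that baseline using Hugging Face transformers; the checkpoint name, image file, and prompt strings are illustrative assumptions, not taken from any of the repositories listed above.

```python
# Minimal zero-shot CLIP classification sketch (assumes `torch`, `transformers`,
# and `Pillow` are installed; checkpoint, image file, and prompts are illustrative).
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("street_scene.jpg")  # hypothetical driving-scene image
# Hand-crafted prompts; prompt/adapter learning methods replace these fixed
# templates with learnable context tokens or lightweight adapter layers.
labels = ["a photo of a pedestrian", "a photo of a traffic light", "a photo of a truck"]

inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)
probs = outputs.logits_per_image.softmax(dim=-1)  # image-text similarity -> class probabilities
print({label: float(p) for label, p in zip(labels, probs[0])})
```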
II. Vision-Language Model (VLM) Fundamentals
1. Pretraining (a CLIP-style contrastive-loss sketch follows the paper list)
[arXiv 2024] RLAIF-V: Aligning MLLMs through Open-Source AI Feedback for Super GPT-4V Trustworthiness Paper(https://github.com/RLHF-V/RLAIF-V)
[CVPR 2024] RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback Paper(https://github.com/RLHF-V/RLHF-V)
[CVPR 2024] Do Vision and Language Encoders Represent the World Similarly? Paper(https://github.com/mayug/0-shot-llm-vision)
[CVPR 2024] Efficient Vision-Language Pre-training by Cluster Masking Paper(https://github.com/Zi-hao-Wei/Efficient-Vision-Language-Pre-training-by-Cluster-Masking)
[CVPR 2024] Non-autoregressive Sequence-to-Sequence Vision-Language Models [Paper]
[CVPR 2024] ViTamin: Designing Scalable Vision Models in the Vision-Language Era Paper(https://github.com/Beckschen/ViTamin)
[CVPR 2024] Iterated Learning Improves Compositionality in Large Vision-Language Models [Paper]
[CVPR 2024] FairCLIP: Harnessing Fairness in Vision-Language Learning Paper(https://ophai.hms.harvard.edu/datasets/fairvlmed10k)
[CVPR 2024] InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks Paper(https://github.com/OpenGVLab/InternVL)
[CVPR 2024] VILA: On Pre-training for Visual Language Models [Paper]
[CVPR 2024] Generative Region-Language Pretraining for Open-Ended Object Detection Paper(https://github.com/FoundationVision/GenerateU)
[CVPR 2024] Enhancing Vision-Language Pre-training with Rich Supervisions [Paper]
[ICLR 2024] Unified Language-Vision Pretraining in LLM with Dynamic Discrete Visual Tokenization Paper(https://github.com/jy0205/LaVIT)
[ICLR 2024] MMICL: Empowering Vision-language Model with Multi-Modal In-Context Learning Paper(https://github.com/PKUnlp-icler/MIC)
[ICLR 2024] Retrieval-Enhanced Contrastive Vision-Text Models [Paper]
[arXiv 2024] CLIPS: An Enhanced CLIP Framework for Learning with Synthetic Captions Paper(https://github.com/UCSC-VLAA/CLIPS)
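Many of the pretraining papers above start from the CLIP-style symmetric contrastive objective: paired image and text embeddings are pulled together while mismatched pairs in the batch are pushed apart. Below is a minimal sketch of that loss; the tensor shapes, temperature value, and toy random embeddings are assumptions for illustration, not code from any paper listed above.

```python
# Minimal sketch of the CLIP-style symmetric contrastive (InfoNCE) loss.
# Assumptions: `img_emb` and `txt_emb` are paired [batch, dim] embeddings from
# an image encoder and a text encoder; the temperature value is illustrative.
import torch
import torch.nn.functional as F

def clip_contrastive_loss(img_emb: torch.Tensor, txt_emb: torch.Tensor,
                          temperature: float = 0.07) -> torch.Tensor:
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / temperature        # [batch, batch] similarity matrix
    targets = torch.arange(img_emb.size(0), device=img_emb.device)
    loss_i2t = F.cross_entropy(logits, targets)         # match each image to its caption
    loss_t2i = F.cross_entropy(logits.t(), targets)     # and each caption to its image
    return (loss_i2t + loss_t2i) / 2

# Toy usage with random embeddings standing in for encoder outputs.
loss = clip_contrastive_loss(torch.randn(8, 512), torch.randn(8, 512))
print(loss.item())
```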
2. Transfer learning methods (a frozen-backbone transfer sketch follows the list)
[NeurIPS 2024] Historical Test-time Prompt Tuning for Vision Foundation Models [Paper]
[NeurIPS 2024] AWT: Transferring Vision-Language Models via Augmentation, Weighting, and Transportation Paper(https://github.com/MCG-NJU/AWT)
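The transfer methods above adapt a pretrained VLM to downstream tasks without retraining the backbone. A common point of reference is a frozen CLIP image encoder with a small trainable head; the sketch below shows that baseline. The checkpoint name, number of classes, and the random toy batch are assumptions for illustration, not details from the papers above.

```python
# Minimal transfer-learning sketch: frozen CLIP image encoder + trainable linear head.
# Assumptions: `torch` and `transformers` are installed; the checkpoint name,
# class count, and the random toy batch below are illustrative only.
import torch
import torch.nn as nn
from transformers import CLIPVisionModelWithProjection

backbone = CLIPVisionModelWithProjection.from_pretrained("openai/clip-vit-base-patch32")
backbone.requires_grad_(False)  # keep the pretrained VLM frozen
head = nn.Linear(backbone.config.projection_dim, 10)  # 10 downstream classes (assumed)
optimizer = torch.optim.AdamW(head.parameters(), lr=1e-3)

pixel_values = torch.randn(4, 3, 224, 224)   # stand-in for a preprocessed image batch
labels = torch.randint(0, 10, (4,))          # stand-in downstream labels

with torch.no_grad():
    feats = backbone(pixel_values=pixel_values).image_embeds  # [batch, projection_dim]
logits = head(feats)
loss = nn.functional.cross_entropy(logits, labels)
loss.backward()
optimizer.step()
print(loss.item())
```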