Code implementations for several papers:
"Long-Form Speech Generation with Spoken Language Models" (2024) GitHub: github.com/google-deepmind/librispeech-long
"DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effective for LMMs" (2024) GitHub: github.com/MengLcool/SliMM
"DrivingWorld: Constructing World Model for Autonomous Driving via Video GPT" (2024) GitHub: github.com/YvanYin/DrivingWorld [fig1]
"WALL-E: World Alignment by Rule Learning Improves World Model-based LLM Agents" (2024) GitHub: github.com/elated-sawyer/WALL-E [fig2]
"VLABench: A Large-Scale Benchmark for Language-Conditioned Robotics Manipulation with Long-Horizon Reasoning Tasks" (2024) GitHub: github.com/OpenMOSS/VLABench
"MINIMA: Modality Invariant Image Matching" (2024) GitHub: github.com/LSXI7/MINIMA [fig3]
"DroneSplat: 3D Gaussian Splatting for Robust 3D Reconstruction from In-the-Wild Drone Imagery" (2024) GitHub: github.com/DroneSplat/anonymous_code
"DriveMM: All-in-One Large Multimodal Model for Autonomous Driving" (2024) GitHub: github.com/zhijian11/DriveMM [fig4]
"Dense-Face: Personalized Face Generation Model via Dense Annotation Prediction" (2024) GitHub: github.com/CHELSEA234/Dense-Face
"Omni-MATH: A Universal Olympiad Level Mathematic Benchmark For Large Language Models" (2024) GitHub: github.com/KbsdJames/omni-math-rule
"GraphAgent: Agentic Graph Language Assistant" (2024) GitHub: github.com/HKUDS/GraphAgent
"Sound bubbles on hearables" (2024) GitHub: github.com/chentuochao/Sound_Bubble
"ICAL: Continual Learning of Multimodal Agents by Transforming Trajectories into Actionable Insights" (2024) GitHub: github.com/Gabesarch/ICAL