Implementation code for several papers:
《HandDiff: 3D Hand Pose Estimation with Diffusion on Image-Point Cloud》(CVPR 2024) GitHub: github.com/cwc1260/HandDiff
《Aria-UI: Visual Grounding for GUI Instructions》(2024) GitHub: github.com/AriaUI/Aria-UI
《SPaR: Self-Play with Tree-Search Refinement to Improve Instruction-Following in Large Language Models》(2024) GitHub: github.com/thu-coai/SPaR
《StereoCrafter: Diffusion-based Generation of Long and High-fidelity Stereoscopic 3D from Monocular Videos》(2024) GitHub: github.com/TencentARC/StereoCrafter
《YuLan-Mini: An Open Data-efficient Language Model》(2024) GitHub: github.com/RUC-GSAI/YuLan-Mini
《ViCaS: A Dataset for Combining Holistic and Pixel-level Video Understanding using Captions with Grounded Segmentation》(2024) GitHub: github.com/Ali2500/ViCaS