Code implementations for several papers: #AI创造营#
《The GAN is dead; long live the GAN! A Modern Baseline GAN》(NeurIPS 2024) GitHub: github.com/brownvc/R3GAN [fig2]
《ImOV3D: Learning Open-Vocabulary Point Clouds 3D Object Detection from Only 2D Images》(NeurIPS 2024) GitHub: github.com/yangtiming/ImOV3D [fig6]
《Keypoint Promptable Re-Identification》(ECCV 2024) GitHub: github.com/VlSomers/keypoint_promptable_reidentification [fig9]
《Byte Latent Transformer: Patches Scale Better Than Tokens》(2024) GitHub: github.com/facebookresearch/blt [fig1]
《Doe-1: Closed-Loop Autonomous Driving with Large World Model》(2024) GitHub: github.com/wzzheng/Doe [fig3]
《HumanSplat: Generalizable Single-Image Human Gaussian Splatting with Structure Priors》(2024) GitHub: github.com/humansplat/humansplat [fig4]
《PixIT: Joint Training of Speaker Diarization and Speech Separation from Real-world Multi-speaker Recordings》(2024) GitHub: github.com/joonaskalda/PixIT [fig5]
《LAION-SG: An Enhanced Large-Scale Dataset for Training Complex Image-Text Models with Structural Annotations》(2024) GitHub: github.com/mengcye/LAION-SG
《EasyRef: Omni-Generalized Group Image Reference for Diffusion Models via Multimodal LLM》(2024) GitHub: github.com/TempleX98/EasyRef
《Zero-Shot Whole-Body Humanoid Control via Behavioral Foundation Models》(2024) GitHub: github.com/facebookresearch/metamotivo
《Large Concept Models: Language Modeling in a Sentence Representation Space》(2024) GitHub: github.com/facebookresearch/large_concept_model
《Doppelgangers++: Improved Visual Disambiguation with Geometric 3D Features》(2024) GitHub: github.com/doppelgangers25/doppelgangers-plusplus
《StyleStudio: Text-Driven Style Transfer with Selective Control of Style Elements》(2024) GitHub: github.com/Westlake-AGI-Lab/StyleStudio [fig7]
《Treat Visual Tokens as Text? But Your MLLM Only Needs Fewer Efforts to See》(2024) GitHub: github.com/ZhangAIPI/YOPO_MLLM_Pruning [fig8]
《HunyuanVideo: A Systematic Framework For Large Video Generative Models》(2024) GitHub: github.com/kohya-ss/HunyuanVideo
《Multi-Object 3D Grounding with Dynamic Modules and Language-Informed Spatial Attention》(2024) GitHub: github.com/haomengz/D-LISA
《Real-SRGD: Enhancing Real-World Image Super-Resolution with Classifier-Free Guided Diffusion》(2024) GitHub: github.com/yahoojapan/srgd
《Improving Source Extraction with Diffusion and Consistency Models》(2024) GitHub: github.com/Russell-Izadi-Bose/DiCoSe
《PIG: Physics-Informed Gaussians as Adaptive Parametric Mesh Representations》(2024) GitHub: github.com/NamGyuKang/Physics-Informed-Gaussians