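Sora, OpenAI's first video generation model, has pushed the frontier of "generate a video from a single sentence" AI a large step forward and has sparked broad industry debate about the direction of generative AI.
OpenAI explored large-scale training of generative models on video data. Specifically, the researchers jointly trained a text-conditional diffusion model on videos and images of variable duration, resolution, and aspect ratio. The authors use a transformer architecture that operates on spacetime patches of video and image latent codes. Their largest model, Sora, can generate up to a minute of high-quality video.
OpenAI argues that the newly presented results suggest that scaling video generation models is a promising path toward building general-purpose simulators of the physical world.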
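To make the idea of "spacetime patches of latent codes" more concrete, here is a minimal sketch of how a latent video tensor could be cut into a sequence of patch tokens for a transformer. This is not OpenAI's implementation: the function name `spacetime_patches`, the `(T, H, W, C)` layout, and the patch sizes are all assumptions chosen for illustration, and Sora's actual patch sizes and latent encoder are not stated in the text above.

```python
import numpy as np

def spacetime_patches(latent, pt, ph, pw):
    """Split a latent video of shape (T, H, W, C) into flattened spacetime patches.

    Hypothetical helper: pt, ph, pw are the patch extents in time, height, and width.
    Returns an array of shape (num_patches, pt * ph * pw * C).
    """
    T, H, W, C = latent.shape
    if T % pt or H % ph or W % pw:
        raise ValueError("latent dimensions must be divisible by the patch size")
    # Expose the patch grid: (T/pt, pt, H/ph, ph, W/pw, pw, C)
    x = latent.reshape(T // pt, pt, H // ph, ph, W // pw, pw, C)
    # Move the grid axes to the front and the within-patch axes to the back
    x = x.transpose(0, 2, 4, 1, 3, 5, 6)
    # One row (token) per spacetime patch
    return x.reshape(-1, pt * ph * pw * C)

# Toy latent "video": 4 frames, 8x8 spatial grid, 2 channels (illustrative sizes)
latent = np.arange(4 * 8 * 8 * 2, dtype=np.float32).reshape(4, 8, 8, 2)
tokens = spacetime_patches(latent, pt=2, ph=4, pw=4)
print(tokens.shape)  # (8, 64)

# A single image can be treated as a video with one frame
image_tokens = spacetime_patches(np.zeros((1, 8, 8, 2), dtype=np.float32), pt=1, ph=4, pw=4)
print(image_tokens.shape)  # (4, 32)
```

Because the number of tokens simply follows from the input's duration and spatial size, the same routine handles clips and images of different shapes, which is one plausible reading of how joint training on variable-size videos and images becomes possible.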
- Unsupervised Learning of Video Representations using LSTMs (arXiv:1502.04681)
- Recurrent Environment Simulators (arXiv:1704.02254)
- arXiv:1803.10122
- Generating Videos with Scene Dynamics (arXiv:1609.02612)
- MoCoGAN: Decomposing Motion and Content for Video Generation (arXiv:1707.04993)
- Adversarial Video Generation on Complex Datasets (arXiv:1907.06571)
- Generating Long Videos of Dynamic Scenes (arXiv:2206.03429)
- VideoGPT: Video Generation using VQ-VAE and Transformers (arXiv:2104.10157)
- NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion (arXiv:2111.12417)
- Imagen Video: High Definition Video Generation with Diffusion Models (arXiv:2210.02303)
- Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models (arXiv:2304.08818)
- Photorealistic Video Generation with Diffusion Models (arXiv:2312.06662)
- Language Models are Few-Shot Learners (arXiv:2005.14165)
- An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale (arXiv:2010.11929)
- ViViT: A Video Vision Transformer (arXiv:2103.15691)
- Masked Autoencoders Are Scalable Vision Learners (arXiv:2111.06377)
- Patch n' Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution (arXiv:2307.06304)
- High-Resolution Image Synthesis with Latent Diffusion Models (arXiv:2112.10752)
- Auto-Encoding Variational Bayes (arXiv:1312.6114)
- Deep Unsupervised Learning using Nonequilibrium Thermodynamics (arXiv:1503.03585)
- Denoising Diffusion Probabilistic Models (arXiv:2006.11239)
- Improved Denoising Diffusion Probabilistic Models (arXiv:2102.09672)
- Diffusion Models Beat GANs on Image Synthesis (arXiv:2105.05233)
- Elucidating the Design Space of Diffusion-Based Generative Models (arXiv:2206.00364)
- Scalable Diffusion Models with Transformers (arXiv:2212.09748)
- Zero-Shot Text-to-Image Generation