Column: Computer Vision, Deep Learning and Autonomous Driving
A Critical Hit on DALLE-3! Stable Diffusion 3 Released with 8B Parameters!

Computer Vision, Deep Learning and Autonomous Driving · WeChat Official Account · 2024-04-27 01:28


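The biggest highlight of Stable Diffusion 3 is probably that it is built on the Diffusion Transformer. By coincidence, Sora, the text-to-video model OpenAI just released, is also built on the Diffusion Transformer. It really does look like Transformer is all you need!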

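In fact, the SDXL technical report also mentioned trying Transformers. At the time, the authors reported that Transformer-based architectures brought no immediate benefit, but they remained confident that a larger Transformer with well-tuned hyperparameters would eventually pay off. So Stable Diffusion 3's move to a Transformer was foreshadowed and is not an attempt to ride the Sora hype. The relevant passage from the SDXL report reads: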

Architecture: During the exploration stage of this work, we briefly experimented with transformer-based architectures such as UViT [16] and DiT [33], but found no immediate benefit. We remain, however, optimistic that a careful hyperparameter study will eventually enable scaling to much larger transformer-dominated architectures.

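In addition, Stable Diffusion 3 models range from 800M to 8B parameters, which nicely demonstrates how well the Transformer scales. With the largest model at 8B, does this mean that parameter counts for text-to-image models are moving up a tier from here on, raising the barrier to entry for ordinary users?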
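
To put the "higher barrier to entry" remark in numbers, here is a back-of-the-envelope sketch of the memory needed just to store the weights at the two model sizes mentioned above. The helper name `weight_memory_gib` and the bytes-per-parameter table are illustrative assumptions, not anything published by Stability AI. The estimate ignores activations, the text encoders, and the VAE, so real requirements would be higher.

```python
# Rough weight-only memory estimate for a model of a given parameter count.
# Assumption: every parameter is stored at the same precision; activations,
# text encoders, and the VAE are NOT included.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_memory_gib(num_params: int, dtype: str) -> float:
    """Return the memory needed to hold the weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3


# The two model sizes quoted in the article.
for name, n in [("800M", 800_000_000), ("8B", 8_000_000_000)]:
    row = ", ".join(
        f"{d}: {weight_memory_gib(n, d):.1f} GiB" for d in BYTES_PER_PARAM
    )
    print(f"{name:>4} -> {row}")
```

Under these assumptions, the 8B model needs roughly 15 GiB for fp16 weights alone, about ten times the 800M model, which is more than many consumer GPUs offer.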

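Judging from the examples provided, Stable Diffusion 3 is very good at rendering text, and the text prompt exerts stronger control over the result. My intuition is that the text encoder may be the T5 XXL encoder, or possibly even an LLM, and that the training-image captions were very likely improved in a way similar to DALLE-3.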

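I have to say, Stable Diffusion 3 follows text prompts very well: multiple subjects, rendered text, spatial relationships, and attributes all come out well, and it may be on par with DALLE-3 (in each pair, the first image is generated by SD3 and the second by DALLE-3 via Bing).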

Three transparent glass bottles on a wooden table. The one on the left has red liquid and the number 1. The one in the middle has blue liquid and the number 2. The one on the right has green liquid and the number 3.

Photo of a red sphere on top of a blue cube. Behind them is a green triangle, on the right is a dog, on the left is a cat

Photo of an 90's desktop computer on a work desk, on the computer screen it says "welcome". On the wall in the background we see beautiful graffiti with the text "SD3" very large on the wall.

Resting on the kitchen table is an embroidered cloth with the text 'good night' and an embroidered baby tiger. Next to the cloth there is a lit candle. The lighting is dim and dramatic.

Photo of a rectangular orange neon sign with the text "even more stable", the sign is on the wall in a metro station, subway speeding by in the background, perspective photo.
