Tackling the twin pain points of AI video generation, frame flicker and shape distortion, Lena and Miles break down the core logic of the Diffusion Transformer. By tracing how the Transformer architecture was brought into diffusion models, you'll see how AI comes to capture physical regularities, making the leap from "random gacha pulls" to precise direction.

DiT discards the traditional layer-by-layer rescaling structure entirely, treating a video as a set of information-carrying spacetime sequences; the Transformer's global receptive field gives it an overwhelming advantage when it comes to long-range consistency.

Lena: Hey, Miles, have you seen those AI-generated short videos lately? The ones with really cinematic camera work, where even the lighting and the water ripples look real. I tried running one a couple of days ago and noticed that today's models aren't just higher quality; crucially, they no longer flicker or deform the characters partway through like they used to.
Miles: Good observation. There's actually a major technical shift behind that. People used to rely on the U-Net architecture, but now models like Sora, or the latest Helios, have swapped their core for the Diffusion Transformer, or DiT for short.
Lena: Really? I've heard this architecture can even generate a video clip locally on a phone in a few seconds, no cloud server needed. Isn't the Transformer the thing behind large language models? How did it end up generating video, and doing it more stably, with a better grasp of physics, than before?
Miles: Exactly, and that's the most fascinating part. In short, it no longer treats a video as a string of independent images to patch up one by one; it treats the whole block of space and time as a single object. Today let's start from this "spacetime patch" idea and see how DiT fits a Transformer's brain into a diffusion model's body.
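Miles' "spacetime patch" idea can be sketched in a few lines. The snippet below is a minimal illustration, not any model's actual code: the function name `patchify_video` and the patch sizes (2 frames x 16 x 16 pixels) are made up for the example. It shows how a video tensor gets cut into small space-time blocks and flattened into a token sequence, which is the form a DiT-style Transformer would then attend over globally.

```python
import numpy as np

def patchify_video(video, pt=2, ph=16, pw=16):
    """Cut a video of shape (T, H, W, C) into spacetime patches.

    Each patch spans pt frames and a ph x pw pixel region, and is
    flattened into one token vector. Returns (num_tokens, token_dim).
    """
    T, H, W, C = video.shape
    assert T % pt == 0 and H % ph == 0 and W % pw == 0, "dims must divide evenly"
    # Split each axis into (num_patches, patch_size) pairs.
    x = video.reshape(T // pt, pt, H // ph, ph, W // pw, pw, C)
    # Group the patch-grid axes first, then the within-patch axes.
    x = x.transpose(0, 2, 4, 1, 3, 5, 6)
    # Flatten: one row per spacetime patch, one flat vector per patch.
    return x.reshape(-1, pt * ph * pw * C)

# Tiny example: 8 frames of 32x32 RGB video.
video = np.random.rand(8, 32, 32, 3)
tokens = patchify_video(video)
print(tokens.shape)  # (16, 1536): 4x2x2 patches, each 2*16*16*3 values long
```

Because every token carries a chunk of both space and time, self-attention over this sequence lets the model relate any moment of the clip to any other, which is the global view Miles credits for the improved long-range consistency.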