Jianghc's Blog


A streamer I follow recently posted content explaining the principles behind DDPM, so I studied it and am keeping brief notes here.

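DDPM consists of three main parts: the forward process (also called the diffusion process), the reverse process (also called the denoising process), and the sampling procedure. The forward process is fixed: it gradually adds Gaussian noise according to a preset schedule, has no learnable parameters, and is generally needed only during training. The reverse process is the generative process and is where the model's parameters are learned; at each step it determines how much noise is removed. The sampling procedure runs this learned reverse process, starting from pure noise, to generate new data.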
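To make the forward process concrete, here is a minimal NumPy sketch of the closed-form noising step $\mathbf{x}_t = \sqrt{\bar\alpha_t}\,\mathbf{x}_0 + \sqrt{1-\bar\alpha_t}\,\boldsymbol\epsilon$, where $\bar\alpha_t = \prod_{s \le t}(1-\beta_s)$. It assumes the linear variance schedule from the original DDPM paper ($T = 1000$, $\beta_t$ from $10^{-4}$ to $0.02$); the function names `linear_beta_schedule` and `q_sample` are illustrative and do not come from this post.

```python
import numpy as np

def linear_beta_schedule(T=1000, beta_start=1e-4, beta_end=0.02):
    # Linearly spaced variances beta_1..beta_T (assumed schedule, as in the DDPM paper)
    return np.linspace(beta_start, beta_end, T)

def q_sample(x0, t, alpha_bars, rng):
    # Closed-form forward process: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps
    # t is a 0-based index, so t = 0 corresponds to the first diffusion step
    eps = rng.standard_normal(x0.shape)
    a = alpha_bars[t]
    return np.sqrt(a) * x0 + np.sqrt(1.0 - a) * eps, eps

betas = linear_beta_schedule()
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)  # cumulative products abar_t

rng = np.random.default_rng(0)
x0 = np.ones((4, 4))  # a toy "image"
xt, eps = q_sample(x0, t=999, alpha_bars=alpha_bars, rng=rng)
print(alpha_bars[0], alpha_bars[-1])  # close to 1 at the first step, close to 0 at the last
```

Because $\bar\alpha_T$ is nearly zero, $\mathbf{x}_T$ is almost pure Gaussian noise, which is why sampling can start from $\mathcal{N}(\mathbf{0}, \mathbf{I})$.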

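$\mathbf{x}_0$ is a training sample. We want the reverse process to recover $\mathbf{x}_0$ with the highest possible probability, i.e. to maximize $\ln p_\theta(\mathbf{x}_0)$. Here $q$ denotes the forward-process distribution and $p_\theta$ the reverse-process distribution.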

$$
\begin{aligned}
\ln p_\theta(\mathbf{x}_0) &= \int q(\mathbf{x}_{1:T} \mid \mathbf{x}_0) \ln p_\theta(\mathbf{x}_0)\, \mathrm{d}\mathbf{x}_{1:T} \\
&= \int q(\mathbf{x}_{1:T} \mid \mathbf{x}_0) \ln \frac{p_\theta(\mathbf{x}_0)\, p_\theta(\mathbf{x}_{1:T} \mid \mathbf{x}_0)}{p_\theta(\mathbf{x}_{1:T} \mid \mathbf{x}_0)}\, \mathrm{d}\mathbf{x}_{1:T} \\
&= \int q(\mathbf{x}_{1:T} \mid \mathbf{x}_0) \ln \frac{p_\theta(\mathbf{x}_{0:T})}{p_\theta(\mathbf{x}_{1:T} \mid \mathbf{x}_0)}\, \mathrm{d}\mathbf{x}_{1:T} \\
&= \int q(\mathbf{x}_{1:T} \mid \mathbf{x}_0) \ln \left[\frac{p_\theta(\mathbf{x}_{0:T})}{p_\theta(\mathbf{x}_{1:T} \mid \mathbf{x}_0)} \frac{q(\mathbf{x}_{1:T} \mid \mathbf{x}_0)}{q(\mathbf{x}_{1:T} \mid \mathbf{x}_0)}\right] \mathrm{d}\mathbf{x}_{1:T} \\
&= \int q(\mathbf{x}_{1:T} \mid \mathbf{x}_0) \ln \left[\frac{p_\theta(\mathbf{x}_{0:T})}{q(\mathbf{x}_{1:T} \mid \mathbf{x}_0)} \frac{q(\mathbf{x}_{1:T} \mid \mathbf{x}_0)}{p_\theta(\mathbf{x}_{1:T} \mid \mathbf{x}_0)}\right] \mathrm{d}\mathbf{x}_{1:T} \\
&= \int q(\mathbf{x}_{1:T} \mid \mathbf{x}_0) \ln \frac{p_\theta(\mathbf{x}_{0:T})}{q(\mathbf{x}_{1:T} \mid \mathbf{x}_0)}\, \mathrm{d}\mathbf{x}_{1:T} + \int q(\mathbf{x}_{1:T} \mid \mathbf{x}_0) \ln \frac{q(\mathbf{x}_{1:T} \mid \mathbf{x}_0)}{p_\theta(\mathbf{x}_{1:T} \mid \mathbf{x}_0)}\, \mathrm{d}\mathbf{x}_{1:T} \\
&= \mathrm{E}_{q(\mathbf{x}_{1:T} \mid \mathbf{x}_0)}\left[\ln \frac{p_\theta(\mathbf{x}_{0:T})}{q(\mathbf{x}_{1:T} \mid \mathbf{x}_0)}\right] + \mathrm{KL}\left(q(\mathbf{x}_{1:T} \mid \mathbf{x}_0) \,\|\, p_\theta(\mathbf{x}_{1:T} \mid \mathbf{x}_0)\right)
\end{aligned}
$$
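The decomposition can be checked numerically. The sketch below is a hypothetical toy with $T = 2$ and discrete variables taking three values, where $p_\theta$ and $q$ are random probability tables (an assumption made only so every sum is exact). It computes $\ln p_\theta(\mathbf{x}_0)$, the expectation term, and the KL term directly.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
K = 3  # each x_t takes values in {0, 1, 2}

def random_cond(rng, k):
    # Random conditional table: row i is a distribution over the next variable given value i
    m = rng.random((k, k)) + 0.1
    return m / m.sum(axis=1, keepdims=True)

# Reverse (generative) model p_theta: x2 -> x1 -> x0
p_x2 = np.full(K, 1.0 / K)
p_x1_given_x2 = random_cond(rng, K)  # indexed [x2, x1]
p_x0_given_x1 = random_cond(rng, K)  # indexed [x1, x0]

# Forward model q: x0 -> x1 -> x2
q_x1_given_x0 = random_cond(rng, K)  # indexed [x0, x1]
q_x2_given_x1 = random_cond(rng, K)  # indexed [x1, x2]

def p_joint(x0, x1, x2):
    return p_x2[x2] * p_x1_given_x2[x2, x1] * p_x0_given_x1[x1, x0]

def q_cond(x1, x2, x0):
    return q_x1_given_x0[x0, x1] * q_x2_given_x1[x1, x2]

x0 = 1
latents = list(itertools.product(range(K), repeat=2))  # all (x1, x2)

# ln p_theta(x0): marginalize the joint over x_{1:T}
log_px0 = np.log(sum(p_joint(x0, x1, x2) for x1, x2 in latents))

# First term: E_q[ln p_theta(x_{0:T}) / q(x_{1:T} | x0)]
elbo = sum(q_cond(x1, x2, x0) * np.log(p_joint(x0, x1, x2) / q_cond(x1, x2, x0))
           for x1, x2 in latents)

# Second term: KL(q || p_theta(x_{1:T} | x0)), using p_theta(x_{1:T} | x0) = p_theta(x_{0:T}) / p_theta(x0)
px0 = np.exp(log_px0)
kl = sum(q_cond(x1, x2, x0) * np.log(q_cond(x1, x2, x0) / (p_joint(x0, x1, x2) / px0))
         for x1, x2 in latents)

print(log_px0, elbo + kl)  # the two values agree up to floating-point error
```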

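Here the KL divergence is a scalar, and each integral is a multivariate integral over all latent variables $\mathbf{x}_{1:T}$.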

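Because the KL divergence is non-negative, the first term is a lower bound on $\ln p_\theta(\mathbf{x}_0)$, called the variational lower bound (evidence lower bound, ELBO). Since $\ln p_\theta(\mathbf{x}_0)$ does not depend on $\mathbf{x}_{1:T}$, its expectation under $q$ is just itself, so

$$
\mathrm{E}_{q(\mathbf{x}_{1:T} \mid \mathbf{x}_0)}\left[\ln p_\theta(\mathbf{x}_0)\right] \geq \mathrm{E}_{q(\mathbf{x}_{1:T} \mid \mathbf{x}_0)}\left[\ln \frac{p_\theta(\mathbf{x}_{0:T})}{q(\mathbf{x}_{1:T} \mid \mathbf{x}_0)}\right]
$$

Instead of maximizing $\ln p_\theta(\mathbf{x}_0)$ directly, we therefore maximize this lower bound; the bound is tight when $q(\mathbf{x}_{1:T} \mid \mathbf{x}_0)$ equals $p_\theta(\mathbf{x}_{1:T} \mid \mathbf{x}_0)$.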
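As an illustration of the bound, the following sketch uses an assumed one-step Gaussian model, $p(\mathbf{x}_1) = \mathcal{N}(0, 1)$ and $p(\mathbf{x}_0 \mid \mathbf{x}_1) = \mathcal{N}(a\,\mathbf{x}_1, s^2)$, where $\ln p(\mathbf{x}_0)$ is known in closed form. The values of `a`, `s2`, and `x0` are arbitrary choices for the example. A Monte Carlo estimate of the lower bound falls below the true log-likelihood when $q$ is the prior, and matches it when $q$ is the exact posterior (KL = 0).

```python
import numpy as np

def log_normal(x, mean, var):
    # Log density of a univariate Gaussian N(mean, var)
    return -0.5 * np.log(2 * np.pi * var) - (x - mean) ** 2 / (2 * var)

# Toy one-step model (assumption): p(x1) = N(0, 1), p(x0 | x1) = N(a * x1, s2)
a, s2 = 2.0, 0.25
x0 = 1.0

# Exact log-likelihood: the marginal is p(x0) = N(0, a^2 + s2)
log_px0 = log_normal(x0, 0.0, a ** 2 + s2)

def elbo_estimate(q_mean, q_var, n=200_000, seed=0):
    # Monte Carlo estimate of E_q[ln p(x0, x1) - ln q(x1 | x0)]
    rng = np.random.default_rng(seed)
    x1 = q_mean + np.sqrt(q_var) * rng.standard_normal(n)
    log_w = log_normal(x1, 0.0, 1.0) + log_normal(x0, a * x1, s2) - log_normal(x1, q_mean, q_var)
    return log_w.mean()

# Exact posterior p(x1 | x0): precision 1 + a^2 / s2
post_var = 1.0 / (1.0 + a ** 2 / s2)
post_mean = post_var * a * x0 / s2

elbo_prior = elbo_estimate(0.0, 1.0)            # q = prior: a loose bound
elbo_post = elbo_estimate(post_mean, post_var)  # q = exact posterior: KL = 0, the bound is tight

print(log_px0, elbo_prior, elbo_post)
```

With the exact posterior every sample's log-weight equals $\ln p(\mathbf{x}_0)$, so the gap between the two sides is exactly the KL term from the derivation.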

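What the neural network actually learns is the distribution of the reverse (denoising) step at each time step $t$. Sampling from this distribution step by step, from $t = T$ down to $t = 1$, then recovers an image from noise.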
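A minimal sketch of this sampling loop, assuming the noise-prediction parameterization and the variance choice $\sigma_t^2 = \beta_t$ from the DDPM paper: the network predicts the noise $\boldsymbol\epsilon_\theta(\mathbf{x}_t, t)$, which fixes the mean of the Gaussian reverse step, and one sample is drawn per step. `dummy_eps_model` is a hypothetical placeholder for a trained network, not part of this post.

```python
import numpy as np

def ddpm_sample(eps_model, shape, betas, rng):
    # Ancestral sampling from x_T ~ N(0, I) down to x_0, with sigma_t^2 = beta_t
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    x = rng.standard_normal(shape)
    for t in reversed(range(len(betas))):  # 0-based indices: T-1, ..., 0
        eps_hat = eps_model(x, t)  # network's noise prediction at step t
        # Mean of p_theta(x_{t-1} | x_t): (x_t - beta_t / sqrt(1 - abar_t) * eps_hat) / sqrt(alpha_t)
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(alphas[t])
        if t > 0:
            x = mean + np.sqrt(betas[t]) * rng.standard_normal(shape)
        else:
            x = mean  # no noise is added at the final step
    return x

def dummy_eps_model(x, t):
    # Hypothetical stand-in for a trained network: always predicts zero noise
    return np.zeros_like(x)

betas = np.linspace(1e-4, 0.02, 1000)
rng = np.random.default_rng(0)
sample = ddpm_sample(dummy_eps_model, (2, 8, 8), betas, rng)
print(sample.shape)  # (2, 8, 8)
```

With a trained model in place of the placeholder, the returned array would be a generated sample; with the zero-noise placeholder the output only demonstrates the control flow.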

DDPM: Background and Derivation of the Principles---淡蓝小点
https://525511.xyz/blog/ddpm%E8%83%8C%E6%99%AF%E5%8F%8A%E5%8E%9F%E7%90%86%E6%8E%A8%E5%AF%BC-%E6%B7%A1%E8%93%9D%E5%B0%8F%E7%82%B9
Author Haochen Jiang
Published at April 5, 2024