diff --git a/_posts/2025-08-18-diff-distill.md b/_posts/2025-08-18-diff-distill.md
index 61a4418..38e0745 100644
--- a/_posts/2025-08-18-diff-distill.md
+++ b/_posts/2025-08-18-diff-distill.md
@@ -38,7 +38,7 @@
-Diffusion and flow-based models, [quantization](https://github.com/bitsandbytes-foundation/bitsandbytes), and parameter-efficient fine-tuning. In this blog, we focus on an orthogonal approach named **Ordinary Differential Equation (ODE) distillation**. This method introduces an auxiliary structure that bypasses explicit ODE solving, thereby reducing the Number of Function Evaluations (NFEs). As a result, we can generate high-quality samples with fewer denoising steps.
+This challenge has spurred research into acceleration strategies at multiple levels of granularity, including hardware optimization, mixed-precision training, [quantization](https://github.com/bitsandbytes-foundation/bitsandbytes), parameter-efficient fine-tuning, and advanced solvers. In this blog, we focus on an orthogonal approach named **Ordinary Differential Equation (ODE) distillation**. This method introduces an auxiliary structure that bypasses explicit ODE solving, thereby reducing the Number of Function Evaluations (NFEs). As a result, we can generate high-quality samples with fewer denoising steps.
 
 Distillation, in general, is a technique that transfers knowledge from a complex, high-performance model (the *teacher*) to a more efficient, customized model (the *student*). Recent distillation methods have achieved remarkable reductions in sampling steps, from hundreds to a few and even **one** step, while preserving the sample quality. This advancement paves the way for real-time applications and deployment in resource-constrained environments.
@@ -252,6 +252,8 @@
 $$
 \dv{t}f^\theta_{t \to 0}(\mathbf{x}, t, 0) = 0.
 $$
+This is intuitive since every point on the same probability flow ODE (\ref{eq:1}) trajectory should be mapped to the same clean data point $$\mathbf{x}_0$$.
+
 By substituting the parameterization of FACM, we have
 
 $$\require{physics}
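The intuition added in the hunk above — every point on one probability flow ODE trajectory maps to the same clean sample $$\mathbf{x}_0$$ — can be checked numerically on a toy trajectory. The sketch below is only an illustration, not the post's PF-ODE or the FACM parameterization: the straight-line (rectified-flow-style) path with known endpoints, the helper names `x_at`, `flow_map_to_zero`, and `euler_to_zero`, and the step count are all assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy straight-line trajectory x_t = (1 - t) * x0 + t * eps, standing in for a
# single probability-flow-ODE trajectory with known endpoints.
x0 = rng.normal(size=2)     # "clean data" endpoint at t = 0
eps = rng.normal(size=2)    # "noise" endpoint at t = 1
velocity = eps - x0         # constant velocity along this straight path

def x_at(t):
    """Point on the toy trajectory at time t."""
    return (1 - t) * x0 + t * eps

def flow_map_to_zero(x_t, t):
    """Exact flow map f_{t->0} for this toy trajectory: reaches t = 0 in ONE evaluation."""
    return x_t - t * velocity

def euler_to_zero(x_t, t, n_steps=100):
    """Step-by-step Euler integration from t down to 0; each step would cost one
    network evaluation (NFE) if `velocity` came from a learned model."""
    x, dt = x_t.copy(), t / n_steps
    for _ in range(n_steps):
        x = x - dt * velocity
    return x

for t in (0.2, 0.5, 0.9):
    x_t = x_at(t)
    # Every point on the same trajectory is mapped to the same clean endpoint x0,
    # which is the "constant along the trajectory" property quoted above.
    print(t, np.allclose(flow_map_to_zero(x_t, t), x0), np.allclose(euler_to_zero(x_t, t), x0))
```

A learned flow map $$f^\theta_{t \to 0}$$ is meant to play the role of `flow_map_to_zero` here, which is one way to read the condition $$\dv{t}f^\theta_{t \to 0}(\mathbf{x}, t, 0) = 0$$ quoted in the hunk above.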
@@ -262,9 +264,13 @@ Notice this is equivalent to [MeanFlow](#meanflow) where $$s=0$$. This indicates
 Training: FACM training algorithm equipped with our flow map notation. Notice that $$d_1, d_2$$ are $\ell_2$ with cosine loss ($L_{\cos}(\mathbf{x}, \mathbf{y}) = 1 - \dfrac{\mathbf{x} \cdot \mathbf{y}}{\|\mathbf{x}\|_{2} \, \|\mathbf{y}\|_{2}}$) and norm $\ell_2$ loss ($L_{\text{norm}}(\mathbf{x}, \mathbf{y}) = \dfrac{\|\mathbf{x}-\mathbf{y}\|^2}{\sqrt{\|\mathbf{x}-\mathbf{y}\|^2+c}}$, where $c$ is a small constant; this is a special case of the adaptive L2 loss proposed in MeanFlow), respectively, plus reweighting. Interestingly, they separate the training of FM and CM on disentangled time intervals. When training with CM target, we let $$s=0, t\in[0,1]$$. On the other hand, we set $$t'=2-t, t'\in[1,2]$$ when training with FM anchors.
+
-    {% include figure.liquid loading="eager" path="/blog/2025/diff-distill/facm_training.png" class="img-fluid rounded z-depth-1" %}
+    {% include figure.liquid loading="eager" path="/blog/2025/diff-distill/FACM_training.png" class="img-fluid rounded z-depth-1" %}
+<div class="caption">
+    The modified training algorithm of FACM. All the notations are adapted to our flow map.
+</div>
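To make the caption's distances concrete, here is a minimal PyTorch-style sketch of the two formulas it states, together with the disentangled time intervals it describes. This is a hedged illustration only: the function names, the batch-mean reduction, the numerical-stability `eps`, the default value of `c`, and the `train_cm` flag are assumptions made here, and the plain $\ell_2$ terms and the reweighting mentioned in the caption are omitted.

```python
import torch

def cosine_loss(x, y, eps=1e-8):
    """L_cos(x, y) = 1 - <x, y> / (||x||_2 ||y||_2), as written in the caption.
    Inputs of shape (batch, ...); `eps` is added for numerical stability and is not in the formula."""
    x, y = x.flatten(1), y.flatten(1)
    cos = (x * y).sum(dim=1) / (x.norm(dim=1) * y.norm(dim=1) + eps)
    return (1.0 - cos).mean()

def norm_l2_loss(x, y, c=1e-3):
    """L_norm(x, y) = ||x - y||^2 / sqrt(||x - y||^2 + c), a special case of MeanFlow's
    adaptive L2 loss. The default value of c is an assumption; the caption only says it is small."""
    sq = (x - y).flatten(1).pow(2).sum(dim=1)
    return (sq / torch.sqrt(sq + c)).mean()

def sample_disentangled_time(batch_size, train_cm):
    """Illustrative helper for the caption's disentangled time intervals:
    the CM target uses s = 0 with t in [0, 1]; the FM anchors use t' = 2 - t, so t' in [1, 2]."""
    t = torch.rand(batch_size)           # t in [0, 1]
    return t if train_cm else 2.0 - t    # FM branch reparameterizes to t' in [1, 2]
```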
diff --git a/assets/bibliography/2025-08-18-diff-distill.bib b/assets/bibliography/2025-08-18-diff-distill.bib
index 37af97f..334d6bb 100644
--- a/assets/bibliography/2025-08-18-diff-distill.bib
+++ b/assets/bibliography/2025-08-18-diff-distill.bib
@@ -180,4 +180,13 @@ @article{xu2025one
   author={Xu, Yilun and Nie, Weili and Vahdat, Arash},
   journal={arXiv preprint arXiv:2502.15681},
   year={2025}
-}
\ No newline at end of file
+}
+
+@article{lu2025dpm,
+  title={{DPM-Solver++}: Fast solver for guided sampling of diffusion probabilistic models},
+  author={Lu, Cheng and Zhou, Yuhao and Bao, Fan and Chen, Jianfei and Li, Chongxuan and Zhu, Jun},
+  journal={Machine Intelligence Research},
+  pages={1--22},
+  year={2025},
+  publisher={Springer}
+}
\ No newline at end of file