This repository contains the official PyTorch implementation of "Diffusion Adaptive Text Embedding for Text-to-Image Diffusion Models" (NeurIPS 2025).
Byeonghu Na, Minsang Park, Gyuwon Sim, Donghyeok Shin, HeeSun Bae, Mina Kang, Se Jung Kwon, Wanmo Kang, and Il-Chul Moon
KAIST, NAVER Cloud, summary.ai
Diffusion Adaptive Text Embedding (DATE) is a test-time method that dynamically updates text embeddings during the diffusion sampling process.
We utilized CUDA 11.4 and Python 3.8.
pip install -r requirements.txt
DATE performs gradient-based updates of text embeddings at selected diffusion timesteps. The update frequency and magnitude are controlled at inference time.
- Text-conditioned evaluation function: CLIP score
python generate_date_clip.py \
--steps=50 --scheduler=DDIM --save_path=gen_images --bs=4 --w=8 --skip_freq=10 \
--num_upt_prompt=1 --lr_upt_prompt=0.5 --name=TEST
- Text-conditioned evaluation function: ImageReward
python generate_date_ir.py \
--steps=50 --scheduler=DDIM --save_path=gen_images --bs=4 --w=8 --skip_freq=10 \
--num_upt_prompt=1 --lr_upt_prompt=0.5 --name=TEST
- --steps: number of diffusion sampling steps
- --scheduler: diffusion sampler (e.g., DDIM)
- --bs: batch size
- --w: classifier-free guidance scale
- --skip_freq: frequency of text embedding updates
- --num_upt_prompt: number of gradient steps per update
- --lr_upt_prompt: update scale ($\rho$)
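To make the interaction of these three update parameters concrete, here is a minimal, framework-free sketch. It is not the repository's code. It assumes that `--skip_freq` means "update at every `skip_freq`-th sampling step, starting from step 0" and that each update is a plain gradient-ascent step of scale $\rho$; the actual scripts may schedule and scale updates differently. The helper names `update_steps`, `ascend`, and `toy_grad` are hypothetical.

```python
def update_steps(steps, skip_freq):
    """Sampling-step indices at which the text embedding is updated.

    Assumption: an update fires at every skip_freq-th step, starting at 0.
    """
    return [i for i in range(steps) if i % skip_freq == 0]


def ascend(embedding, grad_fn, num_updates, rho):
    """Apply num_updates gradient-ascent steps of scale rho.

    embedding is a list of floats standing in for a text embedding;
    grad_fn returns the gradient of the evaluation function w.r.t. it.
    """
    for _ in range(num_updates):
        grad = grad_fn(embedding)
        embedding = [e + rho * g for e, g in zip(embedding, grad)]
    return embedding


def toy_grad(c):
    """Gradient of the toy objective f(c) = -sum((c_i - 1)^2)."""
    return [-2.0 * (x - 1.0) for x in c]


# Mirrors --steps=50 --skip_freq=10 --num_upt_prompt=1 --lr_upt_prompt=0.5
schedule = update_steps(50, 10)
updated = ascend([0.0, 0.0], toy_grad, num_updates=1, rho=0.5)
```

Under these assumptions, the example settings would trigger five updates over 50 sampling steps, each consisting of a single gradient step. In the real scripts the gradient comes from the text-conditioned evaluation function (CLIP score or ImageReward) rather than a toy objective.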
- Zero-shot FID
  - Download coco.npz at this link.
python evaluation/fid.py <directory_of_generated_images> <path_of_coco_npz_file>
- CLIP score
python evaluation/clip_score.py --text_path=subset.csv --img_path=<directory_of_generated_images>
- ImageReward
python evaluation/image_reward.py --text_path=subset.csv --img_path=<directory_of_generated_images>
This codebase builds upon and is inspired by:
- Diffusers: https://github.com/huggingface/diffusers
- Restart: https://github.com/Newbeeer/diffusion_restart_sampling
- ImageReward: https://github.com/THUDM/ImageReward
@inproceedings{
na2025diffusion,
title={Diffusion Adaptive Text Embedding for Text-to-Image Diffusion Models},
author={Byeonghu Na and Minsang Park and Gyuwon Sim and Donghyeok Shin and HeeSun Bae and Mina Kang and Se Jung Kwon and Wanmo Kang and Il-chul Moon},
booktitle={The Thirty-ninth Annual Conference on Neural Information Processing Systems},
year={2025},
url={https://openreview.net/forum?id=cHi8QxGrZH}
}