Class Conditional Diffusion

Introduction

This repository explores several state-of-the-art ideas in diffusion models for controlled image generation. We train a custom-made Unet from scratch to predict the noise at a given timestep, and apply it to the CelebA-HQ, CIFAR10 and MNIST datasets to generate samples from their respective distributions.
🚩 Disclaimer: All faces in this repository are generated (i.e., fictional / for educational purposes).

In particular, we follow the paradigm of Ho et al., regressing over the noise with $$L_{simple}$$, with a few twists. The noise schedule is smoother (i.e., the noise variance follows a cosine schedule), similar to Nichol & Dhariwal. The timestep embeddings are sinusoidal (sine-cosine waves), like the positional embeddings described by Vaswani et al., and they are applied via a FiLM-like mechanism similar to Perez et al.; their projected parameters are shared across the Unet. For class conditioning, the class representations are either one-hot or multi-hot encodings, depending on the dataset, and they are fed in through a cross-attention layer à la Stable Diffusion (Rombach & Blattmann et al.), at the bottleneck level only. Self-attention is also added at the bottleneck level, as well as at the mid-resolution sections. Class conditioning is not randomly dropped in this implementation.
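As a rough illustration of the ingredients named above, here is a minimal sketch in plain Python of a cosine noise schedule in the style of Nichol & Dhariwal, the forward noising step, the $$L_{simple}$$ noise regression loss, and a sinusoidal timestep embedding. The function names, the offset `s=0.008`, the number of steps `T=1000`, and `max_period=10000` are assumptions made for the example; the repository's PyTorch code may be organized differently.

```python
import math
import random

def cosine_alpha_bar(num_steps, s=0.008):
    """Cumulative signal level alpha_bar_t for t = 0..num_steps (cosine schedule)."""
    def f(t):
        return math.cos(((t / num_steps + s) / (1 + s)) * math.pi / 2) ** 2
    return [f(t) / f(0) for t in range(num_steps + 1)]

def betas_from_alpha_bar(alpha_bar, max_beta=0.999):
    """Per-step variances beta_t = 1 - alpha_bar_t / alpha_bar_{t-1}, clipped."""
    return [min(1 - alpha_bar[t] / alpha_bar[t - 1], max_beta)
            for t in range(1, len(alpha_bar))]

def q_sample(x0, eps, alpha_bar_t):
    """Forward process: x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps."""
    a, b = math.sqrt(alpha_bar_t), math.sqrt(1 - alpha_bar_t)
    return [a * x + b * e for x, e in zip(x0, eps)]

def l_simple(eps, eps_pred):
    """Mean squared error between the true and the predicted noise."""
    return sum((e - p) ** 2 for e, p in zip(eps, eps_pred)) / len(eps)

def timestep_embedding(t, dim, max_period=10000.0):
    """Sinusoidal embedding: sines in the first half, cosines in the second half."""
    half = dim // 2
    freqs = [math.exp(-math.log(max_period) * i / half) for i in range(half)]
    return [math.sin(t * f) for f in freqs] + [math.cos(t * f) for f in freqs]

T = 1000  # assumed number of diffusion steps
alpha_bar = cosine_alpha_bar(T)
betas = betas_from_alpha_bar(alpha_bar)

rng = random.Random(0)
x0 = [rng.uniform(-1, 1) for _ in range(8)]  # toy "image" with 8 pixels
eps = [rng.gauss(0, 1) for _ in range(8)]    # Gaussian noise to be regressed
x_t = q_sample(x0, eps, alpha_bar[500])      # noisy sample at t = 500

# A perfect noise predictor gives zero loss; an all-zero guess does not.
print(l_simple(eps, eps), l_simple(eps, [0.0] * 8))
print(len(timestep_embedding(500, 16)))
```

In a real training loop the prediction would come from the Unet evaluated on `x_t`, the timestep embedding and the class conditioning, rather than from the placeholders used here.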

The Unets used for CelebA-HQ $$(256^2\rightarrow16^2)$$ and CIFAR10 $$(32^2\rightarrow2^2)$$ are identical: each is made up of 4 levels with two residual blocks per level, plus a bottleneck. Self-attention is placed between the residual blocks. At the bottleneck, self-attention is chained with cross-attention. Downsampling is done via strided convolutions, and upsampling via nearest-neighbor interpolation followed by a convolution. For MNIST, I used a very simple and small Unet without any attention. Basically, anything works with MNIST!
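The resolutions quoted above follow from halving the spatial size once per level. The sketch below checks that arithmetic, assuming 3x3 convolutions with padding 1 (stride 2 when downsampling, stride 1 after nearest-neighbor upsampling); the kernel size and padding are assumptions for the example, not values taken from the repository.

```python
def conv_out_size(size, kernel=3, stride=2, padding=1):
    """Output size of a convolution along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1

def encoder_resolutions(input_size, num_levels=4):
    """Spatial sizes from the input down to the bottleneck."""
    sizes = [input_size]
    for _ in range(num_levels):
        # One strided convolution per level halves an even spatial size.
        sizes.append(conv_out_size(sizes[-1]))
    return sizes

def decoder_resolutions(bottleneck_size, num_levels=4, scale=2):
    """Spatial sizes from the bottleneck back up to the output."""
    sizes = [bottleneck_size]
    for _ in range(num_levels):
        # Nearest-neighbor upsampling doubles the size; the stride-1
        # convolution that follows leaves it unchanged.
        upsampled = sizes[-1] * scale
        sizes.append(conv_out_size(upsampled, kernel=3, stride=1, padding=1))
    return sizes

print(encoder_resolutions(256))  # CelebA-HQ: [256, 128, 64, 32, 16]
print(encoder_resolutions(32))   # CIFAR10:   [32, 16, 8, 4, 2]
print(decoder_resolutions(16))   # [16, 32, 64, 128, 256]
```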

All results are from EMA checkpoints. To calculate the EMA weights, I use the ema_pytorch package from lucidrains' repository, which is also available via pip install ema-pytorch. All models were trained with mixed precision, using bfloat16.
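For readers unfamiliar with EMA checkpoints, the basic update rule can be sketched in a few lines of plain Python. This is only the textbook exponential moving average over a dictionary of scalar "weights"; it is not the ema_pytorch implementation, which works on model parameters and has additional options. The helper name `ema_update` and the toy decay of 0.5 are assumptions chosen so the numbers are easy to follow; practical decay values are much closer to 1.

```python
def ema_update(ema_params, model_params, decay):
    """In-place update: ema <- decay * ema + (1 - decay) * model, per parameter."""
    for name, value in model_params.items():
        ema_params[name] = decay * ema_params[name] + (1.0 - decay) * value

model = {"w": 0.0}   # toy model with a single scalar weight
ema = dict(model)    # the EMA copy starts from the same weights

for step in range(1, 6):
    model["w"] = float(step)           # stand-in for an optimizer step
    ema_update(ema, model, decay=0.5)  # toy decay, see note above

# The EMA weight trails the raw weight, smoothing out step-to-step changes.
print(ema["w"], model["w"])
```

Sampling from the averaged weights instead of the raw ones is what "EMA checkpoint" refers to here.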

General Requirements

Python >= 3.8
PyTorch
Torchvision
Einops
NumPy
Pandas
Matplotlib
Other libraries: PIL, PyYAML
Datasets used: CelebA-HQ $$(1024^2\rightarrow256^2)$$, CIFAR10, MNIST

Installation & Usage

Clone the repository:

git clone https://github.com/ntat/Class-Conditional-Diffusion.git

Install dependencies via pip:
```
pip install -r requirements.txt
```
Make sure you have the datasets, and adjust config.yaml accordingly.
Run the script with python:
```
python main.py
```
For inference, look into the notebooks section to see how to interact with the code.

Results

CIFAR10

Each row corresponds to one of the classes in CIFAR10: airplane, bird, car, cat, deer, dog, frog, horse, ship, & truck.

CelebA-HQ

Conditioning Vector: [Wearing_Lipstick, Young, Attractive, No_Beard]

Conditioning Vector: [Young, Attractive, Male, Smiling]

Conditioning Vector: [Mouth_Slightly_Open, Wearing_Lipstick, Young, Smiling, Attractive, No_Beard]

Conditioning Vector: [Bald, No_Beard, Male] - (very difficult: Bald appears in <2.5% of the data)

Conditioning Vector: [Eyeglasses, Attractive, Young] - (very difficult: Eyeglasses appears in <5% of the data)
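
The conditioning vectors listed above can be read as multi-hot encodings over the attribute set, while a single-label dataset such as CIFAR10 uses a one-hot encoding. The sketch below shows both in plain Python. The `ATTRIBUTES` list is a hypothetical subset containing only the attributes named in this README, in an arbitrary (alphabetical) order; the actual attribute list and its ordering in the repository may differ.

```python
# Hypothetical attribute subset for illustration only.
ATTRIBUTES = [
    "Attractive", "Bald", "Eyeglasses", "Male", "Mouth_Slightly_Open",
    "No_Beard", "Smiling", "Wearing_Lipstick", "Young",
]

def one_hot(index, num_classes):
    """Single-label encoding: exactly one entry is 1."""
    if not 0 <= index < num_classes:
        raise ValueError(f"class index {index} out of range")
    return [1.0 if i == index else 0.0 for i in range(num_classes)]

def multi_hot(active, attributes=ATTRIBUTES):
    """Multi-label encoding: one entry per attribute, 1 where it is active."""
    unknown = set(active) - set(attributes)
    if unknown:
        raise ValueError(f"unknown attributes: {sorted(unknown)}")
    return [1.0 if name in active else 0.0 for name in attributes]

print(one_hot(3, 10))                           # one of 10 classes
print(multi_hot(["Bald", "No_Beard", "Male"]))  # three active attributes
```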
