Hello, I noticed that in your code you added self.student._diff = self.diff, but does this ensure that the diffusion model parameters are included in the optimizer? From my understanding, it seems that currently only the student model parameters are being updated.
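
For reference, here is a minimal sketch of how this could be checked, assuming both models are plain PyTorch `nn.Module` instances. The names `student`, `diff`, and `in_optimizer` are hypothetical stand-ins, not the ones from your code. As far as I understand, assigning a module as an attribute registers it as a submodule even when the name starts with an underscore, so what matters is whether the optimizer was built before or after the assignment.

```python
import torch
from torch import nn

# Hypothetical stand-ins for the student and diffusion models.
student = nn.Linear(4, 4)
diff = nn.Linear(4, 4)

# Optimizer created BEFORE the attribute assignment: it only sees
# the parameters the student had at that moment.
opt_before = torch.optim.SGD(student.parameters(), lr=0.1)

# nn.Module.__setattr__ registers any nn.Module value as a submodule,
# regardless of the attribute name.
student._diff = diff

# Optimizer created AFTER the assignment: student.parameters() now
# also yields the parameters of the registered submodule.
opt_after = torch.optim.SGD(student.parameters(), lr=0.1)


def in_optimizer(module, optimizer):
    """Return True if every parameter of `module` is held by `optimizer`."""
    held = {id(p) for group in optimizer.param_groups for p in group["params"]}
    return all(id(p) in held for p in module.parameters())


diff_in_before = in_optimizer(diff, opt_before)
diff_in_after = in_optimizer(diff, opt_after)
print("diff in optimizer built before assignment:", diff_in_before)
print("diff in optimizer built after assignment:", diff_in_after)
```

If the optimizer in the repository is constructed before the assignment, then calling something like `optimizer.add_param_group({"params": list(diff.parameters())})` might be one way to include the diffusion parameters, provided they also have `requires_grad=True`.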