🧨 diffusers has supported LoRA adapter training and inference for a while now. We've made multiple quality-of-life improvements to our LoRA API, so training LoRAs and performing inference with them should now be much more robust.
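For instance, loading a trained LoRA into a pipeline is a one-liner. A minimal sketch (the model id, LoRA path, and adapter name below are placeholders; swap in your own):

```python
import torch
from diffusers import DiffusionPipeline

# Load a base model (placeholder checkpoint id; use the one your LoRA targets).
pipe = DiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Load a LoRA adapter on top of the base weights; naming the adapter lets it
# be toggled or combined with others later.
pipe.load_lora_weights("path/to/lora_weights", adapter_name="my_style")

image = pipe("a watercolor painting of a lighthouse at dusk").images[0]
image.save("lora_sample.png")
```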
We're excited to release an experimental version of the Flux Control fine-tuning scripts.
Flux Control from @BlackForestLabs is, by far, the strongest alternative to ControlNets while being computationally far more efficient.
The idea of Flux Control is simple yet elegant:
1. Instead of maintaining a separate auxiliary module like ControlNet or T2I Adapter, increase the number of input channels for the image latents in the pretrained Flux DiT.
2. Compute the latents of the structural input (a depth map, for example) with the VAE and concatenate them with the actual latents you start the denoising process with.
3. During training, only the original image latents are noised; the structural latents are then concatenated to them before being fed to the denoiser.
4. We start from the pretrained text-to-image Flux DiT, widen its input layer to accommodate the additional channels, and train further on a ControlNet-style dataset (see the sketch after this list)!
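Here's an illustrative sketch of that input construction in plain PyTorch. Shapes and names are simplified for clarity: the real Flux DiT operates on packed latent sequences, and the released scripts handle those details.

```python
import torch

# Illustrative latent shapes (simplified; not the actual Flux dimensions).
batch, latent_channels, h, w = 1, 16, 64, 64

image_latents = torch.randn(batch, latent_channels, h, w)    # VAE latents of the target image
control_latents = torch.randn(batch, latent_channels, h, w)  # VAE latents of the depth map

# 1) Only the image latents are noised (rectified-flow-style interpolation).
noise = torch.randn_like(image_latents)
t = torch.rand(batch).view(-1, 1, 1, 1)  # timestep in [0, 1]
noisy_image_latents = (1.0 - t) * image_latents + t * noise

# 2) The clean control latents are concatenated along the channel dimension,
#    doubling the input channels the denoiser sees.
denoiser_input = torch.cat([noisy_image_latents, control_latents], dim=1)
print(denoiser_input.shape)  # torch.Size([1, 32, 64, 64])

# 3) The pretrained DiT's input projection is widened to accept the new
#    channels; initializing the new columns to zero means training starts
#    from the original text-to-image behavior.
old_proj = torch.nn.Linear(latent_channels, 3072)  # stand-in for the pretrained projection
new_proj = torch.nn.Linear(latent_channels * 2, 3072)
with torch.no_grad():
    new_proj.weight.zero_()
    new_proj.weight[:, :latent_channels].copy_(old_proj.weight)
    new_proj.bias.copy_(old_proj.bias)
```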
So, no auxiliary models are trained here.
During inference, only a single model is invoked at each step of the iterative denoising loop, as opposed to running both the auxiliary module and the denoiser at every step, as is typical with ControlNets.
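Inference with a Flux Control checkpoint then looks like regular text-to-image generation plus a control image. A minimal sketch, assuming the depth variant from the Hub (adjust the model id, prompt, and control image to your setup):

```python
import torch
from diffusers import FluxControlPipeline
from diffusers.utils import load_image

# Depth variant of Flux Control; swap in the checkpoint you trained or use.
pipe = FluxControlPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-Depth-dev", torch_dtype=torch.bfloat16
).to("cuda")

control_image = load_image("depth_map.png")  # precomputed depth map of the scene

image = pipe(
    prompt="a cozy wooden cabin in a snowy forest, golden hour",
    control_image=control_image,
    num_inference_steps=30,
    guidance_scale=10.0,
).images[0]
image.save("flux_control_sample.png")
```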
Custom Diffusion fine-tunes only the cross-attention layers of the UNet (specifically, the key and value projections) and can also be combined with textual inversion, which keeps learning new concepts feasible on consumer hardware.
As a result, with just 250 steps, we can get pretty good results, depending on the new subject being learned.
Since we train only a limited set of layers, WITHOUT using any adapters like LoRA, the resulting parameters total only ~300 MB.
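To get a feel for how few parameters that is, here's a rough sketch that freezes the UNet and unfreezes only the cross-attention key/value projections. This is not the actual training script (which uses dedicated attention processors), and the model id is just an example:

```python
from diffusers import UNet2DConditionModel

# Example base model; use the checkpoint you intend to fine-tune.
unet = UNet2DConditionModel.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5", subfolder="unet"
)
unet.requires_grad_(False)  # freeze everything ...

trainable = 0
for name, param in unet.named_parameters():
    # ... then unfreeze only the cross-attention ("attn2") K/V projections.
    if "attn2.to_k" in name or "attn2.to_v" in name:
        param.requires_grad = True
        trainable += param.numel()

print(f"trainable params: {trainable / 1e6:.1f}M "
      f"(~{trainable * 4 / 1e6:.0f} MB in fp32)")
```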