Open Domain Image Generation and Editing with Natural Language Guidance
Link TODO: https://ljvmiranda921.github.io/notebook/2021/08/08/clip-vqgan/#training
Key Part
- For image generation or editing, CLIP can be used to guide per-example optimisation in a GAN’s latent space (the networks stay frozen; only the latent is tuned).
- Start with text prompt and random noise
- Random noise –[VQGAN encoder]-> latent representation –[VQGAN decoder]-> generated image
- Use the CLIP model to compute a loss between (generated image, text prompt)
- Backpropagate the loss into the latent representation (I think this would require setting the latent to require gradients, while keeping the VQGAN and CLIP weights frozen)
- Minimise the CLIP loss (i.e. maximise image–text similarity); once a threshold is reached, decode the optimised latent:
latent representation –[VQGAN decoder]-> generated image
and use that as the final output.
For image editing, the loop is presumably the same, except the latent is initialised by encoding an existing image with the VQGAN encoder rather than starting from random noise.
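The steps above can be sketched with a toy optimisation loop. Everything here is a stand-in assumption, not the real models: `decode` is a fixed linear map in place of the VQGAN decoder, and `clip_loss` is a squared distance to a target vector in place of `1 - CLIP similarity(image, text)`. The point is the technique: gradients flow only into the latent `z`.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 4))   # frozen "decoder" weights (VQGAN stand-in)
target = rng.standard_normal(8)   # stands in for the text prompt's CLIP embedding

def decode(z):
    # VQGAN decoder stand-in: latent -> "image"
    return W @ z

def clip_loss(img):
    # CLIP loss stand-in: lower means image better matches the prompt
    return float(np.sum((img - target) ** 2))

def grad_z(z):
    # Analytic gradient of clip_loss(decode(z)) w.r.t. z.
    # With autodiff (e.g. PyTorch) this would be z.requires_grad_(True)
    # followed by loss.backward(); the model weights stay frozen.
    return 2.0 * W.T @ (decode(z) - target)

z = rng.standard_normal(4)        # start from random noise (for editing,
                                  # this would instead be the encoded image)
losses = [clip_loss(decode(z))]
for _ in range(200):              # optimise the latent, not the networks
    z -= 0.01 * grad_z(z)
    losses.append(clip_loss(decode(z)))

final_image = decode(z)           # decode the optimised latent as the output
print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

In the real setup the loop is the same shape, but `decode` is the VQGAN decoder, the loss comes from CLIP's image and text encoders, and stopping is by iteration count or a similarity threshold.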