Open Domain Image Generation and Editing with Natural Language Guidance
Link TODO: https://ljvmiranda921.github.io/notebook/2021/08/08/clip-vqgan/#training
Key Part
- For image generation or editing, CLIP can be used to guide per-example optimisation in a GAN’s latent space (the networks stay frozen; only the latent is tuned).
- Start with text prompt and random noise
- Random noise –[VQGAN encoder]-> latent representation –[VQGAN decoder]-> generated image
- Use the CLIP model to compute a loss between (generated image, text prompt)
- Backpropagate the loss into the latent representation (I think this would require setting the latent to require gradients, while keeping the VQGAN and CLIP weights frozen)
- Minimise the CLIP loss (i.e. maximise image–text similarity); once a threshold is reached, decode the optimised latent:
latent representation –[VQGAN decoder]-> generated image
and use that as the final output.
For image editing, the loop is presumably the same, except the latent is initialised by encoding an existing image with the VQGAN encoder rather than starting from random noise.
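The steps above can be sketched with a toy optimisation loop. Everything here is a stand-in assumption, not the real models: `decode` is a fixed linear map in place of the VQGAN decoder, and `clip_loss` is a squared distance to a target vector in place of `1 - CLIP similarity(image, text)`. The point is the technique: gradients flow only into the latent `z`.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 4))   # frozen "decoder" weights (VQGAN stand-in)
target = rng.standard_normal(8)   # stands in for the text prompt's CLIP embedding

def decode(z):
    # VQGAN decoder stand-in: latent -> "image"
    return W @ z

def clip_loss(img):
    # CLIP loss stand-in: lower means image better matches the prompt
    return float(np.sum((img - target) ** 2))

def grad_z(z):
    # Analytic gradient of clip_loss(decode(z)) w.r.t. z.
    # With autodiff (e.g. PyTorch) this would be z.requires_grad_(True)
    # followed by loss.backward(); the model weights stay frozen.
    return 2.0 * W.T @ (decode(z) - target)

z = rng.standard_normal(4)        # start from random noise (for editing,
                                  # this would instead be the encoded image)
losses = [clip_loss(decode(z))]
for _ in range(200):              # optimise the latent, not the networks
    z -= 0.01 * grad_z(z)
    losses.append(clip_loss(decode(z)))

final_image = decode(z)           # decode the optimised latent as the output
print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

In the real setup the loop is the same shape, but `decode` is the VQGAN decoder, the loss comes from CLIP's image and text encoders, and stopping is by iteration count or a similarity threshold.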