Varga, Dániel (Rényi) 

A survey of recent advances in image synthesis

Thanks to recent advances in multimodal (text+image) deep learning, we can now create compelling, original images by giving textual prompts to "AI artists". The main components of such systems are:

- An artificial neural network that generates images by mapping real-valued vectors to images. (For those familiar with deep learning concepts: often, but not necessarily, a Generative Adversarial Network.)

- An artificial neural network quantifying the relatedness of a text and an image.

- A gradient-descent-based optimization algorithm that uses the above two networks as "differentiable subroutines" to gradually create an image corresponding to the text prompt.
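The interplay of the three components can be illustrated with a minimal toy sketch. Everything in it is an assumption made for illustration: the "generator" is a fixed linear map rather than a trained network, the "relatedness" score is a negative squared distance to a hypothetical text embedding, and the gradient is written out by hand, whereas real systems obtain it by automatic differentiation through both networks. The names `generate`, `score`, `optimize_latent`, `W` and `TEXT_EMBEDDING` are hypothetical.

```python
import random

def generate(W, z):
    """Toy 'generator': a linear map from a latent vector z to a flat 'image'."""
    return [sum(w * zj for w, zj in zip(row, z)) for row in W]

def score(image, text_embedding):
    """Toy 'relatedness' score: negative squared distance (higher is better)."""
    return -sum((p - t) ** 2 for p, t in zip(image, text_embedding))

def score_grad_wrt_latent(W, z, text_embedding):
    """Gradient of score(generate(W, z), text_embedding) w.r.t. z, by the chain rule."""
    image = generate(W, z)
    # Derivative of the score with respect to each 'pixel'.
    d_image = [-2.0 * (p - t) for p, t in zip(image, text_embedding)]
    # Propagate back through the linear generator: W^T * d_image.
    return [sum(W[i][j] * d_image[i] for i in range(len(W)))
            for j in range(len(z))]

def optimize_latent(W, text_embedding, steps=200, lr=0.05, seed=0):
    """Start from a random latent vector and repeatedly step uphill on the score.

    Maximizing the score is the same as gradient descent on its negative.
    """
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0) for _ in range(len(W[0]))]
    history = []
    for _ in range(steps):
        history.append(score(generate(W, z), text_embedding))
        grad = score_grad_wrt_latent(W, z, text_embedding)
        z = [zj + lr * g for zj, g in zip(z, grad)]
    history.append(score(generate(W, z), text_embedding))
    return z, history

# Hypothetical fixed generator weights: 2-dimensional latent, 4-'pixel' image.
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]
# Hypothetical embedding of the text prompt, living in the same space as the image.
TEXT_EMBEDDING = [0.5, -1.0, -0.5, 1.5]

z_opt, history = optimize_latent(W, TEXT_EMBEDDING)
print("score before:", history[0])
print("score after: ", history[-1])
```

Only the latent vector is updated in this loop; the generator weights stay fixed throughout, which is what it means to use the networks as subroutines rather than training them.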

The talk will give a broad overview of these components, introducing the core mathematical and engineering concepts and ideas that made them possible.

These technologies are commonly shared in the form of Colab notebooks, which makes them accessible to anyone with a networked computer, even those without programming skills. This has enabled a thriving community of artists and amateur programmers to advance the field of image synthesis. The talk will also present a sample of this community's output.

The talk will be held in Hungarian.

Date: Nov 23, Tuesday 4:15pm

Place: BME, Building "Q", Room QBF13
