What are Stable Diffusion and Disco Diffusion? How can you use these tools for your own projects? What are their contributions, or disruptions, to artificial intelligence (AI) generated art, human creativity, or, even more ambitiously, human society?
In this post, we compile a host of resources, guided by these questions, to get you started with using these state-of-the-art tools for your own work, while keeping the broader issues associated with AI art in mind. For the moment, our focus is on text-to-image models.
What are Stable Diffusion and Disco Diffusion? How do they work?
Stable Diffusion and Disco Diffusion are two pieces of open-source software that generate images in response to a language prompt. A prompt, a series of words, is the key to steering these tools to translate your thoughts into images. We will return to this trick later.
Both Stable Diffusion and Disco Diffusion are based on diffusion models: Stable Diffusion uses a latent diffusion model, while Disco Diffusion is a CLIP-guided diffusion model. These machine learning systems iteratively add random noise to training data and learn to reverse that process, removing noise step by step to construct a desired sample, such as an image.
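To make the idea concrete, here is a minimal sketch of the forward (noise-adding) step of a diffusion model in PyTorch. The noise schedule and variable names are illustrative only; they are not taken from either tool's actual codebase.

```python
import torch

# A toy linear noise schedule over T timesteps (illustrative values,
# not the schedule Stable Diffusion or Disco Diffusion actually uses).
T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)

def add_noise(x0: torch.Tensor, t: int) -> torch.Tensor:
    """Forward diffusion: blend a clean image x0 with Gaussian noise,
    with the amount of noise determined by how far along the schedule t is."""
    noise = torch.randn_like(x0)
    return alpha_bars[t].sqrt() * x0 + (1 - alpha_bars[t]).sqrt() * noise

# Example: a fake 3x64x64 "image"; later timesteps are noisier.
x0 = torch.rand(3, 64, 64)
slightly_noisy = add_noise(x0, t=50)
nearly_pure_noise = add_noise(x0, t=999)
```

During training, a neural network learns to predict the noise that was added at each step; at generation time, running that prediction in reverse, step by step, turns pure noise into an image.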
One of the main components of both models is the text encoder, built upon CLIP (Contrastive Language-Image Pre-Training), a neural network that learns visual concepts from natural language and can be used for image classification and image-text similarity tasks.
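As a rough illustration of what CLIP does, the sketch below uses the Hugging Face transformers implementation of CLIP to score how well a few candidate captions match an image. This is just one way to call CLIP, not how Stable Diffusion or Disco Diffusion invokes it internally, and the image file and captions are placeholders.

```python
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Load a publicly available CLIP checkpoint.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("photo.jpg")  # any local image file
captions = ["a cat sleeping on a sofa", "a city skyline at night", "a bowl of fruit"]

# Encode the image and the candidate captions together.
inputs = processor(text=captions, images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)

# Higher probability = the caption describes the image better, according to CLIP.
probs = outputs.logits_per_image.softmax(dim=-1)
for caption, p in zip(captions, probs[0].tolist()):
    print(f"{p:.2%}  {caption}")
```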
Here at the Library, we host installations of Stable Diffusion and Disco Diffusion with user interfaces, running on local GPUs, to make the creation process more accessible and more efficient.
How to use Stable Diffusion and Disco Diffusion?
To learn to talk to these text-to-image models, we start with “prompt engineering”. A helpful analogy is to think of a prompt as a search query, much like one typed into Google. We give the image generator a query to “search” a structured representation of all the images it was trained on, evaluate the result, and refine the input text until the output image comes as close as possible to what we have in mind.
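For readers who prefer to script this refine-and-compare loop rather than use a web interface, here is a minimal sketch using the Hugging Face diffusers library. The model ID, seed, and settings are common examples at the time of writing, not the configuration of our Library installations; adjust them to your own setup.

```python
import torch
from diffusers import StableDiffusionPipeline

# Load a Stable Diffusion checkpoint (substitute whichever weights you use locally).
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Start with a plain prompt, then "engineer" it by adding style and detail cues.
prompts = [
    "a lighthouse by the sea",
    "a lighthouse by the sea, golden hour, oil painting, highly detailed",
]

for i, prompt in enumerate(prompts):
    # A fixed seed keeps the comparison fair across prompt revisions.
    generator = torch.Generator("cuda").manual_seed(42)
    image = pipe(prompt, guidance_scale=7.5, num_inference_steps=50,
                 generator=generator).images[0]
    image.save(f"draft_{i}.png")
```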
In addition to prompt engineering, there are more practical tips for working with Stable Diffusion. For Disco Diffusion, the most current documentation on its settings can be found in this unofficial guidebook or this mirrored cheatsheet. Additionally, A Traveler’s Guide to the Latent Space summarizes various experiments with Disco Diffusion and their results.
Such text-to-image tools can be useful in many ways. Not surprisingly, creative professionals such as game designers, interior designers, filmmakers, and advertising agency executives have started to explore applications of these generative AI technologies in their work. Moreover, AI image generators may also be utilized as a form of therapy, or to serve people with aphantasia.
What else to keep an eye on?
Feed a text prompt to a machine and get back an image that speaks our mind. This all sounds too good to be true. Is there anything to watch out for?
For one thing, Stable Diffusion can generate recognizable celebrities, nudity, trademarked characters, or any combination of those. Beyond such explicit content, other known issues, attached to generative AI art in general and not just Stable Diffusion, include biases in the training data that reinforce stereotypes or underrepresent certain groups, images intentionally generated to mislead or misinform, harmful modifications of images of people, erosion of trust in information systems when AI-generated content is hard to label, and the reproduction of trademarked logos and copyrighted characters, among many others.
Other ethical concerns arise, especially regarding human creativity and labor. How is technology challenging our notions of what art is? Can AI replace human artists? Is this the end of human creativity? As discussed in this article: Is it ethical to train an AI on a huge corpus of copyrighted creative work, without permission or attribution? Is it ethical to allow people to generate new work in the styles of photographers, illustrators, and designers without compensating them? Is it ethical to charge money for that service, built on the work of others? Or is the fear that AI could displace creatives such as illustrators and photographers unfounded for now, with AI serving less as a replacement for creative workers than as an artistic assistant or muse?
We hope these resources serve as a starting point for both the practical side of your creative work and a critical inquiry into the subject. There are other, potentially many, unexamined aspects of generative AI art missing from this post. For instance, how does generative AI art look through the lens of art history? What are the commercial use cases of these technologies, and what are their implications for human creativity and the wider society? This is where we hope to hear more from you. Does generative AI art spark joy? Or raise the alarm? Share your artwork and your thoughts with us! We look forward to more conversations.
Yun Dai ([email protected])
Data Services | NYU Shanghai Library
November 2022