Over the past few months, you may have encountered online posts from DALL-E Mini, whether you realized it or not. They would have featured a grid of nine images accompanied by a caption, like “the Demogorgon from Stranger Things holding a basketball.” Powered by artificial intelligence (AI), this tool generates images based on user prompts and is open source, meaning anyone can use and modify it. DALL-E Mini was created by developer Boris Dayma as his own version of DALL-E, a model introduced in 2021 by the AI research company OpenAI.
Dayma’s model is impressive—the Demogorgon motif is a great example of how accurately the tool can bring our imaginations to life. Other examples, however, show the program’s limitations. At first glance, the images might look recognizable, but up close, like a Monet painting, the shapes and colors are more suggestive than precise. The more disturbing results are being archived by the Weird DALL-E Mini Generations Twitter account (which already has over one million followers), like this nightmarish depiction of Jack Black at a Presidential Inauguration.
DALL-E Mini’s images are lower quality than those from OpenAI’s DALL-E. DALL-E Mini was also built with more modest hardware and requires far less GPU (graphics processing unit) power, which is why anyone on the internet with a few minutes to spare can use it. Because of these limitations, its images aren’t yet realistic enough to be mistaken for real photographs.
This month, OpenAI released its consumer-facing product, DALL-E 2, and is granting access in stages to the 1 million people on its waiting list. In this model, just like with DALL-E Mini, images are generated from natural language. Users can input virtually any phrase and generate a realistic, high-resolution image. DALL-E 2 builds on a model called CLIP, which was trained on caption-and-image pairs gathered from across the internet. In addition to the original images and artwork it creates, it can also edit existing images and produce variations of the same image. The possibilities are limited only by our imaginations and patience—see some examples on OpenAI’s site.
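The core idea behind CLIP can be illustrated with a toy sketch. Everything below—the captions, the three-number vectors, the scoring—is invented for illustration; the real CLIP learns high-dimensional embeddings with neural text and image encoders. But the matching step is the same in spirit: map captions and images into one shared vector space, then rank pairs by how similar their vectors are.

```python
import math

# Toy "embeddings": CLIP trains separate text and image encoders so that
# a caption and the image it describes land close together in one space.
# These hand-made 3-number vectors just stand in for those embeddings.
text_embeddings = {
    "a dog on a beach": [0.9, 0.1, 0.0],
    "a city skyline at night": [0.0, 0.2, 0.9],
}
image_embedding = [0.8, 0.2, 0.1]  # pretend-encoded photo of a dog

def cosine_similarity(a, b):
    """Similarity of two vectors by the angle between them (1.0 = aligned)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

# CLIP-style matching: score the image against every candidate caption
# and keep the best-scoring pair.
scores = {caption: cosine_similarity(vec, image_embedding)
          for caption, vec in text_embeddings.items()}
best_caption = max(scores, key=scores.get)
print(best_caption)  # → a dog on a beach
```

Training on millions of real caption-image pairs pushes matching pairs toward high similarity and mismatched pairs toward low similarity, which is what lets a system like DALL-E 2 judge how well a generated image fits a text prompt.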
If you’re interested in learning exactly how it works, this video from Assembly AI is a good explainer.
Because DALL-E 2 and DALL-E Mini are trained on images and captions from across the internet, some results require a warning. Underneath the DALL-E Mini generator, Dayma notes that the images may “… reinforce or exacerbate societal biases” and that, “…given the fact that the model was trained on unfiltered data from the internet, it may generate images that contain stereotypes against minority groups.”
For example, as documented by Futurism, the term “gastroenterologist” generated images of white male doctors, while the term “nurse” generated images of women. Some prompts also generated oversexualized depictions of women and images that reinforce racial stereotypes, according to Wired.
Similar to DALL-E Mini’s disclaimer, OpenAI’s website says of DALL-E:
“We recognize that work involving generative models has the potential for significant, broad societal impacts. In the future, we plan to analyze how models like DALL·E relate to societal issues like economic impact on certain work processes and professions, the potential for bias in the model outputs, and the longer-term ethical challenges implied by this technology.”
OpenAI has a content policy in place, which prohibits the sharing of explicit or offensive images, and is keeping an eye out for users who violate these policies. Recently, OpenAI deployed a system-level technique for DALL-E 2 to make the image generation more reflective of the diversity of the population.
But what are the other implications of this high-functioning, user-accessible text-to-image generator?
OpenAI’s content policy also tells users that when sharing DALL-E-generated images, they must indicate that the image or graphic was created by AI. Though the images are often more artistic than photorealistic, the potential for creating false images and spreading misinformation exists, especially as text-to-image technology advances rapidly. Programs like Photoshop can do this too, of course, but with DALL-E there is no software to install, no learning curve, and little time required to generate a convincing image.
Additionally, we’re left to wonder if threat actors can replicate this technology and use it in ways outside of its original intent.
Deepfake technology is already used in disinformation campaigns across the internet, though it has its limitations. TheConversation.com tells internet users to look for telltale signs of a deepfake: disjointed mouth movements, overly smooth facial features, misplaced shadows, and fake-looking hair. The site also encourages users to pay attention to the context of a video and think critically about what they’re watching, especially if it seems improbable. A number of deepfake detection technologies are also in the works.
The same kind of litmus test can be applied to AI-generated images. If a photo is making the rounds on Twitter rather than coming from a verified news source, and depicts an unlikely scenario, it deserves a closer look. OpenAI is also working to limit potential misuse. Its policy states:
“To minimize the risk of DALL-E being misused to create deceptive content, we reject image uploads containing realistic faces and attempts to create the likeness of public figures, including celebrities and prominent political figures. We also used advanced techniques to prevent photorealistic generations of real individuals’ faces.”
The paper describing how DALL-E 2 was built is publicly available, meaning that DALL-E could be replicated without the guardrails that OpenAI has put in place to limit disinformation and harmful images.
For now, it’s a fascinating tool that shows the potential for AI technology, and something to marvel at. Just remember not to believe everything you see on the internet.
Learn the 6 cybersecurity basics everyone should have down.