July 19, 2024

The art of AI art prompts: How to get the most out of AI image generation

Learn the basics of AI image generation and how to craft concise AI art prompts that get you closer to the results you want.
Braveen Kumar

They say a picture is worth a thousand words, so it’s no wonder text-to-image AI generators are so hard to use: They force you to translate a picture in your mind into a prompt that’s only a handful of lines.

On top of that, these AI models are unpredictable by nature, so the same prompt will never generate the exact same image twice.

Unless you learn how to prompt them properly, the randomness that should be a feature of these AI tools can start to feel like a bug.

But with a bit of practice, and this beginner-friendly guide, AI image generation can become a powerful addition to your creative toolbox.

The good, the bad, and the weird-looking hands of AI image generation

Note: While there are several text-to-image AI models—like Midjourney, Stable Diffusion, and Adobe Firefly—we’ll mostly be focusing on DALL-E by OpenAI since it’s the most versatile and widely available option.

Before we get into writing prompts, it’s worth setting expectations around AI image generation: what it’s good for and what it’s bad at.

AI image generators work best when you play to their strengths, which are:

  • Producing multiple variations and visual directions in a matter of seconds
  • Visualizing concepts that would be difficult or expensive to bring to life otherwise
  • Referencing famous artists, brands, and other real-world examples
  • Understanding the jargon of different creative fields, from photography to color theory

Text-to-image AI models are better than your average person—namely, me—at photography, graphic design, 3D modeling, and illustration. But on their own, they still struggle with tasks where precision and attention to detail are required.

Case in point: I’ve tried over a dozen prompts across different tools to generate a simple image of a chess board with all the pieces in their correct starting positions. The result always gets at least one glaring detail wrong. Usually several.

examples of AI-generated images of a chessboard with obvious mistakes
Chess board in starting position from top-down view. DALL-E 2024.

Since random generation is at the core of what they do, text-to-image generators aren’t great at revising existing images based on feedback. Ask one to change a single detail in an image, like making a T-shirt blue, and it'll often change that and a whole lot more.

In their current state, AI image models also struggle to render text reliably within images. They have a habit of adding extra letters, like on this birthday cake.

ai-generated image of birthday cake misspelling "happy 1st birthday"
Photo of birthday cake that says, “Happy 1st Birthday”. DALL-E 2024.

Credit where credit’s due though—these models have gotten a lot better at generating hands with the correct number of fingers, something they were notoriously bad at not too long ago.

So a lot of the quirks I just mentioned might get ironed out in future updates.

examples of AI-generated hands that look normal
Pro tip: Always count the fingers ✅

It's not perfect, but there are still plenty of scenarios where the ability to magically type an image into existence can be really handy (hopefully without any extra fingers).

Creative ways to use AI image generation

Since this technology is still young and constantly improving, new use cases are being discovered every day. It's safe to say text-to-image AI is no longer just a mildly amusing party trick.

Here are just a few ideas for how to use this breed of generative AI in your next creative project.

1. Bespoke stock photo substitutes

If you ever find yourself wishing for a specific stock photo that probably doesn’t exist, say “a slow loris with a gambling problem,” AI can generate a decent substitute in a snap—perfect for a throwaway joke in a YouTube video.

example of AI-generated image of a slow loris animal playing poker
Couldn't find anything like this in Descript's stock library. Can you believe that?

2. Backdrops for green screen edits

By combining the AI powers of image generation and video background removal—which your AI Underlord in Descript can do in a couple of clicks—you can drop yourself or a human subject into any scene you can imagine to help you tell your story better.

me against the backdrop of an ai-generated image of a volcano

3. Background images

Another way to harness the randomness of AI to your benefit is by generating simple backgrounds consisting of shapes, colors, and illustrations based on your brand or style.

You can use these to create visually interesting backgrounds you can build upon with text and other graphical elements to create title cards, video templates, podcast audiograms, and more.

ai-generated background containing abstract shapes

4. Cover images for podcast episodes and articles

If you manage a podcast or blog, AI can be a quick and cheap way to produce cover art and inline visuals.

For the best results, give the AI a specific concept to work with. Here’s a first attempt at generating a slot machine as a visual metaphor for the role that luck plays in generating AI images.

ai-generated slot machine image as a metaphor for ai image generation

Prompt-writing best practices to generate better AI art

Ironically, generating AI imagery is kind of an art itself. There’s a lot of trial and error involved, and a fair bit of luck, since even the most prescriptive prompt will generate a different image every time.

So if you don't like the results, you can refine your prompt or pull the lever again to spin the reels until you hit the jackpot.

If you're generating images in Descript, Underlord will come up with multiple variations at a time, so you can pick one as the direction for generating more options.

ai-generated image of a robot painting a landscape

Keep it short but specific

AI art prompts usually run from one to six sentences, depending on how much creative control you want over the result. AI has no artistic taste of its own, so it relies heavily on your instructions. Even an accidental "s" at the end of a noun can give you several of something when you only wanted one.

Give too little context and the AI model might take creative liberties where you don’t want it to; too much, and it might prioritize the wrong details in a long list of requirements.

It's kind of like delegating to an intern wearing a blindfold on the first day of the job. Minimize the potential for misinterpretation with explicit instructions—don't trust that it'll fill in the blanks with the correct assumptions.
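If you draft prompts in a script or a notes file, a quick length check can catch a prompt that has drifted past the one-to-six-sentence range. The sketch below is only an illustration: `count_sentences` and `check_length` are made-up names, the six-sentence ceiling is this article's rule of thumb rather than a limit of any tool, and the sentence splitting is deliberately rough (abbreviations like "e.g." will be miscounted).

```python
import re


def count_sentences(prompt):
    """Rough sentence count: split on ., !, or ? followed by a space or the end."""
    pieces = re.split(r"[.!?]+(?:\s+|$)", prompt.strip())
    return len([p for p in pieces if p.strip()])


def check_length(prompt, max_sentences=6):
    """Return a short verdict on prompt length (hypothetical helper)."""
    n = count_sentences(prompt)
    if n == 0:
        return "empty: add a subject"
    if n > max_sentences:
        return f"long: {n} sentences; consider trimming"
    return f"ok: {n} sentence(s)"


print(check_length("Photo of a birthday cake that says Happy 1st Birthday."))
```

A check like this says nothing about whether the prompt is specific enough. It only flags the "too much" end of the trade-off.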

Jargon is power so brush up on that art theory

One word can make a world of difference in the results you get from any given prompt.

Luckily, AI is fluent in jargon so there are plenty of keywords you can use to efficiently tell it what you want.

If there's one thing that will make you a better prompt writer, it's spending some time brushing up on your theory, from photography to illustration, and working those keywords into your prompt.

For anyone who slept through art class, the Descript blog has you covered.

Let's use a generic prompt like "generate a raccoon" as an example to look at how adding just one keyword across different parameters can dramatically change the result you get.

ai-generated images of a raccoon using various keywords

There are obviously a lot more keywords you can explore across parameters like:

  • Depth of field to control the range of distance that's in focus in a photo (e.g. shallow, deep, bokeh).
  • Mood to control the impression and emotion you want to exude (e.g. melancholy, mysterious, serene, energetic).
  • Lighting to control the intensity and type of light in your photo (e.g. soft, hard, or natural).
  • Composition to control the layout and placement of specific elements (e.g. foreground, background, balanced, asymmetrical).
  • Setting to control the time of day, weather, or location of an image (e.g. sunset, thunderstorm, December in Canada).
  • References to artists, characters, or brands as short-hand for a distinct style (e.g. Van Gogh, Garfield, Pixar).
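One way to keep these parameters straight is to treat a prompt as a subject plus a handful of optional keyword slots. The sketch below is a hypothetical helper, not part of DALL-E, Descript, or any other tool. The name `build_prompt` and its parameter names are assumptions made up for illustration, and all it does is assemble a string you would paste into whichever image generator you use.

```python
def build_prompt(subject, style=None, lighting=None, mood=None,
                 depth_of_field=None, composition=None, setting=None):
    """Join a subject with optional keywords into one comma-separated prompt.

    Hypothetical helper: the slots mirror the keyword categories listed above.
    """
    details = [
        setting,  # passed through as written, e.g. "at sunset"
        f"{composition} composition" if composition else None,
        f"{lighting} lighting" if lighting else None,
        f"{depth_of_field} depth of field" if depth_of_field else None,
        f"{mood} mood" if mood else None,
        f"in the style of {style}" if style else None,
    ]
    # Skip any slot that was left empty.
    return ", ".join([subject] + [d for d in details if d])


print(build_prompt("a raccoon"))
print(build_prompt("a raccoon", lighting="soft", mood="mysterious", style="Van Gogh"))
```

Changing one argument at a time and regenerating is an easy way to see which keyword is responsible for which change in the image.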

Use AI for the first draft, but polish the rest in post

More often than not, you'll generate an image that's so close to good enough—if only it didn't have that alien-language text, warped object, or really weird guy in the back.

You can't really get an AI image generator to target and fix these imperfections for you because it'll just go and generate a brand new image.

What you can do is apply some good old-fashioned editing:

  • Covering up any text and replacing it with your own
  • Cropping out weird-looking people or objects
  • Removing the background and dropping your AI-generated subject into a different one

You can do some of these edits in Descript, or use a photo editing tool like Photoshop or Canva, where you can use generative AI to fill, replace, or remove specific areas of an image.

Get ChatGPT to help you write your prompt

Now that ChatGPT can "read" images, you can actually use it to help you write AI image prompts by uploading an existing image you want to use as a reference.

I uploaded an actual photo I took on my trip to Italy and split my own instructions into two parts so I could edit its AI image prompt if I wanted:

Describe this image and then write an AI art prompt that generates a similar image.

real photo of Lake Como in Italy with a swan as the subject

Here's the prompt it spat out:

"Create a serene lakeside scene during early morning with soft, golden sunlight. In the foreground, depict a graceful white swan gliding on calm water with gentle ripples. Ensure the sunlight reflects off the water's surface, creating a shimmering effect. In the background, include several anchored sailboats with tall masts. The far shore should have a line of trees and buildings, with misty mountains visible beyond, adding depth and tranquility to the scene. The overall atmosphere should be peaceful and idyllic."

I gave that prompt as is to Underlord in Descript and this was the result.

ai-generated images based on prompt based on the real photo of a swan swimming in a lake
Not bad, robot.

Make your prompt your canvas

Even as AI image generation improves over time (a safe bet given it's only a few years old), one thing probably won't change: It's really hard to describe a picture in words.

AI still relies heavily on human input, and despite all the talk about AI taking our jobs, it's only at its best when it's in creative hands.

Braveen Kumar
Writer and marketing consultant. Helping creative people with business stuff and businesses with creative stuff.