Introduction

I recently created a custom AI chatbot to serve as an interactive image creation wizard – I call it VERSATILE IMAGE CREATION WIZARD. The idea behind my wizard is to guide users through the visual generation process within their chosen genre and style. I built this tool using OpenAI’s GPT Builder (available to Plus subscribers), an AI platform that enables you to build your own specialised wizards and assistants.

Leveraging GPT Builder, I was able to define prompts and flows that walk users through making key creative decisions to manifest the image design they envision. My wizard prompts for details across a series of tailored steps like establishing the setting, character attributes, stylistic direction, symbolic elements, color schemes, and lighting – capturing the user’s vision along the way.

Finally, the wizard processes those descriptions and creates a unique image using the built-in DALL·E, bringing the user’s imagined scene to life. For this blog post I conjured up an epic sci-fi spaceship vision.

Building the image phase by phase

It all starts with the user asking for help, instructions or inspiration. The wizard then presents a predefined set of categories the user can choose from. I defined both the list of 11 categories and each tailored sublist of attributes with the help of the wizard.

For this blog post example I picked the category “Fantasy and Science Fiction” from the list. The wizard then chose the matching sublist and offered me a preview of all the phases for this specific category:

  1. Theme and Genre Definition: Define the focus as fantasy, science fiction, or a combination of both.
  2. World-Building Elements: Consider essential elements like alien landscapes or magical realms.
  3. Character Design (if applicable): Define the appearance and attributes of characters within this world.
  4. Style and Mood: Decide on the overall style (e.g., dark, whimsical) and the mood you want to convey.
  5. Colour Palette: Choose a colour scheme that complements the style and mood of the setting.
  6. Symbolism and Imagery: Select genre-specific imagery and symbolic elements.
  7. Technology and Magic Elements: Integrate magical or advanced technological elements.
  8. Lighting and Atmosphere: Use lighting and atmospheric effects to enhance the genre feel.
  9. Legality and Originality: Ensure originality and avoid infringing on existing properties.
  10. Feedback and Iteration: Gather feedback and iterate on the design.

The idea of showing the full list of properties first is to prepare the user for defining those very elements step by step. I then proceeded to describe what I envisioned the image to look like. I had the idea of creating the cockpit of a spaceship, so I requested, for instance, a large transparent central screen and clean interior lighting with ultra-modern neon accents. The wizard summarised the scene back to me to make sure we were on the same page before generating the image. As the final step before creating the image, the wizard is set to ask for the image’s aspect ratio; I chose 16:9, which is landscape format.
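
GPT Builder itself needs no code, but the flow described above (pick a category, preview its phases, collect an answer per phase, summarise, then ask for the aspect ratio) can be pictured as a small data structure. The following is only an illustrative sketch: the names PHASES_BY_CATEGORY and WizardSession are hypothetical and are not part of GPT Builder or of my wizard’s actual configuration. Only the phase titles are taken from the list above.

```python
from dataclasses import dataclass, field

# Hypothetical lookup table: one entry per category, each with its ordered phases.
# Only the category used in this post is filled in; titles come from the list above.
PHASES_BY_CATEGORY = {
    "Fantasy and Science Fiction": [
        "Theme and Genre Definition",
        "World-Building Elements",
        "Character Design",
        "Style and Mood",
        "Colour Palette",
        "Symbolism and Imagery",
        "Technology and Magic Elements",
        "Lighting and Atmosphere",
        "Legality and Originality",
        "Feedback and Iteration",
    ],
}


@dataclass
class WizardSession:
    """Hypothetical model of one pass through the wizard."""

    category: str
    aspect_ratio: str = "16:9"  # asked for as the last step before generation
    answers: dict = field(default_factory=dict)

    def preview(self):
        """Return the numbered phase list shown to the user up front."""
        phases = PHASES_BY_CATEGORY[self.category]
        return [f"{i}. {name}" for i, name in enumerate(phases, start=1)]

    def record(self, phase, answer):
        """Store the user's answer for one phase of the chosen category."""
        if phase not in PHASES_BY_CATEGORY[self.category]:
            raise KeyError(f"unknown phase for {self.category}: {phase}")
        self.answers[phase] = answer

    def summary(self):
        """Summarise the answered phases in phase order, then the aspect ratio."""
        lines = [
            f"{phase}: {self.answers[phase]}"
            for phase in PHASES_BY_CATEGORY[self.category]
            if phase in self.answers
        ]
        lines.append(f"Aspect ratio: {self.aspect_ratio}")
        return "\n".join(lines)


if __name__ == "__main__":
    session = WizardSession(category="Fantasy and Science Fiction")
    print("\n".join(session.preview()))
    session.record("Lighting and Atmosphere", "clean interior lighting, neon accents")
    print(session.summary())
```

In the real wizard all of this lives in natural-language instructions rather than in code; the sketch only makes the structure of categories, phases and the final summary explicit.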

Image Creation Wizard - Screenshot 2023-11-30 at 12.11.43
The Image Creation Wizard summarises my choices for the properties of the image (see the screenshot above).

The first version

DALL·E wizard test image 1 - A futuristic spaceship cockpit interior blending fantasy and science fiction themes.

Wizard: Here is the image based on your specifications: a futuristic spaceship cockpit interior that beautifully blends fantasy and science fiction themes, featuring a large, hovering, transparent screen in the center, surrounded by neon-colored accents in an ultra-modern and sleek design. The anthropomorphic AI entity adds a unique touch to the scene. The lighting and spacious atmosphere enhance the stylish, modern, and futuristic mood of the cockpit.

Please take a look at the image and let me know if there’s anything you’d like to adjust or if it matches your vision!

Me: fantastic! Here are my amendments. Since it’s not really obvious that we’re looking at a cockpit I think we could have a slightly larger window in the centre background through we could see space. In that space we see a planet with rings. So we could zoom in a bit so the window appears larger. Furthermore we could move the ai entity to one side (and make sure it does not have legs!).

Wizard: The image has been updated based on your amendments. It now includes a larger window in the center background, offering a view of space and a planet with rings, enhancing the cockpit feel. The anthropomorphic AI entity has been moved to one side and designed without legs, maintaining the sleek and simple design. (…)

Me: we’re getting closer. now bring back the see-through screen and move it to one side with the robot next to it, remember, it hovers! The window could be larger so we see more of the space

DALL·E wizard test image 2 - Revise the futuristic spaceship cockpit interior, making it more apparent that it is a cockpit.
DALL·E wizard test image 3 - Update the futuristic spaceship cockpit interior once more.

Wizard: The image has been revised again according to your specifications. The large, hovering, transparent screen is now positioned to one side of the cockpit, with the anthropomorphic AI entity (without legs) next to it, emphasizing its hovering aspect. The window has been enlarged to offer a more expansive view of outer space, including the planet with rings. (…)

Me: thanks, now the cockpit is done, we keep this one. Now can you recreate this with the Sun seen through it with some protuberances visible?

Wizard: The spaceship cockpit has been recreated, this time with the large window showing a view of the Sun, including visible solar protuberances. The design elements, including the large, hovering screen and the anthropomorphic AI entity, remain consistent with the previous version. (…)

Me: ok, good effort. Please make the Sun bigger so we only see maybe half of it and make sure it’s not lucid and have less protuberances.

DALL·E wizard test image 4 - Recreate the futuristic spaceship cockpit interior, maintaining the current design with the large, hovering, transparent screen to one side
DALL·E wizard test image 5 - Modify the futuristic spaceship cockpit interior view. This time, make the Sun visible through the large window even larger, so that only about half o

Wizard: The image has been modified according to your latest instructions. The Sun is now larger in the window, showing only about half of it for a more dramatic effect. The Sun is depicted with less lucidity and fewer solar protuberances, adding to the intensity of the scene.

Conclusion

Having the chance to essentially direct my own sci-fi scene was very cool. This wizard is a glimpse into those possibilities – I can’t wait to see what other users will generate with tools like this! The only limit is our imagination. My preferred text-to-image tool has been Midjourney, for its ability to render photorealistic images; DALL·E is not there yet. However, DALL·E has something else, something very special, going for it: you can amend an image through normal conversation instead of fiddling around with complex prompts, and that is a serious advantage.

A few words about working with the GPT Builder, the tool with which I created this wizard. The builder utilises the standard GPT-4 model that you may be familiar with from the typical ChatGPT interface. The workflow is straightforward – you have an Editor window on the left side of the browser window to provide prompts and instructions, and a Preview window on the right side that displays AI-generated responses in real time. This allows rapid iteration – no coding needed.

However, I noticed some inconsistencies in how precisely the wizard follows the specifications I defined. For instance, despite my providing a list of image categories and their associated phases, it would sometimes suggest its own categories. This was very frustrating, given that my categories and prompts spanned just 2 pages – certainly not enough to exceed the token limit. I found that uploading the content as a document made the wizard stick to what I had outlined. Since taking this step, it has adhered accurately to my categories and phases. It likely needed more explicit direction to prioritise my structure. Another pain point is GPT-4 timing out mid-session – it is inconvenient to be told to resume later just as you are providing new instructions.

I will post this GPT and my other wizards / assistants once OpenAI has released their GPT Store.