Dialogues with an AI: Creating a video with OpenAI’s Code Interpreter

Introduction

In the world of digital creativity, the possibilities AI opens up are truly endless, and we are only beginning to understand the potential it offers. Some eight months ago I hadn’t heard of Midjourney, let alone OpenAI or ChatGPT for that matter. So what happens when you combine two instances of artificial intelligence and add a dash of artistic imagination? Magic, maybe? I embarked on an exploration to create a video with OpenAI’s Code Interpreter from a panoramic image I had prompted in Midjourney. I first came across this method in a Medium article called ChatGPT’s Code Interpreter: Converting An Image into A Video! and wanted to see whether I could recreate it.

I was confident I could do this, as I have some experience using GPT-4 and had just successfully co-coded a small programme with it, a flash card game for Croatian vocabulary, inspired by another Medium article by Tristan Wolff on GPT-4’s Code Interpreter. Now I wanted to step it up. Read my short post AI — how I came to successfully “code” my first useful application on Medium for the details.

A Monologue by AI

The idea was simple: take a panoramic image and have GPT-4 convert it into a video that pans across the view. Which part do you expect to be more labour intensive for you, the human component: creating the panoramic image or converting it into a video? Creating the panoramic image, for sure, since the conversion is done with a single prompt once you figure out how. This is where GPT-4 shows some newly added muscle. Following the conversion process wasn’t as dull as you might imagine. GPT-4 engages in a monologue, commenting on the various steps it takes. Sometimes it would run into an issue, but it would explain what it was and figure out a workaround: truly amazing to watch (read!). This is possible thanks to OpenAI’s newly launched plugin called Code Interpreter, your AI assistant on steroids.

But before we have a look at the Code Interpreter, let’s inspect the creation of the panorama image.

Midjourney: Crafting Panoramas with AI

So before I even engaged with GPT-4, I embarked on a different kind of creative AI journey. The panoramic image that serves as the foundation for this project wasn’t captured with a traditional camera. Instead, it was crafted using Midjourney’s innovative “Panning” feature, which lets you extend an AI-generated image in a single direction to create a broader view. Before moving on, let’s first look at the base image.

Text to image in Midjourney, 1st result after prompting

The prompt I used for the base image results in four images (above).

/imagine An aquatics centre beside a waterfront in the desert, beautifully illuminated in the night, breathtaking structure, by Santiago Calatrava, natural elements --ar 16:9 --v 5.2 --style raw

Once I decided which one to go with, the top left (U2), I upscaled it. For a quick intro to using Midjourney, please read the blog post Harnessing AI – the Midjourney case I wrote and the video I created over at Global campus.

Text to image in Midjourney, upscaled version of preferred image.
Midjourney, base image with panning arrows

The process is somewhat intuitive: beneath each upscaled image, arrow buttons allow you to extend the image in any direction. Clicking an arrow initiates the panning process, with the AI generating a selection of images to choose from, effectively extending the original image in the chosen direction.

But Midjourney’s tool goes beyond just expanding images. By switching to “Remix mode” (done with the /settings command), you can alter the prompt guiding the AI as you pan, allowing you to create evolving, panoramic narratives. Bear in mind that the resolution grows with each panning, resulting in an impressively detailed panoramic view. My prompt for the first panning phase was:

/imagine A shopping centre beside a waterfront in the desert, beautifully illuminated in the night, breathtaking structure, by Santiago Calatrava, natural elements --v 5.2 --style raw

Midjourney, four image variations after first panning

Again you are presented with four versions; once you’ve made up your mind, you upscale the one you like and continue panning. Having chosen and upscaled an image, I prompted the third and final image using the same panning direction:

/imagine A skyscraper beside a waterfront in the desert, beautifully illuminated in the night, breathtaking structure, by Santiago Calatrava, natural elements --v 5.2 --style raw

Midjourney, four image variations after second panning
Text to image in Midjourney, final image after panning twice

Finally, I upscaled my preferred version and downloaded it to my computer. Or, to be precise, to my cloud.

The combination of the “Panning” feature and the ability to change prompts opens up a world of creative possibilities. With this tool you can craft intricate, visually rich panoramic images that work as standalone pieces or, as in my case, serve as the perfect starting point for a video project.


The Code Interpreter

Let’s move on to GPT-4, available to subscribers of OpenAI’s paid plan, ChatGPT Plus. Here too, you first need to enable a feature: activate plugins under Settings, and when starting a New Chat make sure to switch to Code Interpreter under GPT-4 (see screenshot). Then you are good to go. I began by asking the AI about the process. It explained that a video is essentially a series of images, or frames, displayed in quick succession. The task was to extract these frames from the panoramic image in a way that created a sense of motion.

Screenshot of GPT-4 with Code Interpreter

The prompt

I blatantly copied the prompt from the Medium article cited at the beginning, but tweaked it a bit, mainly the camera movement, here again called panning.

Hi, please convert the uploaded panoramic image into a 3:2 aspect ratio video, aligned on the image’s left side. Use smooth transitions with the sequence: Left -> Center -> Right -> Center -> Left, using the imageio library. Save frames directly to a video file with an 8-pixel frame step. If needed, crop the image edges to match the frame step size.

I then uploaded the panorama image. As you have probably guessed by now, GPT-4 does not animate the image or parts of it. What it does is take the image, point an imaginary camera at it and pan the camera across the image. The image moves through the camera’s view and vanishes on the other side, creating an illusion of movement. Here is what GPT-4 answered:

Alright, we’ll need to perform the following steps:

  1. Load the image and determine its dimensions.
  2. Calculate the necessary crop to ensure the width of the image is divisible by the frame step size.
  3. Create a sequence of frames, moving from left to right, right to center, center to left, and saving each frame directly to the video file.
  4. Each frame will be a crop of the image with a 3:2 aspect ratio, and the horizontal position of the crop will change smoothly over time to create the panning effect.

Let’s start with steps 1 and 2: load the image, determine its dimensions, and calculate the necessary crop.
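In Python terms, steps 1 and 2 come down to a few lines. Here is my own sketch, not Code Interpreter’s actual code: the synthetic image stands in for the upload (the real session would load it with imageio.imread), and I’ve deliberately given it a width that is not divisible by 8 to show the cropping at work.

```python
import numpy as np

FRAME_STEP = 8  # horizontal pixels the crop window advances per frame

# Stand-in for the uploaded panorama; in the real session this came from
# imageio.imread(...). The width here is deliberately NOT divisible by 8.
image = np.zeros((794, 2835, 3), dtype=np.uint8)
height, width = image.shape[:2]

# Step 2: trim the right edge so the width is a multiple of the frame step.
cropped_width = width - (width % FRAME_STEP)
image = image[:, :cropped_width]
```

With my actual panorama no trimming was needed, as GPT-4 notes just below.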

Finished working
Show work

The original image dimensions were 794 pixels high and 2832 pixels wide. The width is already divisible by the frame step size of 8 pixels, so no cropping was necessary.

Next, let’s calculate the frame dimensions for the 3:2 aspect ratio, then generate and save the frames to the video file.

Finished working
Show work

I apologize for the mistake. The calculated frame height for a 3:2 aspect ratio is larger than the height of the image. This happened because I tried to use the full width of the image for a single frame, which is not suitable in this case.

We should adjust our approach and set the height of the frame to match the height of the image. Then, we can adjust the frame width to maintain the desired 3:2 aspect ratio. After that, we will calculate the number of frames that we can generate based on the new frame width and the image width.

Let me do these adjustments and proceed with the video creation.
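The adjustment GPT-4 describes boils down to a little arithmetic. A sketch, using the dimensions it reported for my panorama:

```python
FRAME_STEP = 8
ASPECT_W, ASPECT_H = 3, 2

height, width = 794, 2832  # the panorama's dimensions from the transcript

# Lock the frame height to the image height, derive the width from the ratio.
frame_height = height
frame_width = frame_height * ASPECT_W // ASPECT_H

# How many crop positions fit while the window stays inside the image.
n_frames = (width - frame_width) // FRAME_STEP + 1
```

So each frame is 1191 pixels wide, leaving room for 206 crop positions on a single left-to-right pass.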

Finished working
Show work

I apologize for the oversight. I forgot to define the video_path variable before using it. Let’s correct this and proceed with the video creation.

Finished working
Show work

I apologize for the error. It seems like there’s a problem with frame size consistency throughout the video. This issue might be due to the calculated pos_x value going negative or beyond the image width during the transitions, resulting in frames of smaller size.

Let’s add some checks to prevent this and try again.
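The check it added presumably amounts to clamping the crop’s horizontal offset to the valid range, along these lines (my reconstruction, not Code Interpreter’s actual code):

```python
def clamp_pos_x(pos_x: int, frame_width: int, image_width: int) -> int:
    """Keep the crop window fully inside the image."""
    return max(0, min(pos_x, image_width - frame_width))
```

With the 2832-pixel-wide panorama and 1191-pixel-wide frames, any offset outside [0, 1641] gets pulled back to the boundary, so every frame keeps the same size.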

Finished working
Show work

I apologize for the warning during the video generation process. It appears that the video encoding software (FFmpeg) prefers frame dimensions to be divisible by 16 (for optimal performance and compatibility with most codecs and players), but our frame dimensions are not. The software automatically resized the frames to meet this preference.
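An aside from me: imageio’s FFmpeg writer exposes this as the macro_block_size parameter (default 16), and as far as I can tell the automatic resize simply rounds each dimension up to the next multiple of 16:

```python
def padded(dim: int, block: int = 16) -> int:
    """Round a frame dimension up to the next multiple of the macro block size."""
    return -(-dim // block) * block  # ceiling division

# A 1191 x 794 frame would therefore be encoded as 1200 x 800.
```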

Nonetheless, the video was created successfully. You can download it using the following link:

Download the video

Please review the video, and let me know if you need any further adjustments or assistance.

Easy execution despite a few digital hurdles

As you can see from GPT-4’s answers, not everything went as expected. There were some issues with the aspect ratio, it had forgotten to define the video_path variable, it encountered a problem with frame size consistency, and finally there was a warning from the video encoding software. What I find very impressive is GPT-4’s ability to analyse a problem, come up with a solution by itself and then act accordingly to fix it, without my interference along the way. Furthermore, I very much appreciate that it keeps me posted about every step of the process; not keeping the user in the dark is crucial, I think.
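Pieced together, the script Code Interpreter arrived at probably looked something like this sketch. The synthetic image stands in for the upload, and the file names are my assumptions; the real session loaded the panorama with imageio.imread and streamed the frames to an mp4.

```python
import numpy as np

STEP = 8  # pixels the crop window moves per frame

def pan_offsets(width: int, frame_width: int, step: int = STEP) -> list[int]:
    """x-offsets for a Left -> Center -> Right -> Center -> Left pass."""
    right = (width - frame_width) // step * step  # stay on multiples of step
    forward = list(range(0, right + 1, step))
    back = list(range(right - step, -1, -step))
    return forward + back

# Synthetic stand-in for the panorama; the real run would use
# imageio.imread("panorama.png") and write frames out via
# imageio.get_writer("pan.mp4", fps=24) / writer.append_data(frame).
image = np.zeros((794, 2832, 3), dtype=np.uint8)
h, w = image.shape[:2]
frame_w = h * 3 // 2  # a 1191-pixel-wide 3:2 crop

frames = [image[:, x:x + frame_w] for x in pan_offsets(w, frame_w)]
```

Every frame is the same size because the rightmost offset is snapped down to a multiple of the step, which is exactly the kind of consistency problem GPT-4 had to fix along the way.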

Adding some adjustments

I downloaded the video and had a look at it. It clearly was a video, great. Mission accomplished! It wasn’t really anything special, but I wanted to see how GPT-4 would react if I asked for some corrections to the clip. So I prompted:

Ok, thank you, this is a good start. I didn’t realise I would need to specify a frame rate as the clip is a bit awkward to look at. Let’s try the following. Please use the same panoramic image and make a panning video that starts at the left side and ends on the right side, so NO back and forth. Please make the video last 12-15s and add some ease in at the beginning and ease out at the end. Is this possible?

Since the panning movement was way too hasty, I wanted to slow it down and bookend the video with an ease-in and an ease-out effect. Here is what GPT-4 answered:

Sure, your updated requirements are clear. We need to:

  1. Use the same panoramic image.
  2. Create a panning video that starts at the left side and ends on the right side.
  3. Make the video last between 12 and 15 seconds.
  4. Add an “ease-in” at the beginning and an “ease-out” at the end of the video.

Here’s how we’ll achieve this:

  1. We’ll calculate the total number of frames based on the desired duration and a standard film frame rate of 24 frames per second (fps). This will make the video smooth and pleasing to the eye.
  2. Instead of moving at a constant speed, we’ll adjust the speed of the panning effect over time. We’ll start slow (ease-in), move faster in the middle, and slow down towards the end (ease-out). A common way to achieve this is using a “sinusoidal ease-in/out” function, which produces a smooth, natural-looking speed variation.

Let’s go ahead and create the video as per these updated requirements.
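The sinusoidal easing it describes is a one-liner. Here is my own sketch of how a frame’s offset can be derived from it (the function names are mine, not necessarily what Code Interpreter used):

```python
import math

def ease_in_out(t: float) -> float:
    """Map linear progress t in [0, 1] to sinusoidally eased progress."""
    return (1 - math.cos(math.pi * t)) / 2

def pos_x(i: int, n_frames: int, image_width: int, frame_width: int) -> int:
    """Horizontal crop offset for frame i of an eased left-to-right pan."""
    t = i / (n_frames - 1)
    return round(ease_in_out(t) * (image_width - frame_width))
```

At 24 fps a 15-second clip has 360 frames; the offset starts at 0, ends at image_width - frame_width, and changes slowest near both ends, which is what gives the pan its gentle start and stop.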

Finished working
Show work

The video was created successfully, with a duration of 15 seconds, a frame rate of 24 fps, and a smooth “ease-in” and “ease-out” panning effect.

Please note the warning about the frame dimensions not being divisible by 16, as mentioned before. FFmpeg resized the frames to ensure video compatibility with most codecs and players. This resizing should not noticeably affect the video’s appearance.

You can download and review the video from the following link:

Download the video

Please let me know if you need any further adjustments or assistance.

Video produced by GPT-4 with Code Interpreter, based on image prompts in Midjourney.

Conclusion

Creating a panoramic video with OpenAI’s Code Interpreter was a blend of discovery and excitement. This process allowed me to witness firsthand the power and flexibility of programming for creative tasks. More than that, it highlighted the collaborative potential of human creativity and AI capabilities. The result was a video that more or less beautifully captured the breadth of the panoramic image, offering a dynamic and engaging viewing experience. I’m excited about the vast creative possibilities that this AI-human collaboration opens up, and I look forward to exploring them further.