We Used Veo 3 for a Treatment. This Is What Happened.

It happened the same way it always does: someone sends you a link and says, "Have you seen this?" A few seconds later, you’re watching Veo 3 generate some of the most realistic AI portrayals of humans yet, combined with sweeping tracking shots, handheld tension, and sound-synced storyboards at the push of a prompt.

Google’s latest video generation tool, Veo 3, produces short clips (around eight seconds each at launch) from natural language prompts, complete with simulated camera moves, cinematic grading, and ambient sound (although we’ve found this last feature a bit unnecessary, as we usually end up muting the audio). The tool is integrated with Gemini and Flow, plugs directly into cloud storage, and is advertised as outputting in 1080p. Unlike Runway, it handles motion with surprising spatial logic. Unlike Pika, it knows how to slow down. Unlike Sora, it’s actually usable. So its launch made an impact of its own.

A couple of days after the tool came out, we put it to the test while working on visual research for Volkswagen. We weren’t sure if the scene would be useful at all or if we’d even like the result.

This was the scene from the script we decided to test: “Father and daughter get out of the car, happy. She hugs her dad, and they head toward the music festival. We see the car in the foreground during the packshot, with the festival in the background.”

In true 2025 fashion, we briefed ChatGPT and asked it to write the prompt for this scene. This is what it came up with:

“A cinematic shot at golden hour. A father (43) and his daughter (18) step out of a sleek, modern SUV parked in the foreground. They are smiling, joyful. The daughter gives her father a warm hug, then they walk side by side toward a vibrant music festival in the background. The car is shown prominently in the foreground (packshot), beautifully lit with soft sunlight hitting the bodywork to highlight its curves and LED details.

The festival in the background is alive — soft focus crowd, colorful lights, a stage in the distance, light smoke in the air, flags moving gently.

Camera direction: Wide angle lens (24mm), slight low angle to give presence to the car, with a shallow depth of field (f/2.8) to subtly separate foreground and background while keeping the emotion in focus. Natural lens flares and soft shadows. Slow, steady dolly out or locked off shot for a contemplative, grounded feel.

Aspect ratio: 2.39:1 (cinematic widescreen)
Lighting: Natural sunset lighting (golden hour), with warm tones and soft ambient contrast. Add fill light on characters’ faces to preserve emotional clarity.
Mood: Warm, nostalgic, human. A tender moment that feels both intimate and epic — grounded but cinematic.”

Below is the output of our first experience with Veo 3:

Let’s talk about what worked first: you could argue that the cinematography came through just fine. It’s warm and joyful, with soft flares and golden-hour sunset light. In our view, it looks a little "too perfect," that "margarine commercial" kind of perfect. But many directors go for this type of look (because clients ask them to), so in that respect, the shot can be considered successful.

But the car?

Well, the car ended up in the middle of the festival. Technically, yes, the car is in the foreground and the festival is in the background, but this is the kind of nuance AI-generated video clips don’t interpret very well. Any human would understand what the text meant: the car is parked in the parking area, overlooking the festival in the background. The model doesn’t pick up on implied meaning. It needs thoroughly detailed instructions, which can become time-consuming and daunting, even if you’re using ChatGPT to shorten that step.
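One way to keep those detailed instructions manageable is to treat the prompt as a set of labeled fields rather than a single paragraph, so the spatial layout gets its own explicit line. Below is a minimal sketch in Python, assuming nothing about Veo 3 itself: the `ShotPrompt` class and its field names are our own hypothetical helpers, not part of any Google tool, and the output is just plain text you would paste into the prompt box.

```python
from dataclasses import dataclass, field


@dataclass
class ShotPrompt:
    """Hypothetical container for the parts of a text-to-video prompt."""

    action: str
    foreground: str
    background: str
    # Explicit statements of what a human reader would simply infer.
    spatial_notes: list[str] = field(default_factory=list)
    camera: str = ""
    lighting: str = ""
    mood: str = ""

    def render(self) -> str:
        """Join the non-empty parts into one labeled, plain-text prompt."""
        parts = [
            ("Action", self.action),
            ("Foreground", self.foreground),
            ("Background", self.background),
            ("Spatial layout", " ".join(self.spatial_notes)),
            ("Camera", self.camera),
            ("Lighting", self.lighting),
            ("Mood", self.mood),
        ]
        return "\n".join(f"{label}: {text}" for label, text in parts if text)


# Example values adapted from the scene described in this article.
prompt = ShotPrompt(
    action="A father and his daughter step out of the car, hug, "
           "and walk toward the festival.",
    foreground="A modern SUV, shown prominently as the packshot.",
    background="A music festival with a stage, crowd, and colorful lights.",
    spatial_notes=[
        "The car is parked in a parking area outside the festival grounds.",
        "The festival is far behind the car, not around it.",
    ],
    camera="Wide angle lens (24mm), slight low angle, slow dolly out.",
    lighting="Natural sunset lighting (golden hour).",
    mood="Warm, nostalgic, human.",
)

rendered = prompt.render()
print(rendered)
```

Keeping the spatial notes in a separate field is a design choice, not a guarantee: the model may still misplace the car, but an omission is easier to spot when every implied detail has a slot of its own.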

And while Veo claims to deliver 1080p quality video, we’re not entirely sure that’s actually true. The examples we’re showing here had to be compressed to fit within WordPress’ upload limits, but even the original video we downloaded from the Veo site didn’t quite look like 1080p.

The script we were working with had very specific costume notes for the characters, so we prompted it again, this time detailing what their festival gear looked like, in an attempt to match the script more closely. We also asked ChatGPT to be more specific about the car’s position in the parking lot in relation to the festival in the background. Here’s what came back:

Fashion-wise? Spot on. Veo 3 understood the assignment when it came to mood, look, and feel. We’ll even overlook the fact that it used an Audi model, even though we explicitly mentioned Volkswagen in the second prompt.

The issue, however, was the dynamic between father and daughter.

We assume the model drew from source images of couples getting out of cars rather than parent-child relationships. To the eye of a trained (or picky) director or visual researcher, the characters looked a little too friendly. The vibe leaned more toward sugar daddy and girlfriend than father and daughter.

So, what’s the point of telling you this?

It’s just to give a real-world example of how this software works: where it shines, where it misses, and how much context still matters. Even with a short test case like this, it becomes clear just how many variables are at play when a researcher looks for the perfect image. Some of those variables are objective. Others are intuitive, unconscious, and deeply human.

Could you get the perfect shot you’re looking for with Veo 3?
Yes, we believe you could.

It’s just that, as it stands now — and we understand this is a rapidly evolving technology — the process still involves a lot of trial and error. And in the search for the right image, you risk losing essential human qualities: authenticity in behavior, realism in light, emotional rhythm, physical nuance. All the things that make a shot feel like a shot.

So is Veo useful for treatment making? Right now, we’re not entirely convinced.

But we’re paying close attention. And testing, one imperfect shot at a time.

Further information about Veo 3 subscription tiers:

Google AI Pro – $19.99/month

This entry-level plan includes:
• Limited access to Veo 3: Users receive a trial pack of 10 Veo 3 video generations within the Gemini app on Android, iOS, and desktop platforms.
• Access to Flow: An AI filmmaking tool that integrates with Veo 3 for video creation.
• Additional features: Includes 2 TB of cloud storage, access to Gemini in Gmail, Docs, and other Google apps, and the NotebookLM research assistant.

Note: After using the 10 Veo 3 video generations, users revert to Veo 2 unless they upgrade their subscription.

⸻

Google AI Ultra – $249.99/month

This premium plan offers:
• Full access to Veo 3: Unlimited video generation capabilities.
• Enhanced Flow features: Includes advanced camera controls and 1080p video generation.
• Additional benefits: Access to Gemini 2.5 Pro with Deep Think, 12,000 monthly AI credits, Project Mariner (an agentic AI research prototype), YouTube Premium, and 30 TB of cloud storage.

Availability: The AI Ultra plan is available in over 70 countries, including the U.S. and U.K.
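
To put the two price points side by side, here is a small Python sketch that annualizes the monthly prices quoted above. It uses only the list prices from this article and assumes twelve full months at the same rate, with no discounts, taxes, or regional pricing; the `annual_cost` helper is ours, for illustration only.

```python
from decimal import Decimal

# Monthly list prices as quoted in this article (USD).
MONTHLY_PRICE = {
    "Google AI Pro": Decimal("19.99"),
    "Google AI Ultra": Decimal("249.99"),
}


def annual_cost(monthly: Decimal) -> Decimal:
    """Cost of twelve months at a flat monthly price (assumption: no discounts)."""
    return monthly * 12


for plan, monthly in MONTHLY_PRICE.items():
    print(f"{plan}: ${annual_cost(monthly)} per year")

# How many times more expensive Ultra is than Pro, per month.
ratio = MONTHLY_PRICE["Google AI Ultra"] / MONTHLY_PRICE["Google AI Pro"]
print(f"Ultra costs about {ratio:.1f}x as much as Pro")
```

On these numbers, Pro comes to $239.88 a year and Ultra to $2,999.88, which is worth weighing against how many usable shots a treatment actually needs.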

⸻

Student Access

Eligible university students in select countries, such as the U.S., U.K., Brazil, Japan, and Indonesia, can access the Google AI Pro plan for free until the end of the 2026 academic year. This includes limited access to Veo 3 and other AI tools.

⸻

What Next?

If this resonates with you, we’ll be sharing more deep dives into the craft of treatment writing and design. Let us know if there’s a topic you’d like us to explore next.

🔗 Check our work at http://www.treatmentsbyghost.com
🔗 Job inquiries: info@treatmentsbyghost.com
🔗 Follow @ghost_treatments for more insights
