Grok Imagine Video 1.5
xAI · Released May 31, 2026

Grok Imagine Video 1.5: #1 Image-to-Video AI Generator

Grok Imagine Video 1.5 is xAI's top-ranked image-to-video model. Upload a photo, add an optional motion prompt, and get back a clip of up to 15 seconds with synchronized audio. A full 15-second clip typically renders in about 15 seconds. It's currently the #1 model on the Image-to-Video Arena leaderboard.

#1 Image-to-Video Arena
~15s Generation
Native Audio Included
Up to 720p · 15s
+52 Elo vs v1.0

// What's New in 1.5

Six Major Upgrades Over Version 1.0

Grok Imagine Video 1.5 isn't a patch — it's a full generational step. Every dimension that matters for professional video creation has been rebuilt: audio, faces, physics, and speed.

Native Audio Generation

Dialogue, ambient sound, and music are generated in sync with the video, so no separate audio pipeline is needed. Grok Imagine Video 1.5 produces picture and sound in a single generation pass.

New in 1.5

Advanced Face Accuracy

Blind testing showed substantial gains in face realism. Grok Imagine Video 1.5 generates accurate facial details — including celebrity likenesses — and maintains them across cuts and angles without drift.

+52 Elo

Temporal Coherence & Video Extend

Chain multiple clips into longer narratives using the Extend Video feature. Version 1.5 significantly reduces quality degradation across chained segments, keeping lighting, characters, and motion physics consistent.

Improved

Photorealism & Lighting Quality

Lighting and textures look noticeably sharper than v1.0, with more accurate physical lighting and material detail. Output clips are ready to use directly in your ads — even at the $0.08/sec 480p tier.

Pro Quality

Superior Motion Control

Optional motion prompts guide camera moves, subject actions, and atmosphere while preserving your source image. Grok Imagine 1.5 executes direction precisely instead of drifting away from the original frame.

Smarter

Faster Generation & Wider API

A 15-second clip typically renders in about 15 seconds — roughly 2× faster than v1.0. xAI expanded compute allocation for 1.5, increasing stability at scale. API rate limits sit at 60 requests/minute — ready for high-volume production workflows.

~15s

// Generated Samples

See What Grok Imagine Video 1.5 Actually Creates

Real image-to-video outputs from Grok Imagine Video 1.5, each generated with native audio.

The girl reading comics

The man on the swing

A charming girl is smiling at you.

A pleasant day in the fields

A woman is dragging a colorful suitcase

The plane flew over the boy's head

A woman in a kimono is performing a dance

The cat is fiddling with the screen TV

In the bamboo forest at night a brown bear is watching you

The girl with golden hair is smiling happily

// Arena Leaderboard · May 2026

Ranked #1 in the Image-to-Video Arena

xAI's Grok Imagine Video 1.5 Preview secured the top position on the competitive Arena.ai Image-to-Video leaderboard in May 2026 — outscoring ByteDance Seedance 2.0, Alibaba's HappyHorse 1.0, and Google Veo in blind human evaluation.

| Rank | Rank Spread | Model | Score | Votes |
|---|---|---|---|---|
| 1 | 1-2 | grok-imagine-video-1.5-preview-720p [Preliminary] (xAI · Proprietary) | 1473 ±9 | 5,564 |
| 2 | 1-2 | dreamina-seedance-2.0-720p (ByteDance · Proprietary) | 1467 ±11 | 56,710 |
| 3 | 3-3 | happyhorse-1.0 (Alibaba-ATH · Proprietary) | 1443 ±12 | 33,267 |
| 4 | 4-4 | grok-imagine-video-720p (xAI · Proprietary) | 1421 ±6 | 380,580 |
| 5 | 5-8 | veo-3.1-audio (Google · Proprietary) | 1397 ±11 | 25,113 |
| 6 | 5-9 | veo-3.1-audio-1080p (Google · Proprietary) | 1393 ±10 | 24,381 |
| 7 | 5-9 | veo-3.1-fast-audio (Google · Proprietary) | 1384 ±9 | 99,851 |
| 8 | 5-9 | grok-imagine-video-480p (xAI · Proprietary) | 1383 ±9 | 19,415 |
| 9 | 6-11 | veo-3.1-fast-audio-1080p (Google · Proprietary) | 1374 ±11 | 24,874 |
| 10 | 9-11 | vidu-q3-pro (Shengshu · Proprietary) | 1360 ±8 | 36,674 |
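
To put the score gaps in perspective, a rating difference can be converted into an expected head-to-head preference rate. The sketch below assumes the leaderboard uses the conventional logistic Elo curve with a 400-point scale; that scale is an assumption, not something this page states, and `expected_win_rate` is a hypothetical helper name.

```python
def expected_win_rate(score_a: float, score_b: float, scale: float = 400.0) -> float:
    """Probability that model A is preferred over model B.

    Uses the standard logistic Elo formula. The 400-point scale is an
    assumption about how the leaderboard scores are calibrated.
    """
    return 1.0 / (1.0 + 10.0 ** ((score_b - score_a) / scale))


if __name__ == "__main__":
    # Scores taken from the leaderboard: 1.5 preview (1473) vs v1.0 720p (1421).
    print(f"1.5 vs 1.0:          {expected_win_rate(1473, 1421):.3f}")
    # 1.5 preview (1473) vs the second-ranked entry (1467).
    print(f"1.5 vs second place: {expected_win_rate(1473, 1467):.3f}")
```

Under that assumption, the +52 Elo gap over version 1.0 corresponds to being preferred in roughly 57% of blind pairings, while the 6-point lead over second place is close to a coin flip, which is consistent with both entries sharing a 1-2 rank spread.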

// Grok Imagine Video 1.5 vs Kling 3.0 vs Veo 3.1

How It Stacks Up Against the Competition

Grok Imagine 1.5 leads on image-to-video quality and pricing. Here's the full breakdown against Kling 3.0 and Veo 3.1, the two models most creators are evaluating in 2026, alongside Sora 2.

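| Feature | Grok Imagine 1.5 (Best) | Kling 3.0 | Google Veo 3.1 | Sora 2 |
|---|---|---|---|---|
| Image-to-Video Rank | ✦ #1 | Top 5 | Top 5 | Top 5 |
| Max Resolution | 720p | 1080p | 1080p | 1080p |
| Max Duration | 15 sec | 10 sec | 8 sec | 20 sec |
| Native Audio | Included | | | |
| Starting Price / sec | $0.08 | ~$0.14 | ~$0.34+ | ~$0.10 |
| Generation Speed | ~15 seconds | 2–4 min | 2–5 min | 1–3 min |
| Video Extend | | | | |
| Face Accuracy (blind test) | Top-rated | Strong | Strong | Moderate |

Public API: 60 RPM for Grok Imagine 1.5; "Enterprise only" for one competitor (column not recoverable).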

// Pricing from official docs as of May 2026. Arena rankings from Arena.ai / Artificial Analysis leaderboard. Subject to change.

// What Creators Build With It

Built for Real Production Workflows

From 30-second ad spots to social UGC content and product demos — Grok Imagine Video 1.5 fits wherever fast, high-quality AI video creation matters.

// 01

Advertising & Brand Video

Turn product images into scroll-stopping ad clips with synchronized audio branding. Skip the production crew for concept tests and social ads — iterate in minutes, not days.

Medium shot, product floating center frame, warm golden light. Slow camera orbit. Ambient store sounds.
// 02

UGC & Social Content

Generate creator-style video content at scale. Grok Imagine Video 1.5's face accuracy and character consistency make it ideal for repeating characters across clip series without drift.

Woman unboxing package, excited expression, natural bedroom light, handheld camera feel. Genuine reaction.
// 03

Product Animation & E-Commerce

Animate static product photography into videos for PDPs, emails, and retargeting ads. No 3D modeling. No studio. Upload a product photo — output ready for any platform ratio.

Sneaker rotating slowly on white surface, studio lighting, clean shadows. 360-degree spin.
// 04

Film & Narrative Pre-Viz

Directors and writers use Grok Imagine Video 1.5's Video Extend feature to build full-scene storyboards at near-real quality. Chain clips to lay out full sequences before committing to production budgets.

Extreme wide shot, desert highway at dusk. Lone figure walking toward camera. Wind sound. Cinematic grain.
// 05

Game & App Trailers

Use reference images of characters to generate consistent in-world footage for trailers and app store previews. Native audio means you get a complete trailer asset — not just silent footage.

Fantasy warrior sprinting through forest, moonlight. Epic orchestral swell. Dynamic camera chase shot.
// 06

Developer & API Workflows

With 60 RPM and a clean REST API at $0.08/sec (480p), Grok Imagine Video 1.5 plugs directly into content platforms, marketing automation tools, and AI-powered creative apps at scale.

grok-imagine-video-1.5-preview · REST API · 60 RPM · us-east-1 + eu-west-1

Frequently Asked Questions

Grok Imagine Video 1.5 is xAI's next-generation AI image-to-video model, officially released May 31, 2026. It animates input images into cinematic video clips up to 15 seconds at 480p or 720p, with native audio generated in sync. It currently holds the #1 position on the Image-to-Video Arena leaderboard with a +52 Elo improvement over version 1.0.

Grok Imagine Video 1.5 pricing is pay-per-second: 480p costs $0.08/sec and 720p costs $0.14/sec. Each input image adds $0.01. With one input image, a 5-second 480p clip runs $0.41 ($0.40 of video plus the $0.01 image fee) and a 10-second 720p clip runs $1.41. See the full breakdown on our pricing page.
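
The per-second rates and the per-image fee combine into a simple estimate. The sketch below is illustrative only: `estimate_cost` is a hypothetical helper, the rates are the ones quoted on this page, and the assumption that clips are billed in whole seconds from 1 to 15 is ours.

```python
from decimal import Decimal

# Rates quoted on this page (USD); check the pricing page before relying on them.
PRICE_PER_SECOND = {"480p": Decimal("0.08"), "720p": Decimal("0.14")}
INPUT_IMAGE_FEE = Decimal("0.01")
MAX_SECONDS = 15  # maximum clip length per generation


def estimate_cost(seconds: int, resolution: str, input_images: int = 1) -> Decimal:
    """Estimate the cost of one generation in USD.

    Assumes whole-second billing (an assumption, not confirmed by the page).
    """
    if resolution not in PRICE_PER_SECOND:
        raise ValueError(f"unsupported resolution: {resolution!r}")
    if not 1 <= seconds <= MAX_SECONDS:
        raise ValueError(f"seconds must be between 1 and {MAX_SECONDS}")
    if input_images < 0:
        raise ValueError("input_images cannot be negative")
    return PRICE_PER_SECOND[resolution] * seconds + INPUT_IMAGE_FEE * input_images


if __name__ == "__main__":
    print(estimate_cost(10, "720p"))  # video charge plus one input image
    print(estimate_cost(5, "480p"))
```

For a 10-second 720p clip with one input image, `estimate_cost(10, "720p")` comes to $1.41, matching the figure quoted above. `Decimal` is used instead of floats so that cent amounts add up exactly.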

Grok Imagine Video 1.5 outranks Kling 3.0 on the Arena.ai image-to-video leaderboard. Grok 1.5 also generates native audio automatically, while Kling 3.0 does not. For image-to-video workflows, Grok 1.5 is the top-ranked choice.

Grok Imagine Video 1.5 ranks #1 on the Image-to-Video Arena, ahead of the Veo 3.1 audio variants listed on the same leaderboard. Both produce audio alongside the video, so the difference comes down to ranking, price, and generation speed. For still-image animation, Grok 1.5 is the stronger pick as of May 2026.

Grok Imagine Video 1.5 accepts JPG, PNG, and WebP images as input. Output is MP4 video at 480p or 720p, up to 15 seconds per generation. Upload through the web UI or pass image URLs via the xAI API.
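
A quick pre-flight check can catch unsupported inputs before anything is uploaded. This is a minimal sketch that looks only at the file extension, not the file contents; `is_accepted_image` is a hypothetical helper, and treating `.jpeg` as equivalent to `.jpg` is an assumption.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# JPG, PNG, and WebP per the formats listed above; ".jpeg" is assumed equivalent.
ACCEPTED_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp"}


def is_accepted_image(path_or_url: str) -> bool:
    """Return True if the file name ends in an accepted image extension.

    Works for local file names and for URLs (query strings are ignored).
    This checks the extension only; it does not inspect the file itself.
    """
    path = urlparse(path_or_url).path
    return PurePosixPath(path).suffix.lower() in ACCEPTED_SUFFIXES
```

For example, `is_accepted_image("https://example.com/shots/hero.webp?v=2")` passes, while a `.gif` file is rejected.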

Grok Imagine Video 1.5 is not free: it uses pay-per-second pricing via the xAI API. You can test with short clips at $0.08/sec for 480p output. Check our pricing page for credit packs and starter options.

Grok Imagine Video 1.5 supports 480p and 720p output. Video duration goes up to 15 seconds per generation. It accepts multiple aspect ratios to match your input image. For longer content, the Video Extend feature lets you chain clips together while maintaining character and scene consistency.
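
When planning longer content, it helps to work out how many chained clips a target runtime needs. The sketch below is arithmetic only and does not call any API. It assumes each chained segment is subject to the same 15-second cap as a single generation, which this page does not confirm, and `plan_segments` is a hypothetical helper.

```python
from __future__ import annotations

import math

MAX_SECONDS_PER_GENERATION = 15  # per-generation cap quoted above


def plan_segments(target_seconds: int,
                  max_seconds: int = MAX_SECONDS_PER_GENERATION) -> list[int]:
    """Split a target runtime into the fewest segments of near-equal length.

    Assumes every chained segment obeys the same cap as a single generation.
    """
    if target_seconds <= 0:
        raise ValueError("target_seconds must be positive")
    count = math.ceil(target_seconds / max_seconds)
    base, extra = divmod(target_seconds, count)
    # The first `extra` segments carry one additional second each.
    return [base + 1] * extra + [base] * (count - extra)


if __name__ == "__main__":
    print(plan_segments(30))
    print(plan_segments(40))
```

A 30-second spot splits into two 15-second segments, and a 40-second sequence into three segments of 14, 13, and 13 seconds. Balancing the lengths avoids ending on a very short final clip.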

Yes. Grok Imagine Video 1.5 includes native audio generation — the model produces synchronized ambient sounds, dialogue, and music as part of the same generation pass. No separate audio tool is required. Audio is included in the base pricing with no additional cost beyond input charges.

The Grok Imagine Video 1.5 API is available via the xAI API platform at docs.x.ai. The model name is grok-imagine-video-1.5-preview, with the alias grok-imagine-video-1.5-2026-05-30. Rate limits are 60 requests per minute. Regions supported: us-east-1 and eu-west-1.
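
For batch jobs, a client-side limiter keeps request volume under the 60 requests/minute ceiling. The sketch below is a generic sliding-window limiter and makes no calls to the xAI API; `SlidingWindowLimiter` is a hypothetical class, and whether the server enforces its limit over a sliding or a fixed window is not stated here, so treat the window model as an assumption.

```python
import time
from collections import deque
from typing import Callable


class SlidingWindowLimiter:
    """Allow at most `max_calls` acquisitions in any `period`-second window."""

    def __init__(self, max_calls: int = 60, period: float = 60.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.max_calls = max_calls
        self.period = period
        self._clock = clock
        self._sleep = sleep
        self._stamps = deque()  # start times of calls inside the current window

    def acquire(self) -> float:
        """Block until a slot is free; return the number of seconds waited."""
        waited = 0.0
        while True:
            now = self._clock()
            # Drop timestamps that have aged out of the window.
            while self._stamps and now - self._stamps[0] >= self.period:
                self._stamps.popleft()
            if len(self._stamps) < self.max_calls:
                self._stamps.append(now)
                return waited
            # Sleep until the oldest call leaves the window, then re-check.
            delay = self.period - (now - self._stamps[0])
            self._sleep(delay)
            waited += delay
```

Call `limiter.acquire()` immediately before each request. The injectable `clock` and `sleep` arguments exist so the limiter can be tested without real waiting. Not thread-safe as written; wrap `acquire` in a lock if several workers share one instance.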

Yes. Videos generated through paid plans carry full commercial usage rights for advertising, marketing, and distribution. Always review xAI's terms of service for content restrictions, particularly around generating likeness content or regulated industries.