Grok Imagine Video 1.5 is xAI's top-ranked image-to-video model. Upload a photo, add an optional motion prompt, and get back a clip of up to 15 seconds with synchronized audio, typically in about 15 seconds. As of May 2026, it is the #1 model on the Arena.ai Image-to-Video leaderboard.
// What's New in 1.5
Grok Imagine Video 1.5 isn't a patch — it's a full generational step. Every dimension that matters for professional video creation has been rebuilt: audio, faces, physics, and speed.
Native Audio Generation
Dialogue, ambient sound, and music are generated in sync with the video, so no separate audio pipeline is needed. Grok Imagine 1.5 produces the full sensory experience in a single generation pass.
New in 1.5
Blind testing showed substantial gains in face realism. Grok Imagine Video 1.5 generates accurate facial details, including celebrity likenesses, and maintains them across cuts and angles without drift.
+52 Elo
Chain multiple clips into longer narratives using the Extend Video feature. Version 1.5 significantly reduces quality degradation across chained segments, keeping lighting, characters, and motion physics consistent.
Improved
Lighting and textures look noticeably sharper than v1.0, with more accurate physical lighting and material detail. Output clips are ready to use directly in your ads, even at the $0.08/sec 480p tier.
Pro Quality
Optional motion prompts guide camera moves, subject actions, and atmosphere while preserving your source image. Grok Imagine 1.5 executes direction precisely instead of drifting away from the original frame.
Smarter
A 15-second clip typically renders in about 15 seconds, roughly 2× faster than v1.0. xAI expanded compute allocation for 1.5, increasing stability at scale. The API rate limit is 60 requests per minute, enough for high-volume production workflows.
~15s
// Generated Samples
Real image-to-video (I2V) outputs from Grok Imagine Video 1.5, with native audio.
I2V: The girl reading comics
I2V: The man on the swing
I2V: A charming girl is smiling at you
I2V: A pleasant day in the fields
I2V: A woman is dragging a colorful suitcase
I2V: The plane flew over the boy's head
I2V: A woman in a kimono is performing a dance
I2V: The cat is fiddling with the TV screen
I2V: In the bamboo forest at night, a brown bear is watching you
I2V: The girl with golden hair is smiling happily
// Arena Leaderboard · May 2026
xAI's Grok Imagine Video 1.5 Preview took the top position on the Arena.ai Image-to-Video leaderboard in May 2026, scoring above ByteDance Seedance 2.0, Alibaba's HappyHorse 1.0, and Google Veo 3.1 in blind human evaluation. The result is marked preliminary, and its rank spread of 1-2 overlaps with Seedance 2.0.
| Rank | Rank Spread | Model | Organization | License | Score | Votes |
|---|---|---|---|---|---|---|
| 1 | 1-2 | grok-imagine-video-1.5-preview-720p (Preliminary) | xAI | Proprietary | 1473 ±9 | 5,564 |
| 2 | 1-2 | dreamina-seedance-2.0-720p | ByteDance | Proprietary | 1467 ±11 | 56,710 |
| 3 | 3-3 | happyhorse-1.0 | Alibaba-ATH | Proprietary | 1443 ±12 | 33,267 |
| 4 | 4-4 | grok-imagine-video-720p | xAI | Proprietary | 1421 ±6 | 380,580 |
| 5 | 5-8 | veo-3.1-audio | Google | Proprietary | 1397 ±11 | 25,113 |
| 6 | 5-9 | veo-3.1-audio-1080p | Google | Proprietary | 1393 ±10 | 24,381 |
| 7 | 5-9 | veo-3.1-fast-audio | Google | Proprietary | 1384 ±9 | 99,851 |
| 8 | 5-9 | grok-imagine-video-480p | xAI | Proprietary | 1383 ±9 | 19,415 |
| 9 | 6-11 | veo-3.1-fast-audio-1080p | Google | Proprietary | 1374 ±11 | 24,874 |
| 10 | 9-11 | vidu-q3-pro | Shengshu | Proprietary | 1360 ±8 | 36,674 |
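To put the score gaps in the table above into perspective, here is a minimal sketch that converts a score difference into an expected head-to-head win rate and checks whether two confidence intervals overlap. It assumes the leaderboard scores behave like standard Elo ratings on a 400-point logistic scale, which this page does not confirm. The function and variable names are illustrative only.

```python
def expected_win_rate(score_a: float, score_b: float, scale: float = 400.0) -> float:
    """Expected share of pairwise votes won by A over B under the standard Elo formula.

    Assumption: the leaderboard score is an Elo-style rating on a 400-point scale.
    """
    return 1.0 / (1.0 + 10 ** ((score_b - score_a) / scale))


def intervals_overlap(score_a: float, margin_a: float,
                      score_b: float, margin_b: float) -> bool:
    """True if the ranges [score - margin, score + margin] of two models intersect."""
    return abs(score_a - score_b) <= margin_a + margin_b


# (score, margin) pairs copied from the leaderboard table.
grok_15_preview = (1473, 9)
seedance_20 = (1467, 11)
grok_previous_720p = (1421, 6)

print(f"1.5 Preview vs Seedance 2.0: {expected_win_rate(grok_15_preview[0], seedance_20[0]):.3f}")
print(f"1.5 Preview vs previous 720p: {expected_win_rate(grok_15_preview[0], grok_previous_720p[0]):.3f}")
print("Top-two intervals overlap:", intervals_overlap(*grok_15_preview, *seedance_20))
```

Under that assumption, the 6-point lead over Seedance 2.0 corresponds to a near-even matchup (about 51%), and the 52-point gain over the previous 720p model corresponds to about 57%. The overlapping intervals for the top two models are consistent with their shared rank spread of 1-2.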
// Grok Imagine Video 1.5 vs Kling 3.0 vs Veo 3.1
Grok Imagine 1.5 leads on image-to-video ranking and starting price. Here's the full breakdown against Kling 3.0 and Veo 3.1, the two models most creators are evaluating in 2026, with Sora 2 included for reference.
| Feature | Grok Imagine 1.5 (Best) | Kling 3.0 | Google Veo 3.1 | Sora 2 |
|---|---|---|---|---|
| Image-to-Video Rank | ✦ #1 | Top 5 | Top 5 | Top 5 |
| Max Resolution | 720p | 1080p | 1080p | 1080p |
| Max Duration | 15 sec | 10 sec | 8 sec | 20 sec |
| Native Audio | ✓ (included) | ✗ | ✓ | ✗ |
| Starting Price / sec | $0.08 | ~$0.14 | ~$0.34+ | ~$0.10 |
| Generation Speed | ~15 seconds | 2–4 min | 2–5 min | 1–3 min |
| Public API | ✓ (60 RPM) | ✓ | Enterprise only | ✓ |
| Video Extend | ✓ | ✓ | ✗ | ✗ |
| Face Accuracy (blind test) | Top-rated | Strong | Strong | Moderate |
// Pricing from official docs as of May 2026. Arena rankings from Arena.ai / Artificial Analysis leaderboard. Subject to change.
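As a rough way to compare budgets, the sketch below multiplies each model's starting price per second by its maximum clip length, using the figures from the comparison table above. It assumes simple per-second billing with no minimums, fees, or resolution surcharges, which this page does not spell out. Prices marked "~" in the table are approximate, so treat the results as estimates.

```python
from decimal import Decimal

# Starting price per second (USD) and maximum clip length (seconds),
# copied from the comparison table. Values marked "~" there are approximate.
MODELS = {
    "Grok Imagine 1.5": (Decimal("0.08"), 15),
    "Kling 3.0": (Decimal("0.14"), 10),
    "Google Veo 3.1": (Decimal("0.34"), 8),
    "Sora 2": (Decimal("0.10"), 20),
}


def clip_cost(price_per_second: Decimal, seconds: int) -> Decimal:
    """Cost of one clip, assuming plain per-second billing (an assumption, not a quoted rule)."""
    if seconds < 0:
        raise ValueError("seconds must be non-negative")
    return price_per_second * seconds


for name, (price, max_seconds) in MODELS.items():
    print(f"{name}: ${clip_cost(price, max_seconds):.2f} for a {max_seconds}-second clip")
```

With these inputs, a maximum-length 15-second Grok Imagine 1.5 clip at the starting price comes to $1.20. `Decimal` is used instead of floats so that cent amounts stay exact.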
// What Creators Build With It
From 30-second ad spots to social UGC content and product demos — Grok Imagine Video 1.5 fits wherever fast, high-quality AI video creation matters.
Turn product images into scroll-stopping ad clips with synchronized audio branding. Skip the production crew for concept tests and social ads — iterate in minutes, not days.
Generate creator-style video content at scale. Grok Imagine Video 1.5's face accuracy and character consistency make it ideal for repeating characters across clip series without drift.
Animate static product photography into videos for PDPs, emails, and retargeting ads. No 3D modeling. No studio. Upload a product photo and get output ready for any platform's aspect ratio.
Directors and writers use Grok Imagine Video 1.5's Video Extend feature to build full-scene storyboards at near-real quality. Chain clips to lay out full sequences before committing to production budgets.
Use reference images of characters to generate consistent in-world footage for trailers and app store previews. Native audio means you get a complete trailer asset — not just silent footage.
With 60 RPM and a clean REST API at $0.08/sec (480p), Grok Imagine Video 1.5 plugs directly into content platforms, marketing automation tools, and AI-powered creative apps at scale.
Model ID: grok-imagine-video-1.5-preview, with the alias grok-imagine-video-1.5-2026-05-30. The rate limit is 60 requests per minute. Supported regions: us-east-1 and eu-west-1.
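To stay under the 60 requests per minute limit from the client side, one option is a small sliding-window limiter that blocks before each call. The sketch below is a generic illustration, not part of any xAI SDK. `SlidingWindowLimiter` and `submit_job` are hypothetical names, and `submit_job` is a placeholder for whatever API call you actually make.

```python
import time
from collections import deque
from typing import Callable


class SlidingWindowLimiter:
    """Blocks so that at most `max_calls` happen within any `period` seconds."""

    def __init__(self, max_calls: int = 60, period: float = 60.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        if max_calls < 1 or period <= 0:
            raise ValueError("max_calls must be >= 1 and period must be > 0")
        self.max_calls = max_calls
        self.period = period
        self._clock = clock
        self._sleep = sleep
        self._calls = deque()  # timestamps of calls still inside the window

    def acquire(self) -> None:
        """Wait, if needed, until one more call fits in the window, then record it."""
        now = self._clock()
        # Drop timestamps that have aged out of the window.
        while self._calls and now - self._calls[0] >= self.period:
            self._calls.popleft()
        if len(self._calls) >= self.max_calls:
            # Sleep until the oldest recorded call leaves the window.
            self._sleep(self.period - (now - self._calls[0]))
            now = self._clock()
            self._calls.popleft()
        self._calls.append(now)


def submit_job(image_path: str) -> None:
    """Hypothetical placeholder: replace with your actual API call."""
    print(f"submitting {image_path}")


limiter = SlidingWindowLimiter(max_calls=60, period=60.0)
for path in ["a.jpg", "b.jpg", "c.jpg"]:
    limiter.acquire()  # returns immediately while under the limit
    submit_job(path)
```

The clock and sleep functions are injectable so the limiter can be tested without real waiting. A server-side limit can still reject requests, for example when several clients share one key, so retry handling remains worthwhile.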