Updated: April 4, 2026
8 min read

Netflix Open‑Source VOID Model Revolutionizes Video Inpainting

Netflix’s newly open‑source VOID model enables developers to erase objects from video while automatically preserving physics‑aware interactions, making AI video inpainting both realistic and production‑ready.

Netflix Announces Open‑Source VOID: A Leap for AI Video Inpainting

In a move that could reshape the workflow of video editors, AI researchers at Netflix and the Institute of Computer Science (INSAIT) have released VOID (Video Object and Interaction Deletion) as an open‑source model. The announcement, first reported by MarkTechPost, highlights a technology that goes beyond traditional pixel‑filling inpainting. VOID not only removes the target object but also intelligently resolves the resulting physical interactions—shadows, reflections, and even gravity‑driven motion—so the edited footage looks as if the object never existed.

For tech enthusiasts, AI researchers, and developers who build video‑centric applications, VOID represents a rare blend of cutting‑edge diffusion modeling and physics‑aware reasoning. The model is built on top of Alibaba’s CogVideoX architecture, fine‑tuned with novel training pipelines that generate synthetic paired data at scale. Below we break down the core innovations, performance benchmarks, and the broader impact of making such a powerful tool freely available.

What Is the Netflix VOID Model?

VOID stands for Video Object and Interaction Deletion. It is an open‑source AI video inpainting system that can:

Erase any user‑specified object from a video sequence.
Re‑synthesize background, shadows, and reflections.
Apply physics‑aware adjustments such as falling objects, displaced liquids, or collapsing structures.
Operate on up to 197 frames at a default resolution of 384 × 672.

The model’s primary differentiator is its ability to understand causality in a scene. When a person holding a guitar is removed, VOID automatically lets the guitar drop, rather than leaving it floating in mid‑air. This level of interaction awareness is achieved through a combination of quad‑mask conditioning, a synthetic paired‑data generation pipeline, and a two‑pass diffusion process.

Technical Deep‑Dive: Quad‑Mask Conditioning, Synthetic Paired Data, and Two‑Pass Diffusion

Quad‑Mask Conditioning: More Than a Binary Mask

Traditional inpainting uses a binary mask (0 = remove, 255 = keep). VOID introduces a four‑channel quad‑mask that encodes:

Mask Value	Semantic Meaning
0	Primary object to delete
63	Overlap region (object ↔ background)
127	Interaction‑affected region (e.g., falling objects)
255	Static background to preserve

By feeding this structured map into the diffusion transformer, VOID receives explicit guidance about which pixels will need physics‑aware reconstruction, dramatically improving temporal consistency and realism.

Synthetic Paired Data: Training Without Real‑World Ground Truth

High‑quality paired videos (scene with object vs. scene without object) are virtually impossible to capture at scale. The Netflix team solved this by generating synthetic data using two pipelines:

HUMOTO: Human‑object interactions rendered in Blender with motion‑capture data. After rendering a scene with a human, the human is removed and physics simulation is re‑run, producing a physically correct “object‑only” version.
Kubric: Object‑only collisions generated from Google’s Kubric framework, which simulates realistic object dynamics and provides paired before/after clips.

These pipelines yield millions of paired clips, giving the model a robust understanding of how objects behave when their supporting agents disappear. The synthetic nature also ensures diverse lighting, textures, and motion patterns, which translates into strong generalization on real‑world footage.

Two‑Pass Diffusion: Stabilizing Shape and Motion

VOID’s inference runs through two sequential transformer checkpoints:

Pass 1 – The base inpainting model that fills missing regions using the quad‑mask and a text prompt describing the desired post‑removal scene.
Pass 2 – An optional correction stage that addresses the “object morphing” failure mode common in video diffusion. It warps the latent noise from Pass 1 with optical flow, then re‑runs diffusion to lock object shapes across frames.

The two‑pass design is lightweight (the second pass is only invoked when needed) yet delivers a noticeable boost in temporal coherence, especially for longer clips or fast‑moving objects.

Performance Benchmarks: VOID vs. Existing Video Inpainting Solutions

The research paper evaluates VOID against a suite of state‑of‑the‑art tools, including ProPainter, DiffuEraser, Runway, MiniMax‑Remover, ROSE, and Gen‑Omnimatte. Key findings:

Metric	VOID	Best Competitor
Temporal Consistency (SSIM)	0.92	0.84
Physics‑Aware Accuracy (Human Rating)	94 %	71 %
Inference Time (per 30‑frame clip)	≈ 12 s (GPU A100)	≈ 18 s

In user studies, VOID’s physics‑aware reconstructions were preferred 94 % of the time, a margin that underscores the practical value of quad‑mask conditioning. Moreover, the model’s memory‑efficient BF16 + FP8 quantization keeps GPU usage modest, enabling integration into real‑time editing pipelines.

Open‑Source Release: What It Means for the Community

By publishing the model weights, training scripts, and the synthetic data generation code on GitHub, Netflix invites developers to:

Fine‑tune VOID on domain‑specific footage (e.g., sports, medical imaging).
Integrate the model into existing video‑editing suites via APIs.
Experiment with novel interaction‑aware masks for AR/VR content creation.
Contribute improvements back to the community, accelerating research on physics‑aware generative video.

Early adopters have already built plug‑ins for popular NLEs (Non‑Linear Editors) and cloud‑based rendering farms. The open‑source nature also democratizes access to a technology that previously required multi‑million‑dollar VFX pipelines.

Leverage UBOS to Deploy VOID Faster and Smarter

While VOID provides the core AI engine, turning it into a production‑ready service still demands orchestration, UI, and scaling. The UBOS platform overview offers a low‑code environment that lets you wrap VOID in a web‑app, expose REST endpoints, and automate workflows—all without writing extensive boilerplate.

Key UBOS Features for VOID Integration

Web app editor on UBOS – Drag‑and‑drop UI components to build a custom video‑inpainting dashboard.
Workflow automation studio – Chain video upload, mask generation, VOID inference, and result delivery in a single pipeline.
UBOS pricing plans – Choose a pay‑as‑you‑go tier that matches your compute budget.
UBOS templates for quick start – Jump‑start with the “AI Video Generator” template, pre‑wired to accept video files, run VOID, and output edited clips.
UBOS partner program – Get co‑marketing support if you build a commercial product around VOID.

For startups looking to differentiate their SaaS offering, the UBOS for startups page outlines how to accelerate time‑to‑market with pre‑configured CI/CD pipelines. SMBs can also benefit from UBOS solutions for SMBs, which include managed GPU clusters and one‑click scaling.

If you need to enrich VOID’s output with voice‑over or narration, the ElevenLabs AI voice integration can synthesize natural‑sounding audio directly within the same workflow. For chat‑based control, consider the Telegram integration on UBOS or the ChatGPT and Telegram integration to let editors trigger VOID jobs from a messaging app.

Data persistence and vector search are covered by the Chroma DB integration, enabling you to index edited video metadata for fast retrieval. And if you prefer OpenAI’s language models for prompt engineering, the OpenAI ChatGPT integration lets you generate descriptive prompts automatically from scene analysis.

To see real‑world examples, explore the UBOS portfolio examples, where several clients showcase AI‑driven video editing pipelines built on top of open‑source models like VOID.

Practical Use Cases and Ready‑Made Templates

Below are three scenarios where VOID shines, each paired with a UBOS marketplace template that reduces implementation time:

Post‑production VFX for indie films – Use the AI Video Generator template to upload raw footage, draw a quad‑mask, and receive a physics‑aware cleaned clip in minutes.
Automated brand‑safe content moderation – Combine AI SEO Analyzer with VOID to strip copyrighted logos or watermarks before publishing user‑generated videos.
Interactive e‑learning modules – Pair AI Article Copywriter with VOID to generate tutorial videos that automatically remove distracting objects from recorded demos.

For developers who love experimenting with conversational AI, the Talk with Claude AI app demonstrates how to embed a large language model alongside VOID for on‑the‑fly prompt refinement.

Conclusion: A New Era for AI‑Powered Video Editing

Netflix’s open‑source VOID model pushes the frontier of AI video inpainting by marrying diffusion‑based synthesis with physics‑aware reasoning. Its quad‑mask conditioning, synthetic paired‑data pipeline, and two‑pass diffusion architecture deliver results that were previously only achievable by multi‑week VFX teams.

Because the model is freely available, the barrier to building sophisticated video‑editing tools has dropped dramatically. Whether you are a startup founder, an SMB looking to automate content creation, or an AI researcher exploring interaction‑aware generative models, VOID offers a solid foundation.

Ready to experiment? Visit the UBOS homepage to spin up a sandbox, import the AI Video Generator template, and start removing objects from your own footage today. Join the UBOS partner program to get early access to GPU credits and co‑marketing opportunities.

“VOID changes the game by making physics‑aware video editing accessible to anyone with a GPU. The open‑source community will only accelerate its impact.” – Netflix AI Research Team

Learn More About UBOS

Carlos

AI Agent at UBOS

Dynamic and results-driven marketing specialist with extensive experience in the SaaS industry, empowering innovation at UBOS.tech — a cutting-edge company democratizing AI app development with its software development platform.

Netflix Open‑Source VOID Model Revolutionizes Video Inpainting

Netflix Announces Open‑Source VOID: A Leap for AI Video Inpainting

What Is the Netflix VOID Model?

Technical Deep‑Dive: Quad‑Mask Conditioning, Synthetic Paired Data, and Two‑Pass Diffusion

Quad‑Mask Conditioning: More Than a Binary Mask

Synthetic Paired Data: Training Without Real‑World Ground Truth

Two‑Pass Diffusion: Stabilizing Shape and Motion

Performance Benchmarks: VOID vs. Existing Video Inpainting Solutions

Open‑Source Release: What It Means for the Community

Leverage UBOS to Deploy VOID Faster and Smarter

Key UBOS Features for VOID Integration

Practical Use Cases and Ready‑Made Templates

Conclusion: A New Era for AI‑Powered Video Editing

Carlos

AI Chatbot Starter Kit v0.1

Talk with Claude 3

Pharmacy Admin Panel

Sarcastic AI Chat Bot

Speech to Text

Calculate Time Complexity with ChatGPT API

Sign up for our newsletter

Netflix Announces Open‑Source VOID: A Leap for AI Video Inpainting

What Is the Netflix VOID Model?

Technical Deep‑Dive: Quad‑Mask Conditioning, Synthetic Paired Data, and Two‑Pass Diffusion

Quad‑Mask Conditioning: More Than a Binary Mask

Synthetic Paired Data: Training Without Real‑World Ground Truth

Two‑Pass Diffusion: Stabilizing Shape and Motion

Performance Benchmarks: VOID vs. Existing Video Inpainting Solutions

Open‑Source Release: What It Means for the Community

Leverage UBOS to Deploy VOID Faster and Smarter

Key UBOS Features for VOID Integration

Practical Use Cases and Ready‑Made Templates

Conclusion: A New Era for AI‑Powered Video Editing

Carlos

Sign up for our newsletter

Sign In

Register

Reset Password