Topic

Video Generation

81 related items · from the historical archive

Research Papers · NEW

7/20 04:00

ShotPlan: Cinematic Video Generation with Learnable Planning Token

Current video generation models achieve impressive results in single-shot generation, yet remain limited in cinematic video generation, where coherent narratives and effective multi-shot composition r…

Source: HuggingFace Papers

Research Papers · NEW

7/20 04:00

HOMIE: Human-object Centric Video Personalization via Multimodal Intelligent Enchancement

Human-object centric video personalization (HOCVP) is a core task within subject-driven video generation. However, existing methods suffer from two key limitations. First, most approaches focusing on…

Source: HuggingFace Papers

Research Papers · NEW

7/20 04:00

FlashRT: Agent Harness for Guiding Agents to Deploy Real-Time Multimodal Applications

Real-time multimodal applications, including voice agents and interactive video generation, compose heterogeneous models into pipelines whose efficient deployment requires application-specific decisio…

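Source: HuggingFace Papers

Model Releases/Updates

7/17 09:05

Exclusive | Zhipu's ARR reaches $1 billion, up 15x in six months

By Zhou Xinyu | Edited by Zhang Yuxin. 智能涌现 (Intelligent Emergence) has learned from multiple independent sources that, as of July 2026, Zhipu's ARR (annual recurring revenue) has reached $1 billion. As of press time, Zhipu had not responded to a request for comment on this information. Over the past year, AI coding and video generation models have become the AI segments with the strongest revenue-generating capacity worldwide. Overseas, Anthropic's Claude Code, only half a year after launch, saw its ARR soar…

AI Commentary · Zhipu's ARR surged 15x in six months to $1 billion, a sign that AI commercialization has entered a period of explosive growth and that the industry landscape is being reshaped faster.

Source: 36Kr

Research Papers · NEW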

7/17 04:00

Apple-π: Benchmarking Thinking with Video Towards Law-Grounded Physical Intelligence

Modern video generation models are increasingly hailed as emerging world models with an internalized grasp of physical law. Yet existing benchmarks largely evaluate physical plausibility only at the o…

Source: HuggingFace Papers

Research Papers

7/15 04:00

From Pixels to States: Rethinking Interactive World Models as Game Engines

Building interactive worlds that respond coherently to player actions has long been a shared goal of computer graphics, games, and artificial intelligence. Recent video generative models provide a dat…

Source: HuggingFace Papers

Research Papers

7/15 04:00

MultiRef-Compass: Towards Comprehensive Evaluation of Multi-Reference-to-Audio-Video Generation

Multi-reference-to-audio-video (MR2AV) generation aims to generate coherent audio-video content conditioned on multiple references and textual instructions. Existing benchmarks mainly focus on text-dr…

Source: HuggingFace Papers

Research Papers

7/15 04:00

KeyFrame-Compass: Towards Comprehensive Evaluation of Keyframe-Conditioned Video Generation

Video generation increasingly relies on keyframe-based workflows, where creators specify a sequence of reference images to guide generation. Although recent models support multi-keyframe conditioning,…

Source: HuggingFace Papers

Research Papers · NEW

7/15 04:00

VideoRAE: Taming Video Foundation Models for Generative Modeling via Representation Autoencoders

Video generative models commonly rely on latent spaces learned by 3D Variational Autoencoders (3D-VAEs). However, conventional 3D-VAEs are mainly optimized for pixel-level reconstruction, which can li…

Source: HuggingFace Papers

Research Papers

7/13 18:57

Xiaomi-Robotics-U0: Unified Embodied Synthesis with World Foundation Model

Recent foundation image and video generation models offer strong generalization and controllability, but their direct application to embodied scenarios is limited by requirements for multi-view consis…

Source: HuggingFace Papers

Research Papers

7/10 04:00

Video Generation Models are General-Purpose Vision Learners

Driven by next-token prediction, NLP shifted from task-specific models into powerful generalist foundation models. What, then, is the equivalent catalyst needed to achieve a general-purpose model in c…

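AI Commentary · Video generation models move beyond task-specific limits toward general-purpose vision, pointing to a new paradigm for AI foundation models.

Source: HuggingFace Papers

Product Releases/Updates

7/9 13:47

Just in: the world's first MoE video model built specifically for embodied AI has been open-sourced!

The next stop for video generation may be the robot brain

Source: QbitAI

Research Papers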

7/9 04:00

OpenCoF: Learning to Reason Through Video Generation

Reasoning has become a core capability for large models, especially when reliable decisions require understanding logical consequences. Recent video generation models offer a reasoning path distinct f…

Source: HuggingFace Papers

Research Papers

7/9 04:00

OPSD-V: On-Policy Self-Distillation for Post-Training Few-Step Autoregressive Video Generators

We propose OPSD-V, an on-policy self-distillation paradigm for post-training few-step autoregressive (AR) video diffusion models. Existing few-step AR video generators can produce long videos with low…

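Source: HuggingFace Papers

Industry News

7/8 11:03

AI video comes of age: Seedance 2.0 leads an in-depth comparison of 6 mainstream tools

In 2026, AI video generation has entered a mature, fully competitive phase of explosive growth, and Seedance 2.0's breakout popularity has turned AI video into a "carnival for everyone." In just two years, AI video has gone from fragmented, blurry clips a few seconds long to coherent minute-long narratives and accurate reproduction of real-world physics. The tools have iterated far faster than expected, making a triple jump from "usable" to "easy to use" to "professional" and lowering the barrier to realizing creative ideas to a new low: professional teams can use…

Source: 36Kr

Research Papers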

7/8 04:00

Scaling Mixture-of-Experts Video Pretraining for Embodied Intelligence

Despite the recent promise in robot control, video generative models suffer from a domain mismatch due to their primary focus on content creation. For example, their design inherently prioritizes visu…

Source: HuggingFace Papers

Industry News

7/7 05:09

AI: The ROI Runway Could Be Long Outside the Tech Sector

Source: Hacker News

Research Papers

7/7 04:00

RoboTALES: Learning Reasoning-Guided Robot Policies via Task-Aligned Simulated Futures

Pretrained video generative models are promising backbones for visuomotor control, but their imagined futures often drift from task intent and are not reliably action-conditional. As a result, these m…

Source: HuggingFace Papers

Research Papers

7/6 04:00

MV-Forcing: Long Multi-View Video Generation via 4D-Grounded Spatio-Temporal Self-Forcing

Recent advances in video diffusion models have enabled either long single-view generation through temporal autoregression, or short multi-view synthesis through bidirectional attention. However, gener…

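Source: HuggingFace Papers

Product Releases/Updates

7/3 23:27

Report: ByteDance Doubao's video generation model Seedance 2.5 expected to launch in the Experience Center on July 6, with the API opening a week later

ITHome, July 3: According to AI 普瑞斯, ByteDance Doubao's video generation model Seedance 2.5 is expected to launch in the Experience Center on July 6, and its API will open one week later. As ITHome previously reported, Seedance 2.5 was released on June 23 and is currently in closed beta for enterprises worldwide. Reportedly, in single-segment generation length, multi-asset reference, video…

AI Commentary · ByteDance's video model is moving quickly from closed beta to open access as industry adoption picks up pace; its capability ceiling is worth watching.

Source: ITHome

Research Papers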

7/3 04:00

Flex-Forcing: Towards a Unified Autoregressive and Bidirectional Video Diffusion Model

Recent progress in large-scale generative models has substantially advanced video generation, yet existing methods remain constrained by a rigid inference paradigm. Bidirectional diffusion models exce…

Source: HuggingFace Papers

Research Papers

7/3 04:00

Vidu S1: A Real-Time Interactive Video Generation Model

We introduce Vidu S1, a real-time interactive video generation model supporting voice control of digital characters. Users can control video generation content at any moment through voice instructions…

Source: HuggingFace Papers

Research Papers

7/3 01:27

OrbitQuant: Data-Agnostic Quantization for Image and Video Diffusion Transformers

Diffusion transformers (DiTs) achieve state-of-the-art image and video generation, but their multi-step sampling and growing parameter count make inference expensive. Post-training quantization (PTQ)…

Source: arXiv

Research Papers

7/2 04:00

OrbitQuant: Data-Agnostic Quantization for Image and Video Diffusion Transformers

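Source: HuggingFace Papers

Industry News

6/30 11:24

A trillion-scale market still up for grabs: with "on-device native," a Chinese AI company offers a new approach to physical AI

For the past few years, AI's battlefield was inside the screen. The GPT series stacked parameters into astonishing language ability, and Sora stunned the world with video generation... But in 2026 the industry reached a consensus: 2026 is year one of physical AI. At CES in Las Vegas early in the year, NVIDIA CEO Jensen Huang mentioned physical AI 17 times in a single keynote to declare that "the ChatGPT moment for physical AI has arrived." It is also a keyword he has championed for the past two years. And over the past two-plus years, physical A…

Source: 36Kr

Research Papers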

6/30 04:00

MemLearner: Learning to Query Context memory for Video World Models

Video World Models are interactive video generation models that predict future world states based on user actions and history video frames. A critical challenge in video world models is the lack of me…

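Source: HuggingFace Papers

Industry News

6/29 16:04

Exclusive | After raising over $100 million, Sand.ai's Cao Yue on why video is the most important path to world models

"With every generation of models, we are betting on something non-consensus." By Deng Yongyi | Edited by Zhang Yuxin. Cao Yue, founder of Sand.ai, does not much care which side of the consensus he stands on. Sand.ai is a video generation model and product company founded in January 2024. The story of how Cao Yue founded Sand.ai has been told many times: after his previous venture, 光年之外 (Light Years Beyond), came to an abrupt end, he quickly threw himself into founding Sand.ai to build video generation models. At the time, the market's mainstream…

Source: 36Kr

Research Papers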

6/29 04:00

Walking in the Implicit: Interactive World Exploration via Neural Scene Representation

Interactive video generation systems for camera-controlled world exploration roll out growing sequences of latent video frames, entangling state transition with high-frequency observation synthesis. W…

Source: HuggingFace Papers

Research Papers

6/29 04:00

AVTok: 1D Unified Tokenization for Holistic Audio-Video Generation

Audio-video generation has recently gained unprecedented research attention, aiming to synthesize high-quality sounding video content with fine-grained synchronization and semantic alignment between t…

Source: HuggingFace Papers

Research Papers

6/26 04:00

PhysisForcing: Physics Reinforced World Simulator for Robotic Manipulation

Video generation models have emerged as a promising paradigm for embodied world simulation. However, both general-domain video generators and robot-specific data fine-tuned models can still produce ph…

Source: HuggingFace Papers

Research Papers

6/25 04:00

MemoBench: Benchmarking World Modeling in Dynamically Changing Environments

Video generation models aspire to simulate dynamic environments, and several benchmarks now evaluate memory consistency across frames. However, most assess consistency only while the target remains in…

Source: HuggingFace Papers

Research Papers

6/24 04:00

Physics Question Scene Graph: Fine-grained Evaluation of Physical Plausibility in Text-to-Video Generation

Video generation models are increasingly capable of producing realistic videos, but they still struggle to generate videos that follow basic physical laws. Compounding this is a lack of reliable granu…

Source: HuggingFace Papers

Research Papers

6/24 04:00

MVTrack4Gen: Multi-View Point Tracking as Geometric Supervision for 4D Video Generation

Synthesizing a novel-view video from a monocular reference video along a target camera trajectory requires both geometric consistency and motion fidelity with respect to the reference video. Existing…

Source: HuggingFace Papers

Research Papers

6/24 04:00

DomainShuttle: Freeform Open Domain Subject-driven Text-to-video Generation

Open domain subject-driven text-to-video (S2V) generation has drawn significant interest in academia and industry. Open domain S2V mainly involves two scenarios: in-domain, which requires retaining th…

Source: HuggingFace Papers

Research Papers

6/24 04:00

Causal-rCM: A Unified Teacher-Forcing and Self-Forcing Open Recipe for Autoregressive Diffusion Distillation in Streaming Video Generation and Interactive World Models

Autoregressive video diffusion with causal diffusion transformers has emerged as a major paradigm for real-time streaming video generation and action-conditioned interactive world models. In this work…

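Source: HuggingFace Papers

Product Releases/Updates

6/22 15:30

Alibaba releases video generation model HappyHorse 1.1: comprehensive upgrades across five dimensions

Source: QbitAI

Research Papers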

6/22 04:00

Vera: A Layered Diffusion Model for Content-Preserving Video Editing

Video diffusion models have enabled remarkable progress in video generation and editing. However, content preservation remains a core challenge: existing methods regenerate every pixel and often alter…

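Source: HuggingFace Papers

Product Releases/Updates

6/20 18:42

From petting cats to SOTA! Three people born after 2000 build the fastest-ever streaming audio-video social model in two months

7x faster, at just 1/2000 the cost of Veo 3

AI Commentary · A post-2000s team takes on industry giants, achieving a 7x speed breakthrough at low cost and overturning assumptions about audio-video model efficiency.

Source: QbitAI

Research Papers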

6/19 04:00

UnityShots: Memory-Driven Multi-Shot Audio-Video Generation with Boundary-Aware Gating

Generating a coherent multi-shot video requires structured cross-shot memory. Subject appearance, scene context, and speaker identity must persist across cuts. Existing approaches either train end-to-…

Source: HuggingFace Papers

Research Papers

6/18 04:00

World Action Models: A Survey

World Action Models (WAMs) are embodied predictive-action models that make a forecast of the future available to action. Recent WAMs repurpose large video generation models, and a parallel line relies…

Source: HuggingFace Papers

Research Papers

6/17 04:00

Physics-IQ Verified

Video generative models (VGMs) have become a new frontier that can be used not just for video generation but for a multitude of downstream tasks, including world modeling. To advance these tasks, a g…

Source: HuggingFace Papers

Research Papers

6/17 04:00

ImageWAM: Do World Action Models Really Need Video Generation, or Just Image Editing?

World Action Models (WAMs) commonly rely on video generation to bridge visual world modeling and robot control. However, video-based WAMs face three coupled limitations: dense multi-frame future token…

Source: HuggingFace Papers

Research Papers

6/17 04:00

LooseControlVideo: Directorial Video Control using Spatial Blocking

Precise 3D spatial orchestration in text-to-video generation remains a significant challenge, particularly for multi-object scenes where semantic layout and temporal dynamics are often entangled. Whil…

Source: HuggingFace Papers

Research Papers

6/17 04:00

TurboServe: Serving Streaming Video Generation Efficiently and Economically

Streaming video generation is emerging as a new serving workload in which users interact with long-lived sessions that generate video progressively, chunk by chunk. Unlike offline video generation or…

Source: HuggingFace Papers

Product Releases/Updates

6/16 12:36

LichAmnesia/awesome-ad-video-prompts

Curated, original high-craft prompts for AI video ads (Seedance 2.0 / Veo 3 / Kling / Runway). Companion to HeyDreaming.

Source: GitHub

Research Papers

6/16 04:00

EgoCS-400K: An Egocentric Gameplay Dataset for World Models

The shift from video generation to interactive world modeling places new demands on data: beyond captioned videos, world models require temporally aligned video-action-language trajectories grounded i…

Source: HuggingFace Papers

Research Papers

6/16 04:00

MaineCoon: Pursuing A Real-Time Audio-Visual Social World Model

As an increasing majority of global video content is consumed on social platforms for interactive social purposes, video generation models built for social worlds are important but largely overlooked…

Source: HuggingFace Papers

Research Papers

6/15 04:00

PermaVid: Consistent Video Generation Across Edits via Disentangled Context Memory

Consistent video generation under editing operations requires persistence: when edits modify scene appearance or layout, subsequent generations should remain coherent across time and viewpoints. Howev…

Source: HuggingFace Papers

Research Papers

6/15 04:00

Qwen-RobotWorld Technical Report: Unifying Embodied World Modeling through Language-Conditioned Video Generation

We introduce Qwen-RobotWorld, a language-conditioned video world model for embodied intelligence. With natural language as a unified action interface, it predicts physically grounded future visual tra…

Source: HuggingFace Papers

Research Papers

6/14 04:00

Track2View: 4D-Consistent Camera-Controlled Video Generation via Paired 3D Point Tracks

Re-rendering an existing video from a novel camera viewpoint requires the output to follow the prescribed camera trajectory while preserving the appearance and dynamics of the original scene across ev…

Source: HuggingFace Papers

Research Papers

6/12 04:00

Memento: Reconstruct to Remember for Consistent Long Video Generation

Long-form video generation requires recurring subjects to remain consistent across various shots, viewpoints, motions, and scene transitions. Existing temporal decomposition methods improve scalabilit…

Source: HuggingFace Papers

Research Papers

6/11 04:00

OmniDirector: General Multi-Shot Camera Cloning without Cross-Paired Data

Cloning camera motion from reference videos is an important task in video generation, as videos provide intuitive and precise control. Existing methods either directly use parametric representations t…

Source: HuggingFace Papers

Research Papers

6/11 04:00

Avatar V: Scaling Video-Reference Avatar Video Generation

Generating avatar videos that are not merely visually similar to a target individual but behaviorally recognizable, faithfully reproducing their talking rhythm, gestural tendencies, and expression dyn…

Source: HuggingFace Papers

Research Papers

6/10 04:00

World Model Self-Distillation: Training World Models to Solve General Tasks

Pretrained video generators are promising visual world models that exhibit emergent task-solving abilities; however, their reliance on detailed textual descriptions limits their direct use for plannin…

Source: HuggingFace Papers

Research Papers

6/9 04:00

Next Forcing: Causal World Modeling with Multi-Chunk Prediction

Autoregressive video generation has emerged as a powerful paradigm for World Action Models (WAMs). However, existing approaches suffer from slow training convergence and limited converged accuracy, pa…

Source: HuggingFace Papers

Research Papers

6/9 04:00

FadeMem: Distance-Aware Memory Consolidation for Autoregressive Video Diffusion

Autoregressive video generators synthesize long videos by generating successive temporal segments, but their historical KV cache grows with video length. Existing bounded-cache methods reduce this cos…

Source: HuggingFace Papers

Research Papers

6/9 04:00

Flow-DPPO: Divergence Proximal Policy Optimization for Flow Matching Models

Recent work has demonstrated that online reinforcement learning (RL) can substantially improve the quality and alignment of flow matching models for image and video generation. Methods such as Flow-GR…

Source: HuggingFace Papers

Research Papers

6/8 04:00

MilliVid: Hierarchical Latents for Long-Range Consistency in Video Generation

Video generative models have become increasingly powerful, but long-range consistency remains challenging to achieve because even a few dozen frames require impractically long transformer sequence len…

Source: HuggingFace Papers

Product Releases/Updates

6/7 04:51

fjordravenrecoup/RunwayML-Premium-Windows

Powerful AI magic tools for video editing, text-to-video generation, and VFX.

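AI Commentary · The first RunwayML suite optimized for Windows, freeing AI video tools from cloud constraints to run efficiently on local hardware.

Source: GitHub

Research Papers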

6/5 04:00

Streaming Video Generation with Streaming Force Control

We introduce StreamForce, a streaming video generation framework that enables physically grounded control through continuous force inputs. Unlike prior video models that train separate models for diff…

Source: HuggingFace Papers

Research Papers

6/4 04:00

Dream.exe: Can Video Generation Models Dream Executable Robot Manipulation?

Video generation models have made impressive strides in synthesizing visually compelling content, yet their outputs remain confined to the virtual domain. A natural question follows: how well do these…

Source: HuggingFace Papers

Research Papers

6/4 04:00

LoomVideo: Unifying Multimodal Inputs into Video Generation and Editing

Developing unified video generation and editing models capable of interpreting interleaved multimodal inputs is a promising yet challenging frontier field. Existing unified frameworks predominantly re…

Source: HuggingFace Papers

Research Papers

6/4 04:00

RhymeFlow: Training-Free Acceleration for Video Generation with Asynchronous Denoising Flow Scheduling

Video generation models based on Diffusion Transformers (DiTs) have achieved remarkable performance in video synthesis, yet they suffer from high inference latency and computational costs due to the q…

Source: HuggingFace Papers

Research Papers

6/3 04:00

Echo-Infinity: Learning Evolving Memory for Real-Time Infinite Video Generation

We present Echo Infinity, an autoregressive (AR) framework towards real-time infinite video generation that employs a learnable evolving memory to dynamically filter, abstract, and compress any-length…

Source: HuggingFace Papers

Research Papers

6/2 04:00

AAD-1: Asymmetric Adversarial Distillation for One-Step Autoregressive Video Generation

We present AAD-1, an Asymmetric Adversarial Distillation framework for One-step autoregressive image-to-video generation. State-of-the-art methods adopt adversarial distillation but suffer from motion…

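Source: HuggingFace Papers

Model Releases/Updates

6/1 13:32

LobsterAI launches a matrix of large image and video models

36Kr has learned that LobsterAI (NetEase Youdao Lobster), the first open-source "lobster"-type product from a major Chinese tech company, recently announced the launch of image generation and video generation capabilities, integrating models including Seedream, Seedance, HappyHorse, and MiniMax-Hailuo all at once.

AI Commentary · Integrating a multi-model matrix with an open-source strategy lowers the barrier to use and advances the AI creation ecosystem.

Source: 36Kr

Research Papers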

6/1 04:00

LongLive-RAG: A General Retrieval-Augmented Framework for Long Video Generation

Autoregressive (AR) video diffusion enables variable-length synthesis, but long-horizon generation often suffers from accumulated errors and identity drift. For efficiency, existing methods commonly a…

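AI Commentary · Proposes a general retrieval-augmented framework that addresses accumulated errors and identity drift in long video generation while balancing efficiency and quality.

Source: HuggingFace Papers

Research Papers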

6/1 04:00

VLMs are Good Teachers for Video Reasoning via Adaptive Test-Time Optimization

The recent "Reasoning with Video" paradigm utilizes Video Generation Models (VGMs) to generate temporally coherent visual trajectories to complete reasoning tasks. Although state-of-the-art VGMs excel…

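AI Commentary · Adaptive test-time optimization makes vision-language models "good teachers" for video reasoning, moving past the limits of conventional methods.

Source: HuggingFace Papers

Product Releases/Updates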

5/31 11:28

qixinhu11/LongLive-RAG

Official Implementation of LongLive-RAG: A general retrieval-augmented framework for long video generation.

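AI Commentary · An open-source RAG framework for long video that pushes past generation-length limits and offers a new path for AI video creation.

Source: GitHub

Research Papers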

5/30 01:56

TunerDiT: Training-free Progressive Steering of Diffusion Transformer for Multi-Event Video Generation

Text-to-video (T2V) generation faces challenging questions when generating videos with long horizons containing multiple events. Inspired by the intrinsics of the diffusion process, we probe video dif…

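AI Commentary · Enables precise control of multi-event video generation without additional training, greatly lowering the compute barrier and helping democratize video creation.

Source: arXiv

Research Papers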

5/29 04:00

DecMem: Towards Minute-Long Consistent World Generation with Decoupled Memory

Recent advances in video generative models have promoted rapid progress in controllable world models. However, maintaining fine-grained spatio-temporal consistency under long-horizon reasoning remains…

Source: HuggingFace Papers

Research Papers

5/28 04:00

YoCausal: How Far is Video Generation from World Model? A Causality Perspective

As video diffusion models (VDMs) advance toward world models, a key question arises: do they truly understand causality, or merely overfit to statistical temporal patterns? Existing benchmarks mostly…

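AI Commentary · Evaluates video generation models from a causal perspective, revealing whether they truly understand the laws of the physical world rather than merely fitting statistical patterns.

Source: HuggingFace Papers

Research Papers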

5/28 04:00

minWM: A Full-Stack Open-Source Framework for Real-Time Interactive Video World Models

Recent video diffusion foundation models have achieved remarkable progress in high-quality video generation, yet turning them into real-time interactive video world models remains challenging. Interac…

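AI Commentary · An open-source full-stack framework for real-time interactive video world models, breaking through bottlenecks in current video generation technology.

Source: HuggingFace Papers

Research Papers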

5/28 04:00

AdaState: Self-Evolving Anchors for Streaming Video Generation

Autoregressive video diffusion models generate streaming video by producing frames sequentially, conditioning each chunk on previously generated content. These models are structurally anchored to the…

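AI Commentary · A self-evolving anchor mechanism breaks through the bottleneck in streaming video generation, enabling more coherent video output of unbounded length.

Source: HuggingFace Papers

Research Papers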

5/28 04:00

Native Audio-Visual Alignment for Generation

Joint audio-video generation aims to synthesize temporally synchronized and semantically coherent visual-acoustic content. However, existing open-source methods mainly rely on either dual-tower design…

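AI Commentary · A breakthrough in multimodal alignment makes audio-video generation more synchronized and natural, moving AI content creation into a new stage.

Source: HuggingFace Papers

Product Releases/Updates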

5/28 00:37

Portalshoequip/Runway-Unlimited-Pro-Gen-3

Runway Unlimited Pro Gen-3 with unlimited credits, premium models, motion brush, lip sync, upscale, and the complete creative suite—top subscription tier fully…

Source: GitHub

Research Papers

5/27 04:00

SmartDirector: Keyframe-Conditioned Cinematic Video Generation with Narrative Pacing Control

The narrative quality of a video fundamentally determines its perceptual value. Although existing video generation methods can produce visually appealing content, they predominantly rely on sparse con…

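AI Commentary · A breakthrough in intelligent directing that achieves keyframe-driven narrative pacing control for the first time, giving AI video generation a better grasp of plot.

Source: HuggingFace Papers

Research Papers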

5/27 04:00

Which Pretraining Paradigm Better Serves Spatial Intelligence? An Empirical Comparison of Vision-Language and Video Generation Models

Spatial intelligence requires visual representations that capture both semantic objects and geometric structure in the physical world. To support this, two major pre-training schemes are now widely us…

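AI Commentary · Compares vision-language and video generation models to reveal which pretraining paradigm better supports spatial intelligence.

Source: HuggingFace Papers

Research Papers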

5/25 04:00

StreamChar: Long-Horizon Streaming Character Audio-Video Generation with Decoupled Orchestration

Real-time streaming joint audio-video generation for character animation requires a generator to speak the requested transcript, maintain visual identity across chunks, and run within a strict playbac…

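AI Commentary · Decoupled orchestration enables long-horizon streaming audio-video generation, breaking through the coherence and latency bottlenecks of real-time character animation.

Source: HuggingFace Papers

Research Papers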

5/22 04:00

One-Forcing: Towards Stable One-Step Autoregressive Video Generation

Recent advances have substantially improved real-time interactive video generation in the autoregressive regime. However, most existing few-step autoregressive video generation methods, often distille…

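AI Commentary · Proposes a new paradigm for one-step autoregressive video generation that could break the real-time interaction bottleneck and significantly improve generation stability.

Source: HuggingFace Papers

Tips & Opinions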

4/12 08:00

Diffusion Models for Video Generation

Diffusion models have demonstrated strong results on image synthesis in past years. Now the research community has started working on a harder task—using it for video generation. T…

AI Commentary · A new breakthrough in video generation: diffusion models move from images to the dynamic world.

Source: Lilian Weng