Wan: Open and Advanced Large-Scale Video Generative Models
In this repository, we present Wan2.1, a comprehensive and open suite of video foundation models that pushes the boundaries of video generation.
DepthAnything/Video-Depth-Anything
This work presents Video Depth Anything, built on Depth Anything V2, which can be applied to arbitrarily long videos without compromising quality, consistency, or generalization ability. Compared with other diffusion-based models, it offers faster inference and fewer parameters.
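The repository's own temporal stabilization method is not reproduced here, but a generic overlapping-window scheme illustrates how a per-frame depth model can run over an arbitrarily long video while keeping scale consistent across windows. `depth_model` is a hypothetical callable (frame in, depth map out), and the median-based scale alignment is an assumption for illustration, not Video-Depth-Anything's actual algorithm.

```python
import numpy as np

def depth_over_long_video(frames, depth_model, window=32, overlap=8):
    """frames: a list of HxW arrays; returns one depth map per frame."""
    depths, prev_tail = [], None
    step = window - overlap
    for start in range(0, len(frames), step):
        # Run the (hypothetical) per-frame model on one window of frames.
        chunk = [depth_model(f) for f in frames[start:start + window]]
        if prev_tail is not None:
            # Match the new window's scale to the previous one on the
            # overlapping frames so predicted depth does not drift over time.
            n = min(overlap, len(chunk), len(prev_tail))
            scale = np.median([prev_tail[i].mean() / (chunk[i].mean() + 1e-8)
                               for i in range(n)])
            chunk = [d * scale for d in chunk][n:]  # overlap already emitted
        depths.extend(chunk)
        prev_tail = chunk[-overlap:]
        if start + window >= len(frames):
            break
    return depths
```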
HunyuanVideo: A Systematic Framework For Large Video Generation Models
HunyuanVideo adopts a Transformer design and employs a full-attention mechanism for unified image and video generation. Specifically, it uses a "dual-stream to single-stream" hybrid design: in the dual-stream phase, video and text tokens are processed independently through multiple Transformer blocks, so each modality learns its own representations without interference; in the single-stream phase, the two token sequences are concatenated and fed through shared blocks for multimodal fusion.
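As a reading aid, here is a minimal PyTorch sketch of that dual-stream to single-stream layout: separate per-modality Transformer stacks, then shared blocks over the concatenated sequence. Block counts, dimensions, and the plain multi-head attention are placeholder assumptions; this illustrates the structure, not HunyuanVideo's actual implementation.

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    """One pre-norm Transformer block with full (non-causal) self-attention."""
    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, x):
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        return x + self.mlp(self.norm2(x))

class DualToSingleStream(nn.Module):
    def __init__(self, dim: int = 512, dual_depth: int = 2, single_depth: int = 2):
        super().__init__()
        # Dual-stream phase: a separate block stack per modality.
        self.video_blocks = nn.ModuleList([Block(dim) for _ in range(dual_depth)])
        self.text_blocks = nn.ModuleList([Block(dim) for _ in range(dual_depth)])
        # Single-stream phase: shared blocks over the fused sequence.
        self.fused_blocks = nn.ModuleList([Block(dim) for _ in range(single_depth)])

    def forward(self, video_tokens, text_tokens):
        # Phase 1: each modality attends only within itself.
        for vb, tb in zip(self.video_blocks, self.text_blocks):
            video_tokens, text_tokens = vb(video_tokens), tb(text_tokens)
        # Phase 2: concatenate along the sequence axis so full attention
        # fuses video and text information.
        x = torch.cat([video_tokens, text_tokens], dim=1)
        for fb in self.fused_blocks:
            x = fb(x)
        return x

# Example: 16 video tokens and 8 text tokens, batch of 2 -> (2, 24, 512).
model = DualToSingleStream()
out = model(torch.randn(2, 16, 512), torch.randn(2, 8, 512))
```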
Video-R1: Reinforcing Video Reasoning in MLLMs
Video-R1 significantly outperforms previous models across most benchmarks. Notably, on VSI-Bench, which focuses on spatial reasoning in videos, Video-R1-7B achieves a new state-of-the-art accuracy of 35.8%, surpassing the proprietary GPT-4o while using only 32 frames and 7B parameters. This highlights the necessity of explicit reasoning capability for solving video tasks.
Lightricks/LTX-Video
Official repository for LTX-Video, a DiT-based video generation model designed to produce high-quality videos in real time.
Lightricks/ComfyUI-LTXVideo
Custom nodes that add LTX-Video support to ComfyUI.
HunyuanCustom: A Multimodal-Driven Architecture for Customized Video Generation
HunyuanCustom supports multimodal video customization, accepting inputs in the form of text, images, audio, and video. Specifically, it can handle single or multiple image inputs to enable customized video generation for one or more subjects, and it can incorporate an additional audio input to drive the subject to speak the corresponding audio.
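Conceptually, such customization amounts to packing optional per-modality conditions into one sequence the video model can attend to. The sketch below is a hypothetical illustration of that interface; the encoder dimensions and projection scheme are assumptions, not HunyuanCustom's actual architecture.

```python
import torch
import torch.nn as nn

class MultimodalConditioner(nn.Module):
    """Projects optional text/image/audio embeddings into one shared
    conditioning sequence (assumed dims; illustrative only)."""
    def __init__(self, dim: int = 512, text_dim: int = 768,
                 image_dim: int = 1024, audio_dim: int = 128):
        super().__init__()
        # Per-modality projections into the shared conditioning space.
        self.text_proj = nn.Linear(text_dim, dim)
        self.image_proj = nn.Linear(image_dim, dim)
        self.audio_proj = nn.Linear(audio_dim, dim)

    def forward(self, text_emb, image_embs=None, audio_emb=None):
        # Text is always present; reference images (one token per subject)
        # and audio frames are optional extras.
        parts = [self.text_proj(text_emb)]
        if image_embs is not None:
            parts.append(self.image_proj(image_embs))
        if audio_emb is not None:
            parts.append(self.audio_proj(audio_emb))
        # Concatenate along the sequence axis; a video generator would
        # cross-attend to this combined condition sequence.
        return torch.cat(parts, dim=1)

# Example: a text prompt plus two reference subjects and an audio clip.
cond = MultimodalConditioner()(
    torch.randn(1, 77, 768),          # text tokens
    image_embs=torch.randn(1, 2, 1024),  # two subject images
    audio_emb=torch.randn(1, 50, 128))   # audio frames
```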
k4yt3x/video2x
A machine-learning-based video super-resolution and frame-interpolation framework. Est. Hack the Valley II, 2018.
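For intuition, the sketch below shows the data flow such a framework automates: decode frames, upscale each one, and synthesize in-between frames to raise the frame rate. It deliberately does not use video2x's API; bicubic `cv2.resize` stands in for a learned upscaler and naive blending stands in for a learned interpolator, purely to make the two stages concrete.

```python
import cv2

def upscale_and_interpolate(src_path: str, dst_path: str, scale: int = 2):
    cap = cv2.VideoCapture(src_path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    ok, prev = cap.read()
    if not ok:
        raise ValueError(f"cannot read {src_path}")
    h, w = prev.shape[:2]
    # Output runs at twice the frame rate: one synthesized frame per pair.
    out = cv2.VideoWriter(dst_path, cv2.VideoWriter_fourcc(*"mp4v"),
                          fps * 2, (w * scale, h * scale))
    prev_up = cv2.resize(prev, (w * scale, h * scale),
                         interpolation=cv2.INTER_CUBIC)
    out.write(prev_up)
    while True:
        ok, cur = cap.read()
        if not ok:
            break
        cur_up = cv2.resize(cur, (w * scale, h * scale),
                            interpolation=cv2.INTER_CUBIC)
        # A learned model would synthesize the in-between frame;
        # 50/50 blending is only a stand-in for that step.
        mid = cv2.addWeighted(prev_up, 0.5, cur_up, 0.5, 0.0)
        out.write(mid)
        out.write(cur_up)
        prev_up = cur_up
    cap.release()
    out.release()
```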
kijai/ComfyUI-WanVideoWrapper
ComfyUI wrapper nodes for the Wan video models.
[EMNLP 2024] Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
Video-LLaVA learns a unified visual representation by aligning image and video features into a shared space before projecting them into the language model, enabling joint image and video understanding.