This repo contains the code for a 1D tokenizer and generator
Open-source demo platform where you can easily showcase your AI models
A framework to enable multimodal models to operate a computer
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System
Skywork-R1V is an advanced multimodal AI model series
Witness the aha moment of VLM with less than $3
Driving with Graph Visual Question Answering
Weaving the Digital Agent Galaxy
An open phone agent model & framework
LTX-Video Support for ComfyUI
Autoregressive Model Beats Diffusion
LISA: Reasoning Segmentation via Large Language Model
Reference PyTorch implementation and models for DINOv3
The most powerful Android RPA agent framework
Extension of Google Research’s PaperBanana
Director, Screenwriter, Producer, and Video Generator All-in-One
Unified Multimodal Understanding and Generation Models
Agent S: an open agentic framework that uses computers like a human
Generating Immersive, Explorable, and Interactive 3D Worlds
VideoRAG: Chat with Your Videos
SAPIEN Manipulation Skill Framework
Python inference and LoRA trainer package for the LTX-2 audio–video
Multimodal Agents as Smartphone Users, an LLM-based multimodal agent
Official implementation of Watermark Anything with Localized Messages
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning