No-code in the front, Python in the back. An open-source framework
This repo contains the code for a 1D tokenizer and generator
Open source demo platform where you can easily showcase your AI models
A framework to enable multimodal models to operate a computer
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System
Skywork-R1V is an advanced multimodal AI model series
Witness the aha moment of VLMs for less than $3
Extension of Google Research’s PaperBanana
The most powerful Android RPA agent framework
Weaving the Digital Agent Galaxy
Driving with Graph Visual Question Answering
Autoregressive Model Beats Diffusion
LISA: Reasoning Segmentation via Large Language Model
LTX-Video Support for ComfyUI
SAPIEN Manipulation Skill Framework
Reference PyTorch implementation and models for DINOv3
An open phone agent model & framework
Agent S: an open agentic framework that uses computers like a human
Director, Screenwriter, Producer, and Video Generator All-in-One
VideoRAG: Chat with Your Videos
Generating Immersive, Explorable, and Interactive 3D Worlds
Unified Multimodal Understanding and Generation Models
Multimodal Agents as Smartphone Users, an LLM-based multimodal agent
Python inference and LoRA trainer package for the LTX-2 audio–video
Static Analyzer for Solidity