Showing 474 open source projects for "visual"

  • 1
    1D Visual Tokenization and Generation

    This repo contains the code for 1D tokenizer and generator

    The 1D Visual Tokenization and Generation project from ByteDance introduces a novel “one-dimensional” tokenizer designed for images: instead of representing images with large grids of 2D tokens (as in many prior generative/image-modeling systems), it compresses images into as few as 32 discrete tokens (or more, optionally) — thereby achieving a very compact, efficient representation that drastically speeds up generation and reconstruction while retaining strong fidelity.
    Downloads: 0 This Week
  • 2
    FaceFusion

    Industry leading face manipulation platform

    FaceFusion is an open-source face swapping and facial enhancement toolkit designed for high-quality video and image manipulation workflows. The project enables users to replace faces in images or videos while maintaining temporal consistency and visual realism. It integrates modern deep learning models for face detection, alignment, and blending to produce smoother results than traditional approaches. FaceFusion is built with a modular pipeline that allows users to customize processing steps and optimize performance for different hardware environments. The tool is often used in content creation, visual effects experimentation, and research into generative media. ...
    Downloads: 283 This Week
  • 3
    Ren'Py

    The Ren'Py Visual Novel Engine

    ...The engine handles essential visual novel conventions like save and load systems, rollback to previous text, scene transitions, and UI menus, so creators can focus on the story and player experience. Because it’s built on Python and widely supported across platforms, Ren’Py games can run on Windows, macOS, Linux, mobile devices, and even in browsers with HTML5 builds, helping developers reach a broad audience.
    Downloads: 84 This Week
  • 4
    DeepSeek-OCR 2

    Visual Causal Flow

    DeepSeek-OCR-2 is the second-generation optical character recognition system developed to improve document understanding by introducing a “visual causal flow” mechanism, enabling the encoder to reorder visual tokens in a way that better reflects semantic structure rather than strict raster scan order. It is designed to handle complex layouts and noisy documents by giving the model causal reasoning capabilities that mimic human visual scanning behavior, enhancing OCR performance on documents with rich spatial structure. ...
    Downloads: 11 This Week
  • 5
    RobotCode

    RobotFramework support for Visual Studio Code

    An extension that brings support for RobotFramework to Visual Studio Code, including features like code completion, debugging, test explorer, refactoring and more! With RobotCode you can edit your code with auto-completion, code navigation, syntax checking, and much more.
    Downloads: 19 This Week
  • 6
    Jaaz

    Open source multimodal creative AI assistant with infinite canvas tool

    Jaaz is an open source multimodal creative assistant designed to help users generate and organize visual media using artificial intelligence. It functions as a creative workspace where images, videos, and visual storyboards can be produced and arranged on an infinite canvas environment. It combines AI agents with visual editing tools, allowing users to generate media through prompts, sketches, or simple instructions. Jaaz supports multiple AI models and can integrate both local and cloud-based inference systems, enabling flexible creative workflows. ...
    Downloads: 4 This Week
  • 7
    Flowsint

    Graph-based OSINT investigation platform with visual relationship mapping

    Flowsint is an open source OSINT investigation platform designed to help analysts explore and understand relationships between digital entities through a visual graph interface. The platform focuses on reconnaissance and open source intelligence workflows, enabling investigators to map connections between domains, IP addresses, organizations, individuals, and other data points. By presenting these relationships in an interactive graph, Flowsint allows users to quickly identify patterns, associations, and investigative leads that might be difficult to detect through traditional data analysis methods. ...
    Downloads: 19 This Week
  • 8
    Video-subtitle-remover (VSR)

    AI tool that removes hardcoded subtitles and text from videos locally

    ...Video Subtitle Remover analyzes video frames and detects subtitle regions, then replaces the removed areas using an AI algorithm that fills the space with reconstructed visual content. This process aims to maintain the original resolution and visual continuity of the video after subtitle removal. It allows users to define a specific subtitle region so that only text in that area is removed rather than modifying the entire frame. It can also automatically remove text throughout the whole video when a position is not specified. ...
    Downloads: 84 This Week
  • 9
    DeepSeek VL

    Towards Real-World Vision-Language Understanding

    DeepSeek-VL is DeepSeek’s initial vision-language model that anchors their multimodal stack. It enables understanding and generation across visual and textual modalities—meaning it can process an image + a prompt, answer questions about images, caption, classify, or reason about visuals in context. The model is likely used internally as the visual encoder backbone for agent use cases, to ground perception in downstream tasks (e.g. answering questions about a screenshot). The repository includes model weights (or pointers to them), evaluation metrics on standard vision + language benchmarks, and configuration or architecture files. ...
    Downloads: 11 This Week
  • 10
    InternGPT

    Open source demo platform where you can easily showcase your AI models

    InternGPT is an open-source multimodal AI framework designed to extend large language models beyond text interactions into visual reasoning and image manipulation tasks. The system integrates conversational AI with computer vision models so users can interact with images, videos, and visual environments through natural language instructions. Unlike traditional chat systems that rely solely on text prompts, InternGPT allows users to interact with visual content using both language and nonverbal signals such as pointing or highlighting objects within images. ...
    Downloads: 0 This Week
  • 11
    AIMr

    The best AI Aimbot for Fortnite, Valorant, CS2, R6, COD, Apex, & more

    ...The software includes various aiming enhancements, such as recoil control, silent aim, and prediction capabilities, aimed at making gameplay smoother and more competitive. AIMr also provides visual customization options like field-of-view displays and detection indicators, allowing players to tailor their interface. The system is compatible with games that use human-shaped models, and although it functions effectively out of the box, optimizing it with CUDA-accelerated OpenCV is recommended for maximum performance.
    Downloads: 324 This Week
  • 12
    InternLM-XComposer-2.5

    InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System

    ...It incorporates visual understanding modules that allow the model to analyze images and integrate them into coherent narrative outputs. The framework also supports tasks such as image captioning, multimodal reasoning, and layout generation for structured visual documents. By combining language generation with visual composition capabilities, the system enables new forms of content creation that integrate written explanations with automatically generated visual components.
    Downloads: 0 This Week
  • 13
    InternVL

    A Pioneering Open-Source Alternative to GPT-4o

    InternVL is a large-scale multimodal foundation model designed to integrate computer vision and language understanding within a unified architecture. The project focuses on scaling vision models and aligning them with large language models so that they can perform tasks involving both visual and textual information. InternVL is trained on massive collections of image-text data, enabling it to learn representations that capture both visual patterns and semantic meaning. The model supports a wide variety of tasks, including visual perception, image classification, and cross-modal retrieval between images and text. It can also be connected to language models to enable conversational interfaces that understand images, videos, and other visual content. ...
    Downloads: 0 This Week
  • 14
    SAM 3

    Code for running inference and finetuning with SAM 3 model

    SAM 3 (Segment Anything Model 3) is a unified foundation model for promptable segmentation in both images and videos, capable of detecting, segmenting, and tracking objects. It accepts both text prompts (open-vocabulary concepts like “red car” or “goalkeeper in white”) and visual prompts (points, boxes, masks) and returns high-quality masks, boxes, and scores for the requested concepts. Compared with SAM 2, SAM 3 introduces the ability to exhaustively segment all instances of an open-vocabulary concept specified by a short phrase or exemplars, scaling to a vastly larger set of categories than traditional closed-set models. ...
    Downloads: 46 This Week
  • 15
    graphify

    AI coding assistant skill (Claude Code, Codex, OpenCode, OpenClaw)

    ...The architecture emphasizes flexibility, enabling users to customize how data is mapped and displayed. It may also include analytical features to explore patterns, clusters, or anomalies within the graph. Overall, graphify serves as a bridge between raw data and visual insight.
    Downloads: 6 This Week
  • 16
    yt-dlp-gui

    A cross-platform GUI wrapper for yt-dlp written in PySide6

    yt-dlp-gui is a cross-platform graphical interface for the popular command-line video downloader yt-dlp, created to make video and audio downloads from sites like YouTube, Vimeo, Twitch, and others easier for everyday users without needing to work directly with command-line arguments. Written in PySide6 (Python with Qt bindings), it wraps the powerful yt-dlp engine in a visual application that lets users paste video URLs, choose formats, apply presets, and start downloads with a click, while still exposing options for advanced tweaks via configuration files. The project supports preset definitions and global arguments through a config file, so users can customize their most common download workflows—like audio extraction, quality ranking, or embedding thumbnails—without retyping arguments each time. ...
    Downloads: 297 This Week
  • 17
    R1-V

    Witness the aha moment of VLM with less than $3

    R1-V is an initiative aimed at enhancing the generalization capabilities of Vision-Language Models (VLMs) through Reinforcement Learning in Visual Reasoning (RLVR). The project focuses on building a comprehensive framework that emphasizes algorithm enhancement, efficiency optimization, and task diversity to achieve general vision-language intelligence and visual/GUI agents. The team's long-term goal is to contribute impactful open-source research in this domain.
    Downloads: 0 This Week
  • 18
    Skywork-R1V4

    Skywork-R1V is an advanced multimodal AI model series

    Skywork-R1V is an open-source multimodal reasoning model designed to extend the capabilities of large language models into vision-language tasks that require complex logical reasoning. The project introduces a model architecture that transfers the reasoning abilities of advanced text-based models into visual domains so the system can interpret images and perform multi-step reasoning about them. Instead of retraining both language and vision models from scratch, the framework uses a lightweight visual projection layer that connects a pretrained vision backbone with a reasoning-capable language model. This design allows the model to analyze images while maintaining strong textual reasoning performance, enabling tasks such as solving visual math problems, interpreting scientific diagrams, and answering questions about images.
    Downloads: 0 This Week
  • 19
    Book2_Beauty-of-Data-Visualization

    Machine Learning, Criticism and Correction

    Book2_Beauty-of-Data-Visualization is an open educational project that teaches the principles and techniques of effective data visualization using Python and modern plotting libraries. The repository focuses on both the technical and aesthetic aspects of visual analytics, helping learners understand how to communicate data clearly and persuasively. It includes practical examples that demonstrate how different chart types reveal patterns, trends, and distributions in real datasets. The material emphasizes visual storytelling and design thinking alongside coding implementation. By combining theory with hands-on plotting exercises, the book helps readers build both analytical and presentation skills. ...
    Downloads: 2 This Week
  • 20
    AstronRPA

    Agent-ready RPA suite with visual workflow automation tools engine

    Astron RPA is an enterprise-grade robotic process automation platform designed to help organizations and developers build automated workflows for desktop and web applications. It provides a visual workflow designer that supports low-code and no-code development, allowing users to create automation processes through a drag-and-drop interface instead of writing extensive code. It enables automation of common desktop software and browser-based tasks, making it suitable for repetitive business operations and system integrations. ...
    Downloads: 5 This Week
  • 21
    Book4_Power-of-Matrix

    Book_4_Matrix Power | The Iris Book: From Addition, Subtraction

    Book4_Power-of-Matrix is an open educational repository that forms part of the Visualize-ML book series, focusing on explaining matrix mathematics and linear algebra concepts through visual and intuitive methods. The project is designed to help readers progress from basic arithmetic toward machine learning fundamentals by building a strong conceptual understanding of vectors, matrices, and their operations. It combines explanatory text, diagrams, and Python examples to bridge theory and practical computation. The material emphasizes geometric interpretation and visual reasoning, which makes abstract linear algebra topics more accessible to beginners and self-learners. ...
    Downloads: 3 This Week
  • 22
    Clarity AI Upscaler

    AI Image Upscaler & Enhancer

    Clarity AI Upscaler is an open-source AI image enhancement tool designed to increase the resolution and visual quality of images using modern generative techniques. The system uses deep learning models based on diffusion and other image generation methods to reconstruct high-resolution versions of low-resolution images while preserving important visual details. Unlike traditional interpolation-based upscaling algorithms, the system generates additional visual information that improves perceived clarity and sharpness. ...
    Downloads: 6 This Week
  • 23
    CogVLM

    A state-of-the-art open visual language model

    CogVLM is an open-source visual–language model suite—and its GUI-oriented sibling CogAgent—aimed at image understanding, grounding, and multi-turn dialogue, with optional agent actions on real UI screenshots. The flagship CogVLM-17B combines ~10B visual parameters with ~7B language parameters and supports 490×490 inputs; CogAgent-18B extends this to 1120×1120 and adds plan/next-action outputs plus grounded operation coordinates for GUI tasks.
    Downloads: 1 This Week
  • 24
    SeedVR2 Upscaler ComfyUI

    Official SeedVR2 Video Upscaler for ComfyUI

    ComfyUI-SeedVR2 Video Upscaler is an open-source integration node for the ComfyUI workflow environment that brings the advanced SeedVR2 video upscaling and restoration model directly into visual AI pipelines. This project packages the SeedVR2 architecture as a custom node for ComfyUI, letting users upscale low-resolution video or imagery inside a node-based interface without needing to write code manually. The underlying SeedVR2 model is known for delivering high-quality video enhancement with strong temporal consistency and improved detail preservation by using diffusion-based techniques that are trained specifically on video sequences. ...
    Downloads: 23 This Week
  • 25
    LTX-2.3

    Official Python inference and LoRA trainer package

    ...Unlike most earlier video generation systems that only produced silent clips, LTX-2 combines video and audio generation in a unified architecture capable of producing coherent audiovisual scenes. The model uses a diffusion-transformer-based architecture designed to generate high-fidelity visual frames while simultaneously producing corresponding audio elements such as speech, music, ambient sound, or effects. This unified approach allows creators to generate complete multimedia sequences where motion, timing, and sound are aligned automatically. LTX-2 is designed for both research and production workflows and can generate high-resolution video clips with precise control over structure, motion, and camera behavior.
    Downloads: 177 This Week