Open Source BSD Large Language Models (LLM) - Page 5

Large Language Models (LLM) for BSD

1

Claude Code Tools

Practical productivity tools for Claude Code, Codex-CLI

Claude Code Tools is an open-source collection of command-line utilities and productivity plugins designed to enhance developer workflows when using AI coding agents such as Claude Code and Codex-CLI. The project focuses on solving common problems encountered in AI-assisted development environments, including managing session history, automating terminal interactions, and maintaining context across multiple coding sessions. It includes tools that allow developers to search conversation logs quickly, manage environment variables securely, and execute interactive terminal workflows that AI agents can control. Some components enable Claude Code to interact with terminal multiplexers such as tmux so that it can run programs, debug applications, and interact with scripts that require user input. The toolkit also provides safety mechanisms that prevent potentially dangerous shell commands from being executed automatically by AI agents.
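The safety mechanism described above can be pictured as a check that runs before an agent's shell command is executed. The following is only a minimal sketch under that assumption: the pattern list and the `is_blocked` helper are hypothetical and are not taken from the project, which may work quite differently.

```python
import re

# Hypothetical denylist for illustration; a real guard would need to be
# far more thorough and is likely implemented differently in the project.
DANGEROUS_PATTERNS = [
    re.compile(r"\brm\s+(-[a-zA-Z]*[rf][a-zA-Z]*\s+)+/(\s|$)"),  # rm -rf /
    re.compile(r"\bgit\s+push\s+.*--force\b"),                   # forced push
    re.compile(r"\bgit\s+reset\s+--hard\b"),                     # discards work
]

def is_blocked(command: str) -> bool:
    """Return True when the command matches any denylisted pattern."""
    return any(pattern.search(command) for pattern in DANGEROUS_PATTERNS)

for cmd in ["ls -la", "rm -rf /", "git push origin main --force"]:
    print(f"{cmd!r}: {'blocked' if is_blocked(cmd) else 'allowed'}")
```

A pattern denylist is easy to audit but easy to bypass, so it works best as one layer alongside explicit user confirmation.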

Downloads: 2 This Week

Last Update: 2026-03-26
See Project
2

DB-GPT-Hub

A repository that contains models, datasets, and fine-tuning

DB-GPT-Hub is an open-source repository that provides datasets, models, and training tools designed to improve large language models for database interaction tasks, particularly Text-to-SQL. The project serves as a specialized extension of the broader DB-GPT ecosystem, focusing on the preparation and evaluation of models capable of translating natural language questions into structured database queries. It offers a modular framework that supports data preparation, model fine-tuning, benchmarking, and inference for Text-to-SQL systems. The repository includes datasets and experiment configurations that allow researchers to train models on real database schemas and evaluate them using standardized benchmarks. Its design encourages experimentation with different large language models and fine-tuning techniques, including parameter-efficient training approaches.
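One common way to benchmark Text-to-SQL output is execution accuracy: run the predicted query and a reference query against the same database and compare the rows. The sketch below illustrates that idea with the standard-library `sqlite3` module; the table, the queries, and the `execution_match` helper are made up for illustration and are not part of DB-GPT-Hub.

```python
import sqlite3

def execution_match(conn, predicted_sql, gold_sql):
    """Return True when both queries yield the same rows, ignoring order."""
    try:
        predicted = conn.execute(predicted_sql).fetchall()
    except sqlite3.Error:
        return False  # a query that fails to run counts as a miss
    gold = conn.execute(gold_sql).fetchall()
    return sorted(predicted, key=repr) == sorted(gold, key=repr)

# Toy schema and data, assumed purely for this example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, dept TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Ada", "eng", 120), ("Bob", "eng", 100), ("Cy", "ops", 90)],
)

gold = "SELECT name FROM employees WHERE dept = 'eng'"
good = "SELECT name FROM employees WHERE dept = 'eng' ORDER BY salary"
bad = "SELECT name FROM employees WHERE dept = 'ops'"

print(execution_match(conn, good, gold), execution_match(conn, bad, gold))
```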

Downloads: 2 This Week

Last Update: 2026-03-06
See Project
3

DocStrange

Extract and convert data from any document: images, PDFs, Word docs

DocStrange is an open-source document understanding and extraction library designed to convert complex files into structured, LLM-ready outputs such as Markdown, JSON, CSV, and HTML. Developed by Nanonets, the project combines OCR, layout detection, table understanding, and structured extraction into one end-to-end pipeline, which reduces the need to stitch together multiple separate services. It is built for developers who need high-quality parsing from scans, photos, PDFs, office files, and other document sources while preserving privacy and control over the processing flow. One of its key differentiators is deployment flexibility: it offers a cloud API for managed usage as well as a fully private offline mode that runs locally on a GPU. The platform also supports synchronous extraction, streaming responses, and asynchronous processing for larger documents, which makes it adaptable to both interactive workflows and heavier back-end pipelines.

Downloads: 2 This Week

Last Update: 2026-03-09
See Project
4

FlagEmbedding

Retrieval and Retrieval-augmented LLMs

FlagEmbedding is an open-source toolkit for building and deploying high-performance text embedding models used in information retrieval and retrieval-augmented generation systems. The project is part of the BAAI FlagOpen ecosystem and focuses on creating embedding models that transform text into dense vector representations suitable for semantic search and large language model pipelines. FlagEmbedding includes a family of models known as BGE (BAAI General Embedding), which are designed to achieve strong performance across multilingual and cross-lingual retrieval benchmarks. The toolkit provides infrastructure for inference, fine-tuning, evaluation, and dataset preparation, enabling developers to train custom embedding models for specific domains or applications. It also includes reranker models that refine search results by re-evaluating candidate documents using cross-encoder architectures, improving retrieval accuracy in complex queries.
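Dense retrieval of the kind described above comes down to comparing a query vector with document vectors. The sketch below shows only that ranking step, using hand-written three-dimensional vectors as stand-ins for real embeddings; it does not call FlagEmbedding, and `cosine` and `top_k` are illustrative helper names.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec, doc_vecs, k=2):
    """Return the ids of the k documents most similar to the query."""
    scored = [(cosine(query_vec, vec), doc_id) for doc_id, vec in doc_vecs.items()]
    return [doc_id for _, doc_id in sorted(scored, reverse=True)[:k]]

# Toy "embeddings"; a real model would produce hundreds of dimensions.
docs = {
    "cats": [0.9, 0.1, 0.0],
    "dogs": [0.7, 0.3, 0.1],
    "tax": [0.0, 0.1, 0.9],
}
query = [1.0, 0.0, 0.0]
print(top_k(query, docs))
```

In a full pipeline, a reranker would then rescore only these top candidates with a slower but more accurate cross-encoder.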

Downloads: 2 This Week

Last Update: 2026-03-04
See Project
5

Flock

Flock is a workflow-based low-code platform for building chatbots

Flock is a workflow-based low-code platform designed for building AI applications such as chatbots, retrieval-augmented generation systems, and multi-agent workflows. The platform uses a visual workflow architecture where different nodes represent processing steps such as input processing, model inference, retrieval operations, and tool execution. Developers can connect these nodes to create complex pipelines that orchestrate multiple language models and external services. Built on technologies such as LangChain, LangGraph, FastAPI, and Next.js, Flock combines a modern web interface with a flexible backend capable of supporting advanced AI workflows. The platform supports multi-agent collaboration, allowing developers to design workflows where different agents handle specialized tasks within the same system. Flock also includes features such as intent recognition, code execution nodes, and human-in-the-loop approval processes that make it suitable for production AI applications.

Downloads: 2 This Week

Last Update: 2026-03-09
See Project
6

Free LLM API resources

A list of free LLM inference resources accessible via API

The Free LLM API resources repository, curated by cheahjs, is a community-driven index of free and open API endpoints, tools, datasets, runtimes, and utilities for working with large language models (LLMs) without cost barriers. It collects hosted free-tier LLM APIs, documentation links, public model endpoints, open datasets useful for training or evaluation, tooling integrations, and examples showing how to use these services in real applications. The list helps developers, hobbyists, and researchers quickly find models for prototyping, experimentation, or proofs of concept without paid subscriptions. The repository typically categorizes offerings by provider, type of service (text, embeddings, vision), and availability conditions (open without key, free tier with key), and includes usage examples to make discovery and adoption easier.

Downloads: 2 This Week

Last Update: 2026-03-08
See Project
7

GLM-130B

GLM-130B: An Open Bilingual Pre-Trained Model (ICLR 2023)

GLM-130B is an open bilingual (English and Chinese) dense language model with 130 billion parameters, released by the Tsinghua KEG Lab and collaborators as part of the General Language Model (GLM) series. It is designed for large-scale inference and supports both left-to-right generation and blank filling, making it versatile across NLP tasks. Trained on over 400 billion tokens (200B English, 200B Chinese), it achieves performance surpassing GPT-3 175B, OPT-175B, and BLOOM-176B on multiple benchmarks, while also showing significant improvements on Chinese datasets compared to other large models. The model supports efficient inference via INT8 and INT4 quantization, reducing hardware requirements from 8× A100 GPUs to as little as a single server with 4× RTX 3090s. Built on the SwissArmyTransformer (SAT) framework and compatible with DeepSpeed and FasterTransformer, it supports high-speed inference (up to 2.5× faster) and reproducible evaluation across 30+ benchmark tasks.
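The hardware figures above follow from simple arithmetic on weight storage. The sketch below estimates the memory needed for the weights alone at each precision and checks whether it fits a given GPU set. It ignores activations and other runtime overhead, and it assumes 24 GB per RTX 3090 and the 40 GB A100 variant, which the description does not state.

```python
PARAMS = 130e9  # 130 billion parameters
BYTES_PER_PARAM = {"FP16": 2.0, "INT8": 1.0, "INT4": 0.5}

def weight_gb(precision):
    """Approximate weight storage in GB (1 GB = 1e9 bytes), weights only."""
    return PARAMS * BYTES_PER_PARAM[precision] / 1e9

def fits(precision, num_gpus, gb_per_gpu):
    """True when the weights alone fit in the combined GPU memory."""
    return weight_gb(precision) <= num_gpus * gb_per_gpu

for precision in BYTES_PER_PARAM:
    print(precision, weight_gb(precision), "GB")

print("FP16 on 8x A100 (assumed 40 GB):", fits("FP16", 8, 40))
print("INT4 on 4x RTX 3090 (24 GB):", fits("INT4", 4, 24))
```

Under these assumptions, INT4 weights (about 65 GB) fit within the 96 GB of four RTX 3090s, while FP16 weights (about 260 GB) need the larger A100 setup.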

Downloads: 2 This Week

Last Update: 7 days ago
See Project
8

GPU Hot

Real-time NVIDIA GPU dashboard

GPU Hot is an open-source, lightweight monitoring dashboard designed to provide real-time visibility into NVIDIA GPU performance across single machines or entire clusters. The project offers a self-hosted web interface that streams hardware metrics directly from GPU servers, enabling developers, ML engineers, and system administrators to observe GPU utilization and system behavior in real time through a browser. The dashboard collects and displays a wide range of performance metrics including temperature, memory usage, power consumption, clock speeds, fan speed, and active processes. It can scale from monitoring a single GPU workstation to large distributed environments with dozens or even hundreds of GPUs by running lightweight containers on each node and aggregating the data centrally.
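Metrics like those listed above can be read from the `nvidia-smi` command-line tool. The sketch below assumes that source, although the project itself may collect metrics another way, for example through NVML. The field list and helper names are illustrative.

```python
import subprocess

# Fields requested from nvidia-smi, in output order.
QUERY_FIELDS = ["index", "temperature.gpu", "utilization.gpu",
                "memory.used", "memory.total", "power.draw"]

def parse_gpu_line(line):
    """Turn one CSV line from nvidia-smi into a dict of floats.

    Assumes every field is numeric; some GPUs report "[N/A]" for
    power.draw, which this sketch does not handle.
    """
    values = [value.strip() for value in line.split(",")]
    return dict(zip(QUERY_FIELDS, (float(v) for v in values)))

def read_gpus():
    """Query all local GPUs once (requires an NVIDIA driver)."""
    output = subprocess.run(
        ["nvidia-smi", f"--query-gpu={','.join(QUERY_FIELDS)}",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [parse_gpu_line(line) for line in output.splitlines() if line.strip()]

# Parsing a sample line works without a GPU present.
sample = "0, 54, 87, 10240, 24576, 215.30"
print(parse_gpu_line(sample))
```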

Downloads: 2 This Week

Last Update: 2026-04-11
See Project
9

GenAI Agents

Implementations for various Generative AI Agent techniques

GenAI Agents is a large, tutorial-driven repository that teaches you how to design, build, and experiment with generative AI agents. It spans a spectrum from simple conversational bots and basic question-answering agents to complex multi-agent systems that coordinate on research, education, business workflows, and creative tasks. The implementations leverage modern frameworks such as LangChain, LangGraph, AutoGen, PydanticAI, CrewAI, and more, showing how each can be wired into realistic agent workflows. The repo is structured by categories like beginner agents, framework tutorials, educational agents, business agents, creative agents, analysis agents, news bots, shopping assistants, task management agents, QA bots, and advanced systems such as controllable RAG agents. For each agent, you typically get an overview, implementation notes, and external resources (blog posts, videos, documentation) to deepen understanding.

Downloads: 2 This Week

Last Update: 2026-04-11
See Project
10

Grounded Docs

Open-Source Alternative to Context7, Nia, and Ref.Tools

Grounded Docs is an open-source implementation of a Model Context Protocol server designed to expose documentation and structured information as tools that AI agents can query. The project allows language models and agent frameworks to retrieve and interact with documentation through standardized MCP interfaces. By acting as an intermediary layer between documentation sources and AI tools, the server enables models to access structured documentation in a consistent and machine-readable format. This makes it easier for AI systems to answer technical questions, generate code examples, or retrieve reference material without requiring developers to manually integrate documentation into prompts. The architecture follows the MCP specification, which allows AI assistants and agent frameworks to connect to external tools through standardized protocols.
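MCP messages are JSON-RPC 2.0, and a client invokes a server tool with the `tools/call` method. The sketch below builds such a request. The tool name `search_docs` and its arguments are hypothetical, because the listing does not say which tools this server exposes.

```python
import json

def build_tool_call(request_id, tool_name, arguments):
    """Build a JSON-RPC 2.0 request that asks an MCP server to run a tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }

# "search_docs" and its parameters are placeholders for illustration.
request = build_tool_call(
    1, "search_docs", {"library": "react", "query": "useEffect cleanup"}
)
print(json.dumps(request, indent=2))
```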

Downloads: 2 This Week

Last Update: 2026-03-30
See Project
11

In-The-Wild Jailbreak Prompts on LLMs

A dataset consisting of 15,140 ChatGPT prompts from Reddit

In-The-Wild Jailbreak Prompts on LLMs is an open-source research repository that provides datasets and analytical tools for studying jailbreak prompts used to bypass safety restrictions in large language models. The project is part of a research effort to understand how users attempt to circumvent alignment and safety mechanisms built into modern AI systems. The repository includes a large collection of prompts gathered from real-world platforms such as Reddit, Discord, prompt-sharing communities, and other public sources. Researchers analyze these prompts to identify patterns, attack strategies, and techniques commonly used to trick language models into producing restricted or harmful outputs. The dataset includes thousands of prompts collected across multiple platforms and represents one of the largest collections of jailbreak attempts available for research.

Downloads: 2 This Week

Last Update: 2026-03-05
See Project
12

JamAI Base

The collaborative spreadsheet for AI

JamAI Base is an open-source backend platform designed to simplify the development of retrieval-augmented generation systems and AI-driven applications. The platform integrates both a relational database and a vector database into a single embedded architecture, allowing developers to store structured data alongside semantic embeddings. It includes built-in orchestration for large language models, vector search, and reranking pipelines so that AI applications can retrieve relevant information before generating responses. JamAI Base exposes its functionality through a simple REST API and a spreadsheet-style interface that allows users to manage AI workflows visually. One of the key ideas behind the platform is the concept of generative tables, which allow database columns to automatically populate with AI-generated content. The system also supports action tables and chat tables that simplify the creation of interactive AI features such as conversational interfaces and dynamic workflows.

Downloads: 2 This Week

Last Update: 2026-03-09
See Project
13

LLM CLI

Access large language models from the command-line

A CLI utility and Python library for interacting with Large Language Models, both via remote APIs and models that can be installed and run on your own machine.

Downloads: 2 This Week

Last Update: 2026-03-31
See Project
14

LLM Colosseum

Benchmark LLMs by fighting in Street Fighter 3

LLM Colosseum is an experimental benchmarking framework designed to evaluate the capabilities of large language models through gameplay interactions rather than traditional text-based benchmarks. The system places language models inside the environment of the classic video game Street Fighter III, where they must interpret the game state and decide which actions to perform during combat. This setup creates a dynamic environment that tests reasoning, situational awareness, and decision-making abilities in real time. Instead of relying purely on reward signals as reinforcement learning agents do, the models analyze contextual information and generate strategic actions based on the game environment. Performance is evaluated using a competitive ranking system that assigns models an Elo rating based on their results across matches against other models.
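An Elo-style ranking updates both players' ratings after each match according to how surprising the result was. The sketch below uses the standard Elo formulas. The K-factor of 32 and the starting rating of 1500 are assumptions, since the project's actual parameters are not stated here.

```python
def expected_score(rating_a, rating_b):
    """Probability that A beats B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

def update(rating_a, rating_b, score_a, k=32):
    """Return new ratings; score_a is 1 for a win, 0.5 draw, 0 loss."""
    expected_a = expected_score(rating_a, rating_b)
    new_a = rating_a + k * (score_a - expected_a)
    new_b = rating_b + k * ((1 - score_a) - (1 - expected_a))
    return new_a, new_b

# Two evenly rated models; model A wins the match.
print(update(1500, 1500, score_a=1))
```

Points gained by the winner equal points lost by the loser, so the total rating across the pool stays constant.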

Downloads: 2 This Week

Last Update: 2026-03-07
See Project
15

Nano-vLLM

A lightweight vLLM implementation built from scratch

Nano-vLLM is a lightweight implementation of the vLLM inference engine designed to run large language models efficiently while maintaining a minimal and readable codebase. The project recreates the core functionality of vLLM in a simplified architecture written in approximately a thousand lines of Python, making it easier for developers and researchers to understand how modern LLM inference systems work. Despite its compact design, Nano-vLLM incorporates optimization techniques such as prefix caching, tensor parallelism, and CUDA graph execution to achieve high performance during model inference. The engine is intended primarily for educational use, experimentation, and lightweight deployments where a full production-grade inference stack may be unnecessary. Its API closely mirrors that of the original vLLM framework, allowing developers familiar with vLLM to adopt the tool with minimal changes.
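Prefix caching reuses work already done for prompts that share a common beginning. The sketch below shows one way to detect shared prefixes: tokens are grouped into fixed-size blocks, and each block's hash also covers every block before it. The block size, the hashing scheme, and the `PrefixCache` class are assumptions for illustration and are not Nano-vLLM's code.

```python
import hashlib

BLOCK_SIZE = 4  # tokens per cache block (illustrative value)

def block_hashes(token_ids):
    """Chain-hash each full block so a hash identifies its whole prefix."""
    hashes, previous = [], b""
    full_length = len(token_ids) - len(token_ids) % BLOCK_SIZE
    for start in range(0, full_length, BLOCK_SIZE):
        block = token_ids[start:start + BLOCK_SIZE]
        previous = hashlib.sha256(previous + repr(block).encode()).digest()
        hashes.append(previous)
    return hashes

class PrefixCache:
    def __init__(self):
        self.blocks = set()

    def admit(self, token_ids):
        """Return how many leading tokens hit the cache, then cache all blocks."""
        hashes = block_hashes(token_ids)
        reused = 0
        for block_hash in hashes:
            if block_hash not in self.blocks:
                break
            reused += 1
        self.blocks.update(hashes)
        return reused * BLOCK_SIZE

cache = PrefixCache()
first = cache.admit([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
second = cache.admit([1, 2, 3, 4, 5, 6, 7, 8, 99, 100, 101, 102])
print(first, second)
```

A real engine would store key/value tensors per block; only the lookup logic is modeled here.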

Downloads: 2 This Week

Last Update: 6 days ago
See Project
16

OpenPlanter

Language-model investigation agent with a terminal UI

OpenPlanter is an open-source Python project that provides a language-model-driven investigation agent operated through a terminal UI. It emphasizes automation and extensibility, and its modular Python codebase is structured to support experimentation and customization.

Downloads: 2 This Week

Last Update: 2026-03-06
See Project
17

Parallax

Parallax is a distributed model serving framework

Parallax is a decentralized inference framework designed to run large language models across distributed computing resources. Instead of relying on centralized GPU clusters in data centers, the system allows multiple heterogeneous machines to collaborate in serving AI inference workloads. Parallax divides model layers across different nodes and dynamically coordinates them to form a complete inference pipeline. A two-stage scheduling architecture determines how model layers are allocated to available hardware and how requests are routed across nodes during execution. This scheduling system optimizes latency, throughput, and hardware utilization even when nodes have different computational capabilities. The platform also supports model sharding and pipeline parallelism, allowing very large models to run across distributed resources.
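Splitting model layers across unequal machines can be illustrated by allocating contiguous layer ranges in proportion to each node's capacity. This sketch is a deliberately simple stand-in and not Parallax's two-stage scheduler; the capacity numbers and the `allocate_layers` helper are hypothetical.

```python
def allocate_layers(num_layers, capacities):
    """Give each node a contiguous layer range sized by relative capacity.

    Uses largest-remainder rounding so the counts sum to num_layers.
    """
    total = sum(capacities.values())
    shares = {node: num_layers * cap / total for node, cap in capacities.items()}
    counts = {node: int(share) for node, share in shares.items()}
    leftover = num_layers - sum(counts.values())
    by_remainder = sorted(shares, key=lambda n: shares[n] - counts[n], reverse=True)
    for node in by_remainder[:leftover]:
        counts[node] += 1

    ranges, start = {}, 0
    for node, count in counts.items():
        ranges[node] = range(start, start + count)
        start += count
    return ranges

# Capacities could be GPU memory in GB; the values are made up.
plan = allocate_layers(32, {"a": 24, "b": 16, "c": 8})
for node, layers in plan.items():
    print(node, f"layers {layers.start}..{layers.stop - 1}")
```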

Downloads: 2 This Week

Last Update: 2026-03-09
See Project
18

Pixeltable

Data Infrastructure providing an approach to multimodal AI workloads

Pixeltable is an open-source Python data infrastructure framework designed to support the development of multimodal AI applications. The system provides a declarative interface for managing the entire lifecycle of AI data pipelines, including storage, transformation, indexing, retrieval, and orchestration of datasets. Unlike traditional architectures that require multiple tools such as databases, vector stores, and workflow orchestrators, Pixeltable unifies these functions within a table-based abstraction. Developers define data transformations and AI operations using computed columns on tables, allowing pipelines to evolve incrementally as new data or models are added. The framework supports multimodal content including images, video, text, and audio, enabling applications such as retrieval-augmented generation systems, semantic search, and multimedia analytics.
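The computed-column idea can be pictured with a toy table: a column is defined once as a function of the row, then filled in for existing rows and for every later insert. The class below is a hypothetical illustration of that pattern in plain Python and does not use or mirror Pixeltable's API.

```python
class ToyTable:
    """A minimal in-memory table with computed columns (illustration only)."""

    def __init__(self):
        self.rows = []
        self.computed = {}

    def add_computed(self, name, fn):
        """Register a computed column and backfill it for existing rows."""
        self.computed[name] = fn
        for row in self.rows:
            row[name] = fn(row)

    def insert(self, **values):
        """Insert a row; computed columns are filled in automatically."""
        row = dict(values)
        for name, fn in self.computed.items():
            row[name] = fn(row)
        self.rows.append(row)

table = ToyTable()
table.insert(text="hello world")
table.add_computed("word_count", lambda row: len(row["text"].split()))
table.insert(text="one two three")
print(table.rows)
```

In a real system the function might call an embedding model or an LLM rather than count words, and results would be stored persistently.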

Downloads: 2 This Week

Last Update: 2026-04-11
See Project
19

Prometheus-Eval

Evaluate your LLM's response with Prometheus and GPT4

Prometheus-Eval is an open-source framework designed to evaluate the outputs of large language models using specialized evaluator models known as Prometheus. The project provides tools, datasets, and scripts that allow developers and researchers to measure the quality of LLM responses through automated scoring rather than relying solely on human evaluators. It implements an “LLM-as-a-judge” approach in which a dedicated language model analyzes instruction–response pairs and assigns scores or rankings based on predefined evaluation criteria. The repository includes a Python package that provides a straightforward interface for running evaluations and integrating them into model development pipelines. It also provides training data and utilities for fine-tuning evaluator models so they can assess outputs according to custom scoring rubrics such as helpfulness, accuracy, or style.
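The LLM-as-a-judge loop has two mechanical parts around the model call: building a rubric-based prompt and parsing the score out of the judge's reply. The sketch below covers only those two parts. The template wording and the `[RESULT]` marker are assumptions for illustration, and the package's own templates and parsers may differ.

```python
import re

# Hypothetical prompt template; the project's own templates may differ.
JUDGE_TEMPLATE = (
    "Instruction:\n{instruction}\n\n"
    "Response:\n{response}\n\n"
    "Rubric: {rubric}\n"
    "Write feedback, then end with '[RESULT] <score from 1 to 5>'."
)

def build_judge_prompt(instruction, response, rubric):
    """Fill the template with one instruction-response pair."""
    return JUDGE_TEMPLATE.format(
        instruction=instruction, response=response, rubric=rubric
    )

def parse_score(judge_output):
    """Extract the 1-5 score, or None when no valid verdict is present."""
    match = re.search(r"\[RESULT\]\s*([1-5])\b", judge_output)
    return int(match.group(1)) if match else None

prompt = build_judge_prompt(
    "Explain what a mutex is.",
    "A mutex lets only one thread enter a critical section at a time.",
    "Is the answer accurate and concise?",
)
print(parse_score("Feedback: accurate and concise. [RESULT] 4"))
```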

Downloads: 2 This Week

Last Update: 2026-03-09
See Project
20

Punica

Serving multiple LoRA-finetuned LLMs as one

Punica is a system designed to efficiently serve multiple LoRA-fine-tuned large language models within a shared GPU environment. LoRA is a parameter-efficient fine-tuning method that allows developers to adapt large pretrained models to specific tasks by adding lightweight adapter layers rather than retraining the entire model. Punica introduces a serving architecture that allows multiple LoRA adapters to share the same base model during inference, significantly reducing memory consumption and computational overhead. The system includes specialized CUDA kernels that enable batched GPU operations across different LoRA models simultaneously. This design allows a single GPU cluster to host many task-specific models while maintaining high throughput and minimal latency. The architecture also includes scheduling mechanisms that coordinate requests from multiple tenants and distribute workloads efficiently across available resources.
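Sharing one base model across adapters works because a LoRA layer computes the base projection plus a low-rank correction, y = xW + (xA)B, where only A and B differ per adapter. The sketch below applies a different adapter to each request in a batch using plain Python lists. It loops over requests for clarity, whereas Punica's CUDA kernels perform the equivalent work in batched form. All matrices and names here are toy values.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def lora_batch(inputs, adapter_ids, base_weight, adapters):
    """Compute x @ W + (x @ A_i) @ B_i for each request and its adapter."""
    outputs = []
    for x, adapter_id in zip(inputs, adapter_ids):
        base = matmul([x], base_weight)[0]            # shared base model
        lora_a, lora_b = adapters[adapter_id]         # per-tenant low-rank pair
        delta = matmul(matmul([x], lora_a), lora_b)[0]
        outputs.append([b + d for b, d in zip(base, delta)])
    return outputs

W = [[1, 0], [0, 1]]  # 2x2 base weight (identity, for readability)
adapters = {
    "tenant_a": ([[1], [0]], [[0, 1]]),  # rank-1 A (2x1) and B (1x2)
    "tenant_b": ([[0], [1]], [[1, 0]]),
}
result = lora_batch([[1, 2], [1, 2]], ["tenant_a", "tenant_b"], W, adapters)
print(result)
```

The same input produces different outputs per tenant while W is stored only once, which is the source of the memory savings.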

Downloads: 2 This Week

Last Update: 2026-03-09
See Project
21

Reader LLM

Convert any URL to an LLM-friendly input with a simple prefix

Reader LLM is an open-source tool designed to convert web content into formats that are easier for large language models to process. The system works by transforming a webpage into a clean text or Markdown representation that removes unnecessary formatting and highlights the core information within the page. Developers can use a simple URL prefix to retrieve a version of a webpage that has been optimized for machine consumption, making it suitable for use in AI agents or retrieval-augmented generation pipelines. In addition to converting individual pages, the service can perform web searches and return relevant content that can be ingested directly by AI systems. The tool relies on specialized models and parsing techniques to handle complex HTML structures and extract meaningful content while preserving important context.

Downloads: 2 This Week

Last Update: 3 days ago
See Project
22

SGR Agent Core

Agentic system design based on Schema-Guided Reasoning (SGR)

SGR Agent Core is an open-source framework for building intelligent AI research agents based on a methodology known as Schema-Guided Reasoning (SGR). The framework provides a core library that allows developers to design autonomous agents capable of structured reasoning and complex task execution. Instead of relying solely on free-form prompts, the system organizes reasoning processes around schemas that guide how agents analyze problems, gather information, and generate outputs. This architecture enables agents to follow structured reasoning workflows while still benefiting from the flexibility of large language models. The framework includes a BaseAgent interface and a two-phase architecture that separates reasoning planning from execution, allowing developers to implement custom agent behaviors and research pipelines.
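Schema-guided reasoning can be illustrated by forcing the model's planning output into a fixed structure and only then executing it. The sketch below separates a planning phase, which validates JSON against a dataclass, from an execution phase. The field names, the tool names, and the `plan` and `execute` helpers are hypothetical and do not reflect the framework's BaseAgent interface.

```python
import json
from dataclasses import dataclass

ALLOWED_TOOLS = {"web_search", "write_report"}  # hypothetical tool names

@dataclass
class NextStep:
    current_situation: str
    remaining_plan: list
    tool: str
    tool_input: str

def plan(raw_json):
    """Phase 1: parse the model's JSON and validate it against the schema."""
    step = NextStep(**json.loads(raw_json))  # TypeError on missing/extra fields
    if step.tool not in ALLOWED_TOOLS:
        raise ValueError(f"unknown tool: {step.tool}")
    return step

def execute(step, tools):
    """Phase 2: run the tool chosen during planning."""
    return tools[step.tool](step.tool_input)

# Stand-in for a model response that follows the schema.
raw = json.dumps({
    "current_situation": "No sources gathered yet.",
    "remaining_plan": ["search", "summarize"],
    "tool": "web_search",
    "tool_input": "BSD licensed LLMs",
})
tools = {
    "web_search": lambda query: f"results for {query}",
    "write_report": lambda text: f"report: {text}",
}
print(execute(plan(raw), tools))
```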

Downloads: 2 This Week

Last Update: 2026-03-18
See Project
23

Secret Llama

Fully private LLM chatbot that runs entirely in a browser

Secret Llama is a privacy-first large-language-model chatbot that runs entirely inside your web browser, meaning no server is required and your conversation data never leaves your device. It focuses on open-source model support, letting you load families like Llama and Mistral directly in the client for fully local inference. Because everything happens in-browser, it can work offline once models are cached, which is helpful for air-gapped environments or travel. The interface mirrors the modern chat UX you’d expect—streaming responses, markdown, and a clean layout—so there’s no usability tradeoff to gain privacy. Under the hood it uses a web-native inference engine to accelerate model execution with GPU/WebGPU when available, keeping responses responsive even without a backend. It’s a great option for developers and teams who want to prototype assistants or handle sensitive text without sending prompts to external APIs.

Downloads: 2 This Week

Last Update: 2025-11-07
See Project
24

SimpleLLM

950-line, minimal, extensible LLM inference engine built from scratch

SimpleLLM is a minimal, extensible large language model inference engine implemented in roughly 950 lines of code, built from scratch to serve both as a learning tool and a research platform for novel inference techniques. It provides the core components of an LLM runtime, such as tokenization, batching, and asynchronous execution, without the abstraction overhead of more complex engines, making it easier for developers and researchers to understand and modify. Designed to run efficiently on high-end GPUs like NVIDIA H100 with support for models such as OpenAI/gpt-oss-120b, SimpleLLM implements continuous batching and event-driven inference loops to maximize hardware utilization and throughput. Its straightforward code structure allows anyone experimenting with custom kernels, new batching strategies, or inference optimizations to trace execution from input to output with minimal cognitive overhead.
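Continuous batching means a waiting request joins the batch as soon as any running request finishes, rather than waiting for the whole batch to drain. The sketch below simulates that scheduling policy with token counts instead of a real model. The function and its inputs are illustrative and are not SimpleLLM's code.

```python
from collections import deque

def continuous_batching(requests, max_batch):
    """Simulate decoding; return the step at which each request finishes.

    requests: list of (request_id, tokens_to_generate), each count >= 1.
    One loop iteration stands for one decode step over the running batch.
    """
    waiting = deque(requests)
    running, finished, step = {}, {}, 0
    while waiting or running:
        # Fill any free batch slots before the next decode step.
        while waiting and len(running) < max_batch:
            request_id, remaining = waiting.popleft()
            running[request_id] = remaining
        step += 1
        for request_id in list(running):
            running[request_id] -= 1
            if running[request_id] == 0:
                finished[request_id] = step
                del running[request_id]
    return finished

result = continuous_batching([("a", 1), ("b", 3), ("c", 2)], max_batch=2)
print(result)
```

Here request "c" takes over the slot freed by "a" after the first step and finishes at step 3. If it had to wait for both "a" and "b" to complete, it would start only after step 3.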

Downloads: 2 This Week

Last Update: 2026-01-28
See Project
25

Streamer-Sales

Large language model for live-stream sales hosts

Streamer-Sales is an open-source large language model system designed specifically for e-commerce live streaming and automated product promotion. The project focuses on generating persuasive product descriptions and live presentation scripts that mimic the style of professional online sales hosts. By analyzing product characteristics and marketing information, the model can produce engaging explanations that emphasize benefits, features, and emotional appeal to encourage viewers to make purchasing decisions. The system integrates multiple AI technologies including retrieval-augmented generation to incorporate product knowledge, speech synthesis to convert generated scripts into voice output, and digital human generation to create virtual hosts. It also supports automatic speech recognition and agent-based tools that can retrieve additional information such as logistics or product details during live sessions.

Downloads: 2 This Week

Last Update: 2026-03-05
See Project