Showing 1058 open source projects for "python data analysis"

View related business solutions
  • Earn up to 15% annual interest with Nexo. Icon
    Earn up to 15% annual interest with Nexo.

    More flexibility. More control.

    Generate interest, access liquidity without selling, and execute trades seamlessly. All in one platform. Geographic restrictions, eligibility, and terms apply.
    Get started with Nexo.
  • Earn up to 15% annual interest with Nexo. Icon
    Earn up to 15% annual interest with Nexo.

    Access competitive interest rates on your digital assets.

    Generate interest, borrow against your crypto, and trade a range of cryptocurrencies — all in one platform. Geographic restrictions, eligibility, and terms apply.
    Get started with Nexo.
  • 1
    Data Version Control

    Data Version Control

    Git-based data version control for machine learning workflows

    DVC (Data Version Control) is an open source tool designed to bring version control principles to machine learning and data science workflows. It enables developers and data scientists to track datasets, machine learning models, and experiment results in a way that integrates with existing Git repositories. Instead of storing large datasets directly in Git, DVC keeps lightweight metadata in the repository while storing the actual data in external storage systems. This approach allows teams...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 2
    DATAGEN

    DATAGEN

    AI-driven multi-agent research assistant automating hypothesis

    DATAGEN is an AI-driven multi-agent research and data analysis platform designed to automate complex analytical workflows. The system coordinates multiple specialized AI agents that collaborate to perform tasks such as hypothesis generation, data collection, analysis, visualization, and report creation. Instead of requiring users to manually orchestrate each stage of a research process, the platform allows these agents to coordinate automatically and handle the workflow end-to-end. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 3
    UCP Python SDK

    UCP Python SDK

    The official Python SDK for UCP

    ...This SDK provides Pydantic models for UCP schemas, making it easy for Python developers to construct, validate, and serialize protocol messages and data structures according to the UCP specification. By adhering to the official protocol standards, applications built on this SDK can participate in tasks like capability discovery, checkout flows, order management, and more, while remaining interoperable across different UCP implementations and surfaces.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 4
    Python Outlier Detection

    Python Outlier Detection

    A Python toolbox for scalable outlier detection

    PyOD is a comprehensive and scalable Python toolkit for detecting outlying objects in multivariate data. This exciting yet challenging field is commonly referred as outlier detection or anomaly detection. PyOD includes more than 30 detection algorithms, from classical LOF (SIGMOD 2000) to the latest COPOD (ICDM 2020) and SUOD (MLSys 2021). Since 2017, PyOD [AZNL19] has been successfully used in numerous academic researches and commercial products [AZHC+21, AZNHL19].
    Downloads: 9 This Week
    Last Update:
    See Project
  • Comet Backup - Fast, Secure Backup Software for MSPs Icon
    Comet Backup - Fast, Secure Backup Software for MSPs

    Fast, Secure Backup Software for Businesses and IT Providers

    Comet is a flexible backup platform, giving you total control over your backup environment and storage destinations.
    Learn More
  • 5
    Logfire MCP

    Logfire MCP

    The Logfire MCP Server is here

    The Logfire MCP Server is a Model Context Protocol server that allows AI applications to access OpenTelemetry traces and metrics sent to Logfire. It enables retrieval and analysis of telemetry data, enhancing debugging and observability workflows. ​
    Downloads: 0 This Week
    Last Update:
    See Project
  • 6
    Clay Foundation Model

    Clay Foundation Model

    The Clay Foundation Model - An open source AI model and interface

    The Clay Foundation Model is an open-source AI model and interface designed to provide comprehensive data and insights about Earth. It aims to serve as a foundational tool for environmental monitoring, research, and decision-making by integrating various data sources and offering an accessible platform for analysis.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 7
    cracking-the-data-science-interview

    cracking-the-data-science-interview

    A Collection of Cheatsheets, Books, Questions, and Portfolio

    Cracking the Data Science Interview is an open educational repository that collects study materials, resources, and reference links for preparing for data science interviews. The project organizes content across many fundamental areas of data science, including statistics, probability, SQL, machine learning, and deep learning. It includes cheat sheets that summarize important technical concepts commonly discussed during technical interviews. The repository also provides links to recommended...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 8
    Quantitative Trading System

    Quantitative Trading System

    A comprehensive quantitative trading system with AI-powered analysis

    Quantitative Trading System is a comprehensive quantitative trading platform that integrates artificial intelligence, financial data analysis, and automated strategy execution within a unified software system. The project is designed to provide an end-to-end infrastructure for building and operating algorithmic trading strategies in financial markets. It includes tools for collecting and processing market data from multiple sources, performing statistical and machine learning analysis, and generating trading signals based on quantitative models. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 9
    PandasAI

    PandasAI

    PandasAI is a Python library that integrates generative AI

    PandasAI is a Python library that adds Generative AI capabilities to pandas, the popular data analysis and manipulation tool. It is designed to be used in conjunction with pandas, and is not a replacement for it. PandasAI makes pandas (and all the most used data analyst libraries) conversational, allowing you to ask questions to your data in natural language.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Network Management Software and Tools for Businesses and Organizations | Auvik Networks Icon
    Network Management Software and Tools for Businesses and Organizations | Auvik Networks

    Mapping, inventory, config backup, and more.

    Reduce IT headaches and save time with a proven solution for automated network discovery, documentation, and performance monitoring. Choose Auvik because you'll see value in minutes, and stay with us to improve your IT for years to come.
    Learn More
  • 10
    MarkItDown

    MarkItDown

    Python tool for converting files and office documents to Markdown

    MarkItDown is a lightweight Python utility developed by Microsoft for converting various files and office documents to Markdown format. It is particularly useful for preparing documents for use with large language models and related text analysis pipelines. ​
    Downloads: 45 This Week
    Last Update:
    See Project
  • 11
    Potpie

    Potpie

    Create custom engineering agents for your codebase

    Potpie is an AI-powered data analysis tool that automates the exploration and visualization of datasets, assisting users in uncovering insights without extensive coding.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 12
    OpenBB

    OpenBB

    Investment Research for Everyone, Everywhere

    Customize and speed up your analysis, bring your own data, and create instant reports to gain a competitive edge. Whether it’s a CSV file, a private endpoint, an RSS feed, or even embed an SEC filing directly. Chat with financial data using large language models. Don’t waste time reading, create summaries in seconds and ask how that impacts investments. Create your dashboard with your favorite widgets.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 13
    Scanpy

    Scanpy

    Single-cell analysis in Python

    Scanpy is a scalable toolkit for analyzing single-cell gene expression data built jointly with anndata. It includes preprocessing, visualization, clustering, trajectory inference and differential expression testing. The Python-based implementation efficiently deals with datasets of more than one million cells.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 14
    FinGPT

    FinGPT

    Open-Source Financial Large Language Models

    FinGPT is an open-source, finance-specialized large language model framework that blends the capabilities of general LLMs with real-time financial data feeds, domain-specific knowledge bases, and task-oriented agents to support market analysis, research automation, and decision support. It extends traditional GPT-style models by connecting them to live or historical financial datasets, news APIs, and economic indicators so that outputs are grounded in relevant and recent market conditions rather than generic knowledge alone. ...
    Downloads: 8 This Week
    Last Update:
    See Project
  • 15
    Classical Language Toolkit (CLTK)

    Classical Language Toolkit (CLTK)

    The Classical Language Toolkit

    The Classical Language Toolkit (CLTK) is a Python library offering natural language processing support for classical languages, including Latin, Greek, and others.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 16
    NVIDIA Earth2Studio

    NVIDIA Earth2Studio

    Open-source deep-learning framework

    NVIDIA Earth2Studio is an open-source Python package and framework designed to accelerate the development and deployment of AI-driven weather and climate science workflows. It provides a unified API that lets researchers, data scientists, and engineers build complex forecasting and analysis pipelines by combining modular prognostic and diagnostic AI models with a diverse range of real-world data sources such as global forecast systems, reanalysis datasets, and satellite feeds. ...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 17
    DataDreamer

    DataDreamer

    DataDreamer: Prompt. Generate Synthetic Data. Train & Align Models

    DataDreamer is a tool designed to assist in the generation and manipulation of synthetic data for various applications, including testing and machine learning.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 18
    BettaFish

    BettaFish

    Public opinion analysis system

    BettaFish is an open-source, multi-agent public opinion analysis system built to automate the collection, deep analysis, and reporting of social media data at scale through conversational queries. It uses a modular architecture of specialized agents that collaborate to crawl mainstream platforms, extract multimodal content like text and short video, and synthesize insights through both statistical and large language model techniques.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 19
    AtomAI

    AtomAI

    Deep and Machine Learning for Microscopy

    AtomAI is a Pytorch-based package for deep and machine-learning analysis of microscopy data that doesn't require any advanced knowledge of Python or machine learning. The intended audience is domain scientists with a basic understanding of how to use NumPy and Matplotlib. It was developed by Maxim Ziatdinov at Oak Ridge National Lab. The purpose of the AtomAI is to provide an environment that bridges the instrument-specific libraries and general physical analysis by enabling the seamless deployment of machine learning algorithms including deep convolutional neural networks, invariant variational autoencoders, and decomposition/unmixing techniques for image and hyperspectral data analysis. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 20
    MCP Timeplus

    MCP Timeplus

    Execute SQL queries and manage databases seamlessly with Timeplus

    An MCP server designed for integration with Timeplus, enabling real-time data streaming and analytics through natural language interactions. ​
    Downloads: 0 This Week
    Last Update:
    See Project
  • 21
    Pandas Profiling

    Pandas Profiling

    Create HTML profiling reports from pandas DataFrame objects

    pandas-profiling generates profile reports from a pandas DataFrame. The pandas df.describe() function is handy yet a little basic for exploratory data analysis. pandas-profiling extends pandas DataFrame with df.profile_report(), which automatically generates a standardized univariate and multivariate report for data understanding. High correlation warnings, based on different correlation metrics (Spearman, Pearson, Kendall, Cramér’s V, Phik). Most common categories (uppercase, lowercase, separator), scripts (Latin, Cyrillic) and blocks (ASCII, Cyrilic). ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 22
    doccano

    doccano

    Open source annotation tool for machine learning practitioners

    doccano is an open-source text annotation tool for humans. It provides annotation features for text classification, sequence labeling and sequence-to-sequence tasks. So, you can create labeled data for sentiment analysis, named entity recognition, text summarization and so on. Just create a project, upload data and start annotating. You can build a dataset in hours.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 23
    graphify

    graphify

    AI coding assistant skill (Claude Code, Codex, OpenCode, OpenClaw)

    ...The system likely supports dynamic updates, allowing graphs to evolve as data changes or new inputs are introduced. It is particularly useful in domains such as network analysis, knowledge graphs, and system architecture visualization. The architecture emphasizes flexibility, enabling users to customize how data is mapped and displayed. It may also include analytical features to explore patterns, clusters, or anomalies within the graph.
    Downloads: 9 This Week
    Last Update:
    See Project
  • 24
    MCP ZoomEye

    MCP ZoomEye

    A Model Context Protocol server that provides network asset info

    The ZoomEye MCP Server is a Model Context Protocol server that provides network asset information based on query conditions, allowing Large Language Models to obtain data by querying ZoomEye using dorks and other search parameters. ​
    Downloads: 0 This Week
    Last Update:
    See Project
  • 25
    Synthetic Data Vault (SDV)

    Synthetic Data Vault (SDV)

    Synthetic Data Generation for tabular, relational and time series data

    The Synthetic Data Vault (SDV) is a Synthetic Data Generation ecosystem of libraries that allows users to easily learn single-table, multi-table and timeseries datasets to later on generate new Synthetic Data that has the same format and statistical properties as the original dataset. Synthetic data can then be used to supplement, augment and in some cases replace real data when training Machine Learning models. Additionally, it enables the testing of Machine Learning or other data dependent...
    Downloads: 0 This Week
    Last Update:
    See Project
MongoDB Logo MongoDB