DocStrange is an open-source document understanding and extraction library that converts complex files into structured, LLM-ready outputs such as Markdown, JSON, CSV, and HTML. Developed by Nanonets, the project combines OCR, layout detection, table understanding, and structured extraction in a single end-to-end pipeline, so developers do not have to stitch together separate services. It is built for developers who need high-quality parsing of scans, photos, PDFs, office files, and other document sources while keeping privacy and control over the processing flow.

One of its key differentiators is deployment flexibility: it offers a cloud API for managed usage as well as a fully private offline mode that runs locally on a GPU. It also supports synchronous extraction, streaming responses, and asynchronous processing for larger documents, which makes it suitable for both interactive workflows and heavier back-end pipelines.

Features

  • Extraction from PDFs, images, Word files, Excel files, PowerPoint files, and URLs
  • Output generation in Markdown, JSON, CSV, and HTML formats
  • End-to-end OCR, layout analysis, and table extraction pipeline
  • Private offline GPU mode in addition to managed cloud API access
  • Streaming support for real-time extraction results
  • Asynchronous processing for larger multi-page documents
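
The choice between the synchronous, streaming, and asynchronous modes listed above can be sketched as a small routing function. This is only an illustration under stated assumptions: `ExtractionRequest`, `choose_mode`, and the 20-page threshold are hypothetical names and values made up for this example, not part of DocStrange's actual API.

```python
from dataclasses import dataclass
from enum import Enum


class OutputFormat(Enum):
    """The four output formats named in the feature list."""
    MARKDOWN = "markdown"
    JSON = "json"
    CSV = "csv"
    HTML = "html"


class Mode(Enum):
    """The three processing styles named in the description."""
    SYNC = "sync"
    STREAM = "stream"
    ASYNC = "async"


# Hypothetical cutoff: the source only says "larger" documents go async.
ASYNC_PAGE_THRESHOLD = 20


@dataclass(frozen=True)
class ExtractionRequest:
    """Hypothetical request shape used only for this sketch."""
    source: str                  # file path or URL
    page_count: int
    output_format: OutputFormat
    interactive: bool = False    # True when a user is waiting on results


def choose_mode(request: ExtractionRequest) -> Mode:
    """Pick a processing mode from document size and interactivity."""
    if request.page_count > ASYNC_PAGE_THRESHOLD:
        return Mode.ASYNC        # large job: submit it and poll later
    if request.interactive:
        return Mode.STREAM       # show partial results as they arrive
    return Mode.SYNC             # small batch job: block until done


if __name__ == "__main__":
    req = ExtractionRequest("report.pdf", 120, OutputFormat.JSON)
    print(choose_mode(req).value)
```

In this sketch, document size takes priority over interactivity, on the assumption that a very large document would make a streaming session impractically long. A real integration would tune or replace that rule.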

License

MIT License

Additional Project Details

Programming Language

TypeScript

Related Categories

TypeScript Large Language Models (LLM)

Registered

2026-03-09