Showing 62 open source projects for "java html parser"

View related business solutions
  • Earn up to 15% annual interest with Nexo. Icon
    Earn up to 15% annual interest with Nexo.

    More flexibility. More control.

    Generate interest, access liquidity without selling, and execute trades seamlessly. All in one platform. Geographic restrictions, eligibility, and terms apply.
    Get started with Nexo.
  • Earn up to 15% annual interest with Nexo. Icon
    Earn up to 15% annual interest with Nexo.

    Access competitive interest rates on your digital assets.

    Generate interest, borrow against your crypto, and trade a range of cryptocurrencies — all in one platform. Geographic restrictions, eligibility, and terms apply.
    Get started with Nexo.
  • 1
    ANTLR

    ANTLR

    Parser generator to read, process, or translate structured text

    ANTLR (ANother Tool for Language Recognition) is a powerful parser generator for reading, processing, executing, or translating structured text or binary files. It's widely used to build languages, tools, and frameworks. From a grammar, ANTLR generates a parser that can build and walk parse trees. It’s widely used in academia and industry to build all sorts of languages, tools, and frameworks. Twitter search uses ANTLR for query parsing, with over 2 billion queries a day. The languages for...
    Downloads: 11 This Week
    Last Update:
    See Project
  • 2
    tika-python

    tika-python

    Python binding to the Apache Tika™ REST services

    A Python port of the Apache Tika library that makes Tika available using the Tika REST Server. This makes Apache Tika available as a Python library, installable via Setuptools, Pip and easy to install. To use this library, you need to have Java 7+ installed on your system as tika-python starts up the Tika REST server in the background. To get this working in a disconnected environment, download a tika server file (both tika-server.jar and tika-server.jar.md5, which can be found here) and set...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 3
    OmegaT - multiplatform CAT tool

    OmegaT - multiplatform CAT tool

    The free computer aided translation (CAT) tool for professionals

    OmegaT is a free and open source multiplatform Computer Assisted Translation tool with fuzzy matching, translation memory, keyword search, glossaries, and translation leveraging into updated projects.
    Leader badge
    Downloads: 1,490 This Week
    Last Update:
    See Project
  • 4
    ant4docbook

    ant4docbook

    ANT4DOCBOOK is an ANT task for DOCBOOK

    ANT4DOCBOOK is an ANT task for DOCBOOK, a semantic markup language for technical documentation.
    Downloads: 3 This Week
    Last Update:
    See Project
  • Get full visibility and control over your tasks and projects with Wrike. Icon
    Get full visibility and control over your tasks and projects with Wrike.

    A cloud-based collaboration, work management, and project management software

    Wrike offers world-class features that empower cross-functional, distributed, or growing teams take their projects from the initial request stage all the way to tracking work progress and reporting results.
    Learn More
  • 5
    DocWire SDK

    DocWire SDK

    Award-winning modern data processing SDK in C++20

    DocWire SDK, a standout C++20AI driven data processing tool, has received award from SourceForge and strong backing from Microsoft. It handles nearly 100 file types, empowering efficient text extraction, web data extraction, and document analysis. For businesses, the shift to DocWire SDK signifies a leap forward. It promises comprehensive document format support and the ability to extract valuable insights from email boxes, databases, and websites using cutting-edge AI. DocWire SDK aims to...
    Downloads: 5 This Week
    Last Update:
    See Project
  • 6
    RTextDoc

    RTextDoc

    An editor for structured documents

    RTextDoc is an editor for structured text documents such as LaTeX, AsciiDoc, DocBook. RTextDoc has proofreading capabilities: on-the-fly spelling, instant grammar checking and built-in free dictionaries. RTextDoc has syntax highlighting, bracket matching, folding, document structure browser for sections and labels, bookmarks, manager for LaTeX symbols, an editor for mathematical equations,integrated BibTeX database manager and several tools to convert LaTeX to HTML and back. AsciiDoc...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 7
    SimplyHTML is an application and a java component for rich text processing. It stores documents as HTML files in combination with Cascading Style Sheets (CSS). SimplyHTML is not intended to be used as an editor for web pages.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 8
    Leseratte is a Java parser for German written language. Currently, it contains a German lexicon (based on the Wiktionary), inflexion rules, a grammar and a parser. (Semantics component planned.) Usable as a Java library, also provides a graphical UI.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9
    Web Book Downloader

    Web Book Downloader

    Download websites as e-book: pdf, txt, epub.

    This application allows user to download chapters from website in 3 ways: - from table of contents; - from range: first chapter address, last chapter address; - by crawling from first chapter to n; In settings you can customize language, input(website encoding) for simplicity output is in the same encoding. If you want your language add new class into strings package, and new fields into Settings class and GUI menu(initialize method).
    Downloads: 4 This Week
    Last Update:
    See Project
  • SoftCo: Enterprise Invoice and P2P Automation Software Icon
    SoftCo: Enterprise Invoice and P2P Automation Software

    For companies that process over 20,000 invoices per year

    SoftCo Accounts Payable Automation processes all PO and non-PO supplier invoices electronically from capture and matching through to invoice approval and query management. SoftCoAP delivers unparalleled touchless automation by embedding AI across matching, coding, routing, and exception handling to minimize the number of supplier invoices requiring manual intervention. The result is 89% processing savings, supported by a context-aware AI Assistant that helps users understand exceptions, answer questions, and take the right action faster.
    Learn More
  • 10

    Ghawwas_V4

    An open source system for Arabic corpora processing

    Ghawwas (previously known as Khawas) is an open source system for Arabic corpora processing. Ghawwas V4.0 provides the following main functions: a. Frequency list for single word and N-Grams b. Concordance c. Collocation (MI, CHI Squared, LL, T-Score, Z Score, Dice, Log Dice, Weirdness Coefficient) d. Lexical patterns search e. Two corpora frequency profile comparison based on MI, CHI, LL, T-Score, Z Score, Dice, Log Dice, Weirdness Coefficient f. Accept Windows and UTF-8 character...
    Downloads: 19 This Week
    Last Update:
    See Project
  • 11
    iText®, a JAVA PDF library

    iText®, a JAVA PDF library

    PDF Library for Developers

    iText is an open-source PDF library available for Java and .NET (C#). iText allows you to effortlessly generate and manipulate standards-compliant PDF documents with a powerful and feature-rich SDK. With iText, you can create archivable and accessible PDFs, split and merge documents, fill and flatten forms, digitally sign documents, and more. iText add-ons enable additional functionality, such as PDF creation from HTML templates, secure redaction, OCR, and much more. ...
    Leader badge
    Downloads: 164 This Week
    Last Update:
    See Project
  • 12
    Command-line/Ant-task/embeddable text file preprocessor. Macros, flow control, expressions. Recursive directory processing. Extensible in Java to display data from any data sources (as database). Can generate complete homepages (tree of HTML-s, images, etc.)
    Leader badge
    Downloads: 81 This Week
    Last Update:
    See Project
  • 13
    cpDetector is a proxy for codepage detection of documents. It delegates to multiple instances that try to detect the codepage by different techinques. A command line executeable is shipped that allows to sort documents by codepage.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 14
    Notepad3

    Notepad3

    Light-weight Scintilla-based text editor with syntax highlighting

    Notepad3 is a fast and light-weight Scintilla-based text editor with syntax highlighting. Notepad3 is an excellent replacement for the default Windows text editor. Notepad3 offers many extra features over Notepad. It has a small memory footprint, but is powerful enough to handle most programming jobs.
    Downloads: 9 This Week
    Last Update:
    See Project
  • 15
    Jericho HTML Parser is a java library allowing analysis and manipulation of parts of an HTML document, including server-side tags, while reproducing verbatim any unrecognised or invalid HTML.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 16
    A java-based parser for parsing/grabbing web sites and other text or XML documents, based on a nondeterministic parser language, creating XML output. Also contains a few utility classes for HTML, CSV and text parsing, and additional character sets.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 17
    Jaxe
    Jaxe is a free Java XML editor with a configurable GUI, using XML schemas for validation and XSL for exports in HTML or XML.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 18
    Simple Java delimited and fixed width file parser. Handles CSV, Excel CSV, Tab, Pipe delimiters, just to name a few. Maps column positions in the file to user friendly names via XML. See "FlatPack Feature List" under News for complete feature list.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 19

    Spock Text Editor

    Simply text editor for php, jsp, html, etc...

    This app is designed in Java, so is fully compatible with Win, Mac and Linux 32 or 64 bits. It's a simple and fast text editor and supports: *.txt *.jsp *.php *.c *.h(headers for C language) *.java *.htm/html This is the first version and my first application on java. I hope you like it! See you in version 2! ;-)
    Downloads: 0 This Week
    Last Update:
    See Project
  • 20

    SutraReader

    Arranges a Sutra text in the traditional layout

    This is an application designed to arrange / lay out a Chinese or Japanese Sutra text in the traditional layout (from top to bottom, from right to left). The input can be any file (the application can pick out the relevant parts) and the output is the layout (arranged in HTML file(s)) and the content (a plain text file with the content). Beside this, you can get a statistics about the ideograms and can exclude certain characters or ideograms from the content.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 21
    The DocBook Publishing Utilities tools, which make creation and publishing of DocBook easier. The tools are: Maven plug-in to Transform HTML into XML (use after docbkx); Eclipse DocBook table editor; Eclipse wizards for initial DocBook files.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 22
    Drag-and-drop files/directories/HTML-URLs into a Java GUI. Perform text operations on the files into output files. Operations include concatention, text and regex editing, and other file/string/row/column/script operations.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 23
    OmniHelp is a cross-platform, browser-independent, tri-pane help viewer built in pure JavaScript and CSS with HTML 4. Some functions (such as help embedding) may in the future be in Java, C, or C++; CSH is fully supported. All code is under the LGPL.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 24
    TagParser is a java parser based on CSS formulas (like JQuery) and can parse any documents based on tags such as XML, HTML. Furthermore, it doesn't require documents to be well formed and can parse complex documents with embedded scripts or CSS parts
    Downloads: 0 This Week
    Last Update:
    See Project
  • 25
    ApexText

    ApexText

    Efficient and lightweight text editor with rich functionalities.

    ApexText is a general purpose text editor for developers and non-developers. It supports synatx highlighting for Java, C, C++, Perl, SQL, JSP, HTML etc., tooling for Java. Many UI features are configurable.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • 3
  • Next
MongoDB Logo MongoDB