Search Results for "java text mining preprocessing"

Showing 54 open source projects for "java text mining preprocessing"

View related business solutions
  • Earn up to 15% annual interest with Nexo. Icon
    Earn up to 15% annual interest with Nexo.

    Let your crypto work for you

    Put idle assets to work with competitive interest rates, borrow without selling, and trade with precision. All in one platform. Geographic restrictions, eligibility, and terms apply.
    Get started with Nexo.
  • Earn up to 15% annual interest with Nexo. Icon
    Earn up to 15% annual interest with Nexo.

    Access competitive interest rates on your digital assets.

    Generate interest, borrow against your crypto, and trade a range of cryptocurrencies — all in one platform. Geographic restrictions, eligibility, and terms apply.
    Get started with Nexo.
  • 1
    Dawarich

    Dawarich

    Self-hostable alternative to Google Timeline

    Dawarich is a command-line tool (likely Ruby-based) for transforming and analyzing Arabic text data with normalization, diacritic handling, segmentation, and morphological tokenization. Designed for text mining and NLP workflows in Arabic-language contexts.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 2
    Weka

    Weka

    Machine learning software to solve data mining problems

    Weka is a collection of machine learning algorithms for solving real-world data mining problems. It is written in Java and runs on almost any platform. The algorithms can either be applied directly to a dataset or called from your own Java code.
    Leader badge
    Downloads: 11,766 This Week
    Last Update:
    See Project
  • 3
    ant4docbook

    ant4docbook

    ANT4DOCBOOK is an ANT task for DOCBOOK

    ANT4DOCBOOK is an ANT task for DOCBOOK, a semantic markup language for technical documentation.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 4
    WebHarvest - web data extraction tool
    Web data extraction (web data mining, web scraping) tool. It leverages well proved XML and text processing techologies in order to easely extract useful data from arbitrary web pages.
    Downloads: 3 This Week
    Last Update:
    See Project
  • Get full visibility and control over your tasks and projects with Wrike. Icon
    Get full visibility and control over your tasks and projects with Wrike.

    A cloud-based collaboration, work management, and project management software

    Wrike offers world-class features that empower cross-functional, distributed, or growing teams take their projects from the initial request stage all the way to tracking work progress and reporting results.
    Learn More
  • 5
    DocWire SDK

    DocWire SDK

    Award-winning modern data processing SDK in C++20

    DocWire SDK, a standout C++20AI driven data processing tool, has received award from SourceForge and strong backing from Microsoft. It handles nearly 100 file types, empowering efficient text extraction, web data extraction, and document analysis. For businesses, the shift to DocWire SDK signifies a leap forward. It promises comprehensive document format support and the ability to extract valuable insights from email boxes, databases, and websites using cutting-edge AI. DocWire SDK aims to...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 6
    DataMelt

    DataMelt

    Computation and Visualization environment

    DataMelt (or "DMelt") is an environment for numeric computation, data analysis, computational statistics, and data visualization. This Java multiplatform program is integrated with several scripting languages such as Jython (Python), Groovy, JRuby, BeanShell. DMelt can be used to plot functions and data in 2D and 3D, perform statistical tests, data mining, numeric computations, function minimization, linear algebra, solving systems of linear and differential equations. Linear, non-linear...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 7
    Lingua

    Lingua

    The most accurate natural language detection library for Java

    Its task is simple: It tells you which language some provided textual data is written in. This is very useful as a preprocessing step for linguistic data in natural language processing applications such as text classification and spell checking. Other use cases, for instance, might include routing e-mails to the right geographically located customer service department, based on the e-mails' languages.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 8
    The Lemur Project

    The Lemur Project

    Search engine and data mining applications and ClueWeb datasets.

    The Lemur Project develops search engines, browser toolbars, text analysis tools, and data resources that support research and development of information retrieval and text mining software, including the Indri search engine in C++, the Galago search engine research framework in Java, the RankLib learning to rank library, ClueWeb09 and ClueWeb12 datasets and the Sifaka data mining application.
    Downloads: 17 This Week
    Last Update:
    See Project
  • 9
    libpostal

    libpostal

    A C library for parsing/normalizing street addresses around the world

    A C library for parsing/normalizing street addresses around the world. Powered by statistical NLP and open geo data. libpostal is a C library for parsing/normalizing street addresses around the world using statistical NLP and open data. The goal of this project is to understand location-based strings in every language, everywhere. Addresses and the locations they represent are essential for any application dealing with maps (place search, transportation, on-demand/delivery services,...
    Downloads: 1 This Week
    Last Update:
    See Project
  • Run applications fast and securely in a fully managed environment Icon
    Run applications fast and securely in a fully managed environment

    Cloud Run is a fully-managed compute platform that lets you run your code in a container directly on top of scalable infrastructure.

    Run frontend and backend services, batch jobs, deploy websites and applications, and queue processing workloads without the need to manage infrastructure.
    Try for free
  • 10
    DynaQ

    DynaQ

    Innovative text document search. http://dynaq.opendfki.de for details.

    The goal of DynaQ is to develop an inquiry system to explore the personal information space, supporting you with the searching paradigm 'orienteering'. DynaQ is a (desktop)search engine with enhanced functionality for file, email and blog search. Look at our GitLab homepage for sourcecode and documentation: http://dynaq.opendfki.de
    Downloads: 0 This Week
    Last Update:
    See Project
  • 11
    RapidMiner -- Data Mining, ETL, OLAP, BI
    ETL, data warehousing, data mining, OLAP, business intelligence (BI) in Java. 500+ modules: extract, transform, load (ETL), data mining, data analysis + Weka, statistical forecasting, preprocessing, validation, visualization, OLAP, business intelligence.
    Downloads: 12 This Week
    Last Update:
    See Project
  • 12
    @Note2

    @Note2

    @Note2 - A workbench for Biomedical Text Mining

    Biomedical Text Mining (BioTM) is providing valuable approaches to the automated curation of scientific literature.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 13
    DSTK - Data Science TooKit 3

    DSTK - Data Science TooKit 3

    Data and Text Mining Software for Everyone

    DSTK - Data Science Toolkit 3 is a set of data and text mining softwares, following the CRISP DM model. DSTK offers data understanding using statistical and text analysis, data preparation using normalization and text processing, modeling and evaluation for machine learning and algorithms. It is based on the old version DSTK at https://sourceforge.net/projects/dstk2/ DSTK Engine is like R. DSTK ScriptWriter offers GUI to write DSTK script. DSTK Studio offers SPSS Statistics like GUI...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 14

    JSentiWordNet

    A wrapper for the famous SentiWordNet, a resource for opinion mining

    This project aims to provide a wrapper around the SentiWrodnet, a lexical resource for opinion mining. As defined by the authors : SentiWordNet assigns to each synset of WordNet three sentiment scores: positivity, negativity, objectivity. You can find additional information about the creation of SentiWordnet here : http://nmis.isti.cnr.it/sebastiani/Publications/LREC06.pdf sentiWordnet (avilable here : https://drive.google.com/open?id=0B0ChLbwT19XcOVZFdm5wNXA5ODg) is a text file with a...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 15
    GNAT

    GNAT

    GNAT recognizes gene names in text and maps them to NCBI Entrez Gene

    GNAT is a BioNLP/text mining tool to recognize and identify gene/protein names in natural language text. It will detect mentions of genes in text, such as PubMed/Medline abstracts, and disambiguate them to remove false positives and map them to the correct entry in the NCBI Entrez Gene database by gene ID. March 2017: We started to upload GNAT output on Medline. See files/results/medline/.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 16
    DSTK - DataScience ToolKit

    DSTK - DataScience ToolKit

    DSTK - DataScience ToolKit for All of Us

    DSTK - DataScience ToolKit is an opensource free software for statistical analysis, data visualization, text analysis, and predictive analytics. Newer version and smaller file size can be found at: https://sourceforge.net/projects/dstk3/ It is designed to be straight forward and easy to use, and familar to SPSS user. While JASP offers more statistical features, DSTK tends to be a broad solution workbench, including text analysis and predictive analytics features. Of course you may specify...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 17

    sgmweka

    Weka wrapper for the SGM toolkit for text classification and modeling.

    Weka wrapper for the SGM toolkit for text classification and modeling. Provides Sparse Generative Models for scalable and accurate text classification and modeling for use in high-speed and large-scale text mining. Has lower time complexity of classification than comparable software due to inference based on sparse model representation and use of an inverted index. The provided .zip file is in the Weka package format, giving access to text classification. ...
    Leader badge
    Downloads: 19 This Week
    Last Update:
    See Project
  • 18
    The Java Data Mining Package (JDMP) is a library that provides methods for analyzing data with the help of machine learning algorithms (e.g. clustering, classification, graphical models, neural networks, Bayesian networks, text processing, optimization).
    Downloads: 0 This Week
    Last Update:
    See Project
  • 19
    Jbowl is a Java library intended to provide an API for development of text mining applications. It provides facilities for text analysis, as well as for building, evaluating and applying of various supervised and unsupervised text mining models.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 20
    Stemmer Gujarati

    Stemmer Gujarati

    Offline stemmer for Gujarati , which is one of 22 Indian languages.

    This is a Gujarati stemmer in Java. Stemming is a process in which affixes are removed form the root word (stem). It relates morphological variant words to corresponding common root. For example "પ્રતિઉપયોગી" is word which has stem " ઉપયોગ". Stemmers are language specific tools. The design of a stemming algorithm requires a significant level of linguistic expertise. There has been lot of significant work in the development and evaluation of stemmer for non-Indian languages, but very less...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 21
    Cenobi

    Cenobi

    cost estimation and management accounting, using neural networks

    Cenobi is designed for management accountants, not (only) for statisticians and data mining experts. Carefully arranged default settings make sure you can concentrate on Cenobi's many accounting features rather than worrying about setting up artificial neural networks or genetic algorithms, which are the main machine learning tools under Cenobi's hood. Cenobi's main benefits are: - ease of use - Utilizing artificial neural networks to estimate cost relationships, Cenobi is able to...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 22

    webtextanalysis

    Mining knowledge from text data

    This project aims to implement in java the following text mining techniques: Text Language Detection, Keywords and keyphrases extraction, Text Classification, Text Clustering, Single or multiple documents Summarization, Plagiarism Detection.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 23
    Framework for text mining, data integration and data analysis. Keywords: ontology and graph alignment, relation mining, warehouse, semantic database integration, bioinformatics, systems biology, microarray, Java.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 24

    TML - Text Mining Library for LSA & CMM

    TML is a Java Library for LSA and extracting Concept Maps from text

    TML has moved to http://www.villalon.cl/tml.html and the code to https://github.com/villalon/tml
    Downloads: 0 This Week
    Last Update:
    See Project
  • 25

    TextProcessor

    A Java package to preprocess text datasets for posterior text analysis

    The TextProcessor Java package is a text processing toolkit, which provides some frequently used text processing functions such as stemming, removing stop-words, generating a term vocabulary, and calculating the term-doc frequency matrix. Basic topic mining models such as LDA and sparse NMF are also supported. The package can also generate feature files from a given text dataset with LDA and LIBSVM format for posterior procedures such as classification or clustering. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • 3
  • Next
MongoDB Logo MongoDB