Hello, Carrot2!
Carrot2 is a programming library for clustering text. It can automatically discover groups of related documents and label them with short key terms or phrases.
Carrot2 can, for example, organize search results into groups like these:
What's in the box
Carrot2 provides a common infrastructure and a number of algorithms for clustering text. Out of the box, the Carrot2 distribution comes with:
- the Java API and several clustering algorithm implementations,
- the REST service for mash-ups or integration with languages other than Java,
- the Search Results Clustering demo application,
- the Clustering Workbench application for more advanced users,
- code snippets and examples for reuse in your code.
Additionally, several downstream projects provide integration between Carrot2 and popular document retrieval services:
- Apache Solr has built-in support for clustering search results via Carrot2 algorithms,
- the elasticsearch-carrot2 plugin provides search results clustering for Elasticsearch.