Getting started
The quickest way to try Carrot2 is to visit the on-line demo. For more options, such as the Java API or the REST API, read on.
On-line demo
You can use the on-line demo to play with clustering of web search results provided by eTools and explore medical documents from the PubMed database.
Clustering your own data
The on-line demo also includes the Carrot2 Clustering Workbench, a more advanced application for clustering content from files or from Solr or Elasticsearch instances. You can use it to test Carrot2 clustering with your own content.
Please note that Workbench will transfer the contents of your files or search results to the Carrot2 server for clustering. The server will keep the data in memory for the duration of the clustering process. None of the data you submit will be permanently stored or logged.
Additionally, our server limits the rate and size of clustering requests to keep the service from overloading.
If you'd rather keep your data private, or if you hit the processing limits, install Carrot2 on your own machine.
Excel, OpenOffice or CSV
To cluster data from an Excel, OpenOffice or CSV spreadsheet:
- Make sure your spreadsheet contains one document per row. The first row will be treated as a header with field names, for example:

      | A   | B                                           | C                                                                                                                                  | D     | E
    1 | id  | title                                       | question                                                                                                                           | score | views
    2 | 67  | PDF Viewer on Windows                       | I've tried Foxit and Adobe's reader, but I'm not satisfied with either. Foxit has update nagging for non-critical junk.            | 39    | 1975
    3 | 94  | What Windows services can I safely disable? | I'm trying to improve the boot time and general performance of a Windows XP machine and ...                                        | 28    | 4808
    4 | 135 | Log viewer on Windows                       | I'm a developer, and I generate big log files. I've tried several log viewer applications ...                                      | 31    | 26011

  The spreadsheet can contain fields of all types. Workbench will try to identify the natural text fields to use for clustering.
- Open Carrot2 Clustering Workbench in a modern browser.
- Choose Local file in the Data source combo box and upload the spreadsheet with your data. If necessary, refine the selection of fields to cluster using the Fields to cluster check boxes.
- Press the Cluster button to generate the clusters.
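If your documents live in a program and not in a spreadsheet, the one-document-per-row layout described above can be produced with a few lines of Python. This is a minimal sketch: the sample documents and the helper names (`write_documents_csv`, `to_csv_text`) are illustrative assumptions, not part of Carrot2.

```python
import csv
import io

# Field names for the header row; they mirror the example spreadsheet above.
FIELDS = ["id", "title", "question", "score", "views"]

# Hypothetical sample data: one dictionary per document.
DOCUMENTS = [
    {"id": 67, "title": "PDF Viewer on Windows",
     "question": "I've tried Foxit and Adobe's reader, but I'm not satisfied with either.",
     "score": 39, "views": 1975},
    {"id": 135, "title": "Log viewer on Windows",
     "question": "I'm a developer, and I generate big log files.",
     "score": 31, "views": 26011},
]


def write_documents_csv(documents, fields, out):
    """Write a header row followed by one row per document to a text stream."""
    writer = csv.DictWriter(out, fieldnames=fields)
    writer.writeheader()
    writer.writerows(documents)


def to_csv_text(documents, fields):
    """Return the CSV content as a string, useful for a quick inspection."""
    buffer = io.StringIO(newline="")
    write_documents_csv(documents, fields, buffer)
    return buffer.getvalue()
```

To create a file for uploading, pass a file opened with `open("documents.csv", "w", newline="", encoding="utf-8")` as the `out` argument. The `csv` module quotes fields that contain commas, so free-text columns survive intact.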
Solr or Elasticsearch
If your data is stored in an Apache Solr or Elasticsearch instance:
- Choose Solr or Elasticsearch in the Data source combo box.
- Provide the service URL of your search server and press the Connect button.
  Make sure that your server is configured to emit CORS HTTP headers; otherwise, Workbench will not be able to query it.
- Choose the collection to search, type a query and press Cluster.
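Before connecting, it can help to check whether your search server emits CORS headers at all. The sketch below sends a plain GET request with an `Origin` header and inspects `Access-Control-Allow-Origin` in the response. The function names and the origin value are hypothetical placeholders, and the check does not cover preflight (`OPTIONS`) requests, so treat a positive result as a first indication only.

```python
import urllib.request


def allows_origin(headers, origin):
    """Return True if the response headers allow the given origin.

    `headers` may be a plain dict or an http.client.HTTPMessage;
    header names are compared case-insensitively.
    """
    normalized = {name.lower(): value for name, value in headers.items()}
    allowed = normalized.get("access-control-allow-origin")
    return allowed is not None and allowed.strip() in ("*", origin)


def check_cors(service_url, origin):
    """Send a GET request with an Origin header and check the response.

    Both arguments are supplied by the caller; nothing here is specific
    to Carrot2, Solr or Elasticsearch.
    """
    request = urllib.request.Request(service_url, headers={"Origin": origin})
    with urllib.request.urlopen(request, timeout=10) as response:
        return allows_origin(response.headers, origin)
```

For example, `check_cors("http://localhost:9200", "https://workbench.example")` would test a local Elasticsearch instance, where the origin is a stand-in for the address from which you load Workbench.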
JSON file
You can submit a file containing an array of flat JSON objects for clustering, for example:
[
{ "title": "Title 1", "body": "Text", "views": 583 },
{ "title": "Title 2", "body": "Text", "views": 23 }
]
Each object represents one document. The object can contain both textual and non-textual properties. Workbench will try to determine the fields containing natural text.
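A file in this shape can be generated from existing data with Python's `json` module. The sketch below is one possible approach; the helper names are hypothetical, and the flatness check is a deliberately strict reading of "flat" that rejects any nested object or array value, which may be stricter than Workbench requires.

```python
import json


def is_flat(document):
    """Return True if the document is an object whose values are all scalars.

    Assumption: "flat" is taken to mean no nested objects or arrays.
    """
    return isinstance(document, dict) and not any(
        isinstance(value, (dict, list)) for value in document.values()
    )


def to_json_array(documents):
    """Serialize documents as a JSON array, one flat object per document."""
    for index, document in enumerate(documents):
        if not is_flat(document):
            raise ValueError(f"document {index} is not a flat object")
    # ensure_ascii=False keeps non-ASCII text readable in the output file.
    return json.dumps(documents, indent=2, ensure_ascii=False)
```

Writing the returned string to a UTF-8 encoded file gives you something you can upload or drag and drop.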
To cluster the contents of a JSON file:
- Choose Local file in the Data source combo box.
- Upload or drag and drop your JSON file.
- Choose the fields to cluster and press Cluster.
Workbench can also cluster files in the Carrot2 legacy XML format, but that format is discouraged because it does not support arbitrary field types.
Local installation
You can install Carrot2 on your own machine to use the search results clustering and Workbench applications without any limitations.
To run Carrot2 on your machine:
- Download the latest release package.
- Follow the Carrot2 Document Clustering Server installation instructions.
- Open http://localhost:8080 in a modern browser to access the applications.
APIs and integrations
If you'd like to integrate Carrot2 with your existing systems, use one of the following options:
- the Java API,
- the REST API for other programming languages,
- Apache Solr, which has built-in support for clustering search results via Carrot2 algorithms (in versions up to Solr 8.7 and again starting with Solr 9, which is not yet released),
- the elasticsearch-carrot2 plugin, which provides search results clustering for Elasticsearch.
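As a rough illustration of the REST option, the sketch below posts a handful of documents to a locally installed Carrot2 server from Python. The `/service/cluster` path, the request field names (`language`, `algorithm`, `documents`) and the `Lingo` algorithm name are assumptions about the server's REST interface and are not stated on this page; check them against the REST API documentation of your release before relying on them. The function names are hypothetical.

```python
import json
import urllib.request


def build_cluster_request(documents, language="English", algorithm="Lingo"):
    """Build the request body (assumed structure, see the note above)."""
    return {
        "language": language,
        "algorithm": algorithm,
        "documents": documents,
    }


def cluster(documents, base_url="http://localhost:8080"):
    """POST the documents to the assumed clustering endpoint.

    Returns the decoded JSON response as sent by the server.
    """
    body = json.dumps(build_cluster_request(documents)).encode("utf-8")
    request = urllib.request.Request(
        base_url + "/service/cluster",  # assumed endpoint path
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

Separating `build_cluster_request` from the network call makes it easy to inspect or log the payload before sending it.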