Carrot2 4.1.x
Release history for Carrot2 4.1.x and bugfix releases.
Version 4.1.0
This release changes the lexical data dictionary formats, adds ephemeral per-request dictionaries and introduces minor adjustments to the Java and REST APIs.
New features
- Carrot2 Workbench
-
Carrot2 Clustering Workbench has been rewritten as a browser-based application.
You can use Workbench to cluster documents from local XML, JSON, Excel and CSV files, as well as from Solr and Elasticsearch instances. A set of sliders is available to change clustering parameters in real time; you can also export the parameters as JSON, ready for pasting into REST API requests. Finally, you can export the clustering results as JSON or as an Excel spreadsheet.
#36
- JSON dictionaries
-
Carrot2 word and label filtering dictionaries are now stored in the JSON format. This change adds more expressive matching modes, such as globs for simple phrase-level filtering or regular expressions for complete control of the filtering. Please refer to the dictionaries section for an in-depth overview of what's available.
As a follow-up, the plain-text dictionaries have been deprecated and the file naming convention for the default dictionary files has changed. A dictionary file conversion utility is available.
#51
- Per-request dictionaries
-
Per-request (ephemeral) label and word filtering support has been added. This feature lets you pass word and cluster label filters with a single request; they are applied in addition to the default language resources. See the ephemeral dictionary sections of the Java API and REST API documentation for more information.
#44
API changes
- Plain text dictionaries deprecated
-
As a follow-up to the new JSON dictionaries feature, the plain-text format has been deprecated.
The file naming convention for default language resources has changed. For backward compatibility, if old resources are found in the resource lookup location, they will still be used and a warning will be issued via the Java logging system.
If you have language resources in the old format, please convert them to the JSON format. A simple conversion utility is included in the Carrot2 core JAR. Run it with:
    java -cp carrot2-core-4.1.0.jar org.carrot2.language.ConvertLegacyResources [dir]
where dir points to a directory with old resources. New resources, named according to the new convention, will be written alongside the old ones. The old resources must be deleted manually once the conversion completes successfully.
#51
- More details from /list method
-
The /service/list endpoint of the REST API now returns the language and algorithm for all of the available request templates.
The response format of the endpoint has changed. Previously, the templates element was a list of template names; now it contains an object with template names as keys and template content as values, for example:
    ...
    "templates" : {
      "english-lingo" : {
        "language" : "English",
        "algorithm" : "Lingo"
      },
      "stc" : {
        "algorithm" : "STC"
      }
    }
#38
- Lingo filter parameter change
-
Lingo algorithm's filter parameters have been changed from Booleans to proper objects with a dedicated enabled parameter. Unless you used these attributes explicitly, no action is needed.
#43
- LexicalData interface split
-
The LexicalData interface (LanguageComponents component) has been split into two independent components: StopwordFilter and LabelFilter. The default implementations and abstract classes have been changed accordingly.
#45
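Clients that have to talk to both 4.0.x and 4.1.0 servers may want to tolerate both shapes of the templates element returned by /service/list. The sketch below is one possible way to do that, not part of Carrot2: it assumes the response has already been parsed by some JSON library into plain java.util collections (a List for the old format, a Map for the new one). The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the Carrot2 API.
class TemplateNames {
  /**
   * Extracts template names from an already-parsed "templates" element.
   * Assumption: a JSON array was parsed into a List (4.0.x format) and
   * a JSON object into a Map (4.1.0 format).
   */
  static List<String> templateNames(Object templates) {
    List<String> names = new ArrayList<>();
    if (templates instanceof Map) {
      // 4.1.0: template names are the keys of the object.
      for (Object key : ((Map<?, ?>) templates).keySet()) {
        names.add(String.valueOf(key));
      }
    } else if (templates instanceof List) {
      // 4.0.x: the element is a plain list of names.
      for (Object name : (List<?>) templates) {
        names.add(String.valueOf(name));
      }
    } else {
      throw new IllegalArgumentException("Unexpected 'templates' shape: " + templates);
    }
    return names;
  }

  public static void main(String[] args) {
    // Mirrors the example response shown above.
    Map<String, Object> parsed = new LinkedHashMap<>();
    parsed.put("english-lingo", Map.of("language", "English", "algorithm", "Lingo"));
    parsed.put("stc", Map.of("algorithm", "STC"));
    System.out.println(templateNames(parsed));
  }
}
```

With the new format, the values of the map carry the template content (language and algorithm), so a 4.1.0-only client can read them directly instead of discarding them as this helper does.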
Improvements
- GZIP compression
-
REST API built-in server now supports GZIP compression.
#66
- Request processing information
-
Added clustering and request processing time information to the clustering response. This information is optional and is returned when the serviceInfo HTTP parameter is enabled on a clustering request.
#35
- Java module system support improved
-
Improved support for the Java module system by providing the Automatic-Module-Name entry in JAR manifests.
#59
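As an illustration of the GZIP and serviceInfo items above, here is a minimal client-side sketch using the JDK's built-in HTTP client. Several details are assumptions rather than facts from these notes: the server address (http://localhost:8080), the /service/cluster path, and the serviceInfo=true spelling of the parameter value. The class and method names are hypothetical, and the request body is left to the caller.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;

// Hypothetical client sketch, not part of the Carrot2 API.
class DcsGzipExample {
  // Assumption: address of a locally running REST API server.
  static final String BASE = "http://localhost:8080/service";

  /** Builds the clustering URI, optionally asking for processing time information. */
  static URI clusterUri(boolean serviceInfo) {
    // Assumption: the parameter is enabled with the value "true".
    return URI.create(BASE + "/cluster" + (serviceInfo ? "?serviceInfo=true" : ""));
  }

  /** Decodes a response body, decompressing it only if the server used GZIP. */
  static String decodeBody(byte[] body, String contentEncoding) throws IOException {
    InputStream in = new ByteArrayInputStream(body);
    if ("gzip".equalsIgnoreCase(contentEncoding)) {
      in = new GZIPInputStream(in);
    }
    return new String(in.readAllBytes(), StandardCharsets.UTF_8);
  }

  /** Posts a JSON clustering request and returns the response as text. */
  static String post(URI uri, String json) throws IOException, InterruptedException {
    HttpRequest request = HttpRequest.newBuilder(uri)
        .header("Content-Type", "application/json")
        // Tell the server that a compressed response is acceptable.
        .header("Accept-Encoding", "gzip")
        .POST(HttpRequest.BodyPublishers.ofString(json))
        .build();
    HttpResponse<byte[]> response = HttpClient.newHttpClient()
        .send(request, HttpResponse.BodyHandlers.ofByteArray());
    String encoding = response.headers().firstValue("Content-Encoding").orElse("");
    return decodeBody(response.body(), encoding);
  }
}
```

The JDK client does not decompress responses on its own, which is why the sketch checks the Content-Encoding header before decoding. A call such as post(clusterUri(true), requestJson) would then return the response text, including the optional timing information if the server provides it.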
Bug fixes
- Clustering of multi-value fields fails
-
Carrot2 4.0.x failed to cluster documents containing multi-value fields (arrays of strings). Version 4.1.0 fixes the issue.
#34