Carrot2 4.1.x
Release history for Carrot2 4.1.x and bugfix releases.
Version 4.1.0
This release changes the lexical data dictionary formats, adds ephemeral per-request dictionaries and introduces minor adjustments to the Java and REST APIs.
New features
- Carrot2 Workbench
-
Carrot2 Clustering Workbench has been rewritten as a browser-based application.
You can use Workbench to cluster documents from local XML, JSON, Excel and CSV files, as well as from Solr and Elasticsearch instances. A set of sliders is available to change clustering parameters in real time; you can also export the parameters as JSON, ready for pasting into REST API requests. Finally, you can export the clustering results as JSON or as an Excel spreadsheet.
#36
- JSON dictionaries
-
Carrot2 word and label filtering dictionaries are now stored in the JSON format. This change adds more expressive matching modes, such as globs for simple phrase-level filtering or regular expressions for complete control of the filtering. Please refer to the dictionaries section for an in-depth overview of what's available.
As a follow-up, the plain-text dictionaries have been deprecated and the file naming convention for the default dictionary files has changed. A dictionary file conversion utility is available.
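To illustrate what regular-expression filtering of labels amounts to, here is a minimal Java sketch. It is not Carrot2's implementation and does not use Carrot2's dictionary syntax (the dictionaries section covers that); the class name RegexLabelFilterSketch, the sample patterns and the rule that a pattern must match the whole label are all assumptions made for illustration.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Illustration only: NOT the Carrot2 API. Rejects candidate labels that match
// any of the configured regular expressions.
final class RegexLabelFilterSketch {
  private final List<Pattern> patterns;

  RegexLabelFilterSketch(List<String> regexes) {
    this.patterns = regexes.stream().map(Pattern::compile).collect(Collectors.toList());
  }

  // Returns true if the label should be kept, i.e. no pattern matches it in full.
  // (Whole-label matching is an assumption of this sketch.)
  boolean accept(String label) {
    return patterns.stream().noneMatch(p -> p.matcher(label).matches());
  }

  public static void main(String[] args) {
    // Hypothetical patterns: a case-insensitive prefix and "contains a 4-digit number".
    RegexLabelFilterSketch filter =
        new RegexLabelFilterSketch(List.of("(?i)click here.*", ".*\\d{4}.*"));
    for (String label : List.of("Data Mining", "Click here for more", "News 2020")) {
      System.out.println(label + " -> " + (filter.accept(label) ? "kept" : "filtered out"));
    }
  }
}
```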
#51
- Per-request dictionaries
-
Support for per-request (ephemeral) word and label filtering has been added. This feature allows passing word and cluster label filters with a single request; they are applied in addition to the default language resources. See the ephemeral dictionary sections of the Java API and REST API documentation for more information.
#44
API changes
- Plain text dictionaries deprecated
-
As a follow-up to the JSON dictionaries new feature, the plain-text-based format has been deprecated.
The file naming convention for default language resources has changed. For backward compatibility, if old resources are found in the resource lookup location, they will still be used and a warning will be issued via the Java logging system.
If you have language resources in the old format, please convert them to the JSON format. A simple utility is included in the Carrot2 core JAR and can help with the conversion. Just run it with:
java -cp carrot2-core-4.1.0.jar org.carrot2.language.ConvertLegacyResources [dir]
Where dir points to a directory with old resources. New resources, named according to the new convention, will be written alongside the old resources. The old resources must be manually deleted once the conversion completes successfully.
#51
- More details from /list method
-
The /service/list endpoint of the REST API now returns the language and algorithm for all of the available request templates. The response format of the endpoint has changed. Previously, the templates element was a list of template names; now it contains an object with template names as keys and template content as values, for example:
  ...
  "templates" : {
    "english-lingo" : { "language" : "English", "algorithm" : "Lingo" },
    "stc" : { "algorithm" : "STC" }
  }
#38
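Client code that consumed the old list of template names needs a small adjustment. The sketch below assumes the response has already been parsed into nested maps by a JSON library of your choice; the class and method names are hypothetical and not part of Carrot2. Template names are now the keys of the templates object, and an entry may omit language, as the stc entry in the example does.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical client-side helper; not part of Carrot2.
final class TemplateListingSketch {
  // Pre-4.1.0 clients read names from a list; they are now the keys of the object.
  static List<String> templateNames(Map<String, Map<String, Object>> templates) {
    return new ArrayList<>(templates.keySet());
  }

  // Formats one line per template; either field may be missing from a template.
  static List<String> describe(Map<String, Map<String, Object>> templates) {
    List<String> lines = new ArrayList<>();
    templates.forEach((name, content) -> {
      Object algorithm = content.getOrDefault("algorithm", "(unspecified)");
      Object language = content.getOrDefault("language", "(unspecified)");
      lines.add(name + ": algorithm=" + algorithm + ", language=" + language);
    });
    return lines;
  }

  public static void main(String[] args) {
    // Mirrors the "templates" fragment shown in the release note.
    Map<String, Map<String, Object>> templates = new LinkedHashMap<>();
    templates.put("english-lingo", Map.of("language", "English", "algorithm", "Lingo"));
    templates.put("stc", Map.of("algorithm", "STC"));
    describe(templates).forEach(System.out::println);
  }
}
```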
- Lingo filter parameter change
-
The Lingo algorithm's filter parameters have been changed from Booleans to proper objects with a dedicated enabled parameter. Unless you used these attributes explicitly, no action is needed.
#43
- LexicalData interface split
-
The LexicalData interface (LanguageComponents component) has been split into two independent components: StopwordFilter and LabelFilter. The default implementations and abstract classes have been changed accordingly.
#45
Improvements
- GZIP compression
-
The built-in server of the REST API now supports GZIP compression.
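A client opts into compression with the standard Accept-Encoding: gzip request header and has to decompress the body itself when the response carries Content-Encoding: gzip. The sketch below uses the JDK's java.net.http.HttpClient (Java 11 or later), which does not decompress responses automatically. The base URL http://localhost:8080 is an assumption about a local deployment, and the class name is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;

// Hypothetical client; not part of Carrot2.
final class GzipClientSketch {
  // Decodes a response body, un-gzipping it only if the server compressed it.
  static String decodeBody(byte[] body, String contentEncoding) throws IOException {
    if ("gzip".equalsIgnoreCase(contentEncoding)) {
      try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(body))) {
        return new String(in.readAllBytes(), StandardCharsets.UTF_8);
      }
    }
    return new String(body, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) throws Exception {
    // Assumed address of a locally running server; adjust to your setup.
    URI uri = URI.create("http://localhost:8080/service/list");
    HttpRequest request = HttpRequest.newBuilder(uri)
        .header("Accept-Encoding", "gzip") // ask for a compressed response
        .GET()
        .build();
    HttpResponse<byte[]> response =
        HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofByteArray());
    String encoding = response.headers().firstValue("Content-Encoding").orElse("identity");
    System.out.println(decodeBody(response.body(), encoding));
  }
}
```

Keeping the decoding step in its own method means it can be checked without a running server.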
#66
- Request processing information
-
Added clustering and request processing time information to the clustering response. This information is optional and is returned when the serviceInfo HTTP parameter is enabled on a clustering request.
#35
- Java module system support improved
-
Improved support for the Java module system by providing the Automatic-Module-Name entry in JAR manifests.
#59
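One way to confirm that a JAR declares a stable module name is to read the Automatic-Module-Name attribute from its manifest. The sketch below uses only java.util.jar; the JAR path is taken from the command line, the class name is hypothetical, and no particular module name is assumed.

```java
import java.io.IOException;
import java.util.Optional;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

// Hypothetical helper; not part of Carrot2.
final class AutomaticModuleNameCheck {
  // Returns the Automatic-Module-Name main attribute, if the manifest declares one.
  static Optional<String> automaticModuleName(Manifest manifest) {
    if (manifest == null) {
      return Optional.empty(); // the JAR has no manifest at all
    }
    return Optional.ofNullable(manifest.getMainAttributes().getValue("Automatic-Module-Name"));
  }

  public static void main(String[] args) throws IOException {
    // Usage: java AutomaticModuleNameCheck.java path/to/some.jar
    try (JarFile jar = new JarFile(args[0])) {
      System.out.println(automaticModuleName(jar.getManifest()).orElse("(not declared)"));
    }
  }
}
```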
Bug fixes
- Clustering of multi-value fields fails
-
Carrot2 4.0.x fails to cluster documents containing multi-value fields (arrays of strings). Version 4.1.0 fixes the issue.
#34