Extension Points

This article discusses dynamic discovery of available Carrot2 components via service extension points.

Many Carrot2 components, including algorithms, language components and their building blocks, are implemented as service extension points. While they can be referenced directly using explicit class names (new LingoClusteringAlgorithm()), you can also enumerate and inspect their properties dynamically.

The following code enumerates all available algorithm providers, checks whether each algorithm supports English and, if it does, clusters the same stream of documents with it. The english variable is assumed to hold language components for English that were loaded earlier.

// Discover every registered provider. Provider here is java.util.ServiceLoader.Provider.
List<ClusteringAlgorithmProvider> providers =
    ServiceLoader.load(ClusteringAlgorithmProvider.class).stream()
        .map(Provider::get)
        .collect(Collectors.toList());

for (ClusteringAlgorithmProvider provider : providers) {
  System.out.println("Clustering algorithm: " + provider.name() + "\n");
  ClusteringAlgorithm algorithm = provider.get();
  if (algorithm.supports(english)) {
    List<Cluster<Document>> clusters =
        algorithm.cluster(ExamplesData.documentStream(), english);
    ExamplesCommon.printClusters(clusters);
  } else {
    String name = english.language();
    System.out.println("  (Language not supported: " + name + ").");
  }
}
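
When only one algorithm is needed, the same enumeration can be narrowed to the provider with a particular name. The following is a minimal sketch rather than Carrot2 API: findByName is an illustrative helper, FakeProvider is a hypothetical stand-in that mimics only the name() method used above, and the names in the sample list are examples only.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

final class ProviderLookup {
  /** Hypothetical stand-in for a provider; only its name matters in this sketch. */
  static final class FakeProvider {
    private final String name;

    FakeProvider(String name) {
      this.name = name;
    }

    String name() {
      return name;
    }
  }

  /**
   * Returns the first element whose name equals the requested one, or an empty
   * Optional if no element matches. Accepts any Iterable.
   */
  static <P> Optional<P> findByName(
      Iterable<P> providers, Function<P, String> nameOf, String name) {
    for (P provider : providers) {
      if (name.equals(nameOf.apply(provider))) {
        return Optional.of(provider);
      }
    }
    return Optional.empty();
  }

  public static void main(String[] args) {
    // Example names only; real names are whatever provider.name() reports.
    List<FakeProvider> providers =
        Arrays.asList(new FakeProvider("Lingo"), new FakeProvider("STC"));

    System.out.println(findByName(providers, FakeProvider::name, "STC").isPresent()); // true
    System.out.println(findByName(providers, FakeProvider::name, "Unknown").isPresent()); // false
  }
}
```

Because ServiceLoader implements Iterable, the helper could presumably be given ServiceLoader.load(ClusteringAlgorithmProvider.class) together with ClusteringAlgorithmProvider::name, without first collecting the providers into a list.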

A similar service extension point exists for language component providers. This time we use a facade method that performs additional sanity checks before returning the result:

System.out.println(
    "Language components for the following languages are available:\n  "
        + String.join(", ", LanguageComponents.languages()));
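
The list of available languages could also be used to validate a requested language name before any components are loaded. The sketch below assumes the names are plain strings, as the String.join call above suggests. The resolve helper and the sample names are hypothetical and not part of Carrot2.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.Optional;

final class LanguageCheck {
  /**
   * Looks up the requested name among the available ones, ignoring case and
   * surrounding whitespace, and returns the spelling used in the collection.
   */
  static Optional<String> resolve(Collection<String> available, String requested) {
    String wanted = requested.trim();
    for (String language : available) {
      if (language.equalsIgnoreCase(wanted)) {
        return Optional.of(language);
      }
    }
    return Optional.empty();
  }

  public static void main(String[] args) {
    // Sample data; in the article's setting this would come from LanguageComponents.languages().
    List<String> available = Arrays.asList("English", "French", "Polish");

    System.out.println(resolve(available, " english ").orElse("(not available)")); // English
    System.out.println(resolve(available, "Klingon").orElse("(not available)")); // (not available)
  }
}
```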