Request Templates

This section introduces request templates and explains how they help separate algorithm configuration from the actual clustering requests.

As mentioned in the introduction, a clustering query is self-contained and must carry the three required elements: algorithm name, language and documents to be clustered. In practice, the first two elements will rarely change: once the algorithm for a particular task is chosen, it will probably remain constant for all clustering requests. In such situations, the constant part of the query can be moved into a template, which is then combined with the body of each query to form the final request.
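The composition step can be pictured as a merge of two JSON objects. The Python sketch below is only an illustration: the helper name compose_request and the merge rule (nested objects are merged recursively, and values from the request body win over values from the template) are assumptions, not a description of how the DCS implements it.

```python
import copy


def compose_request(template: dict, body: dict) -> dict:
    """Merge a template with a request body; body values take precedence.

    Hypothetical helper: the recursive merge rule is assumed for illustration.
    """
    merged = copy.deepcopy(template)
    for key, value in body.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides hold an object: merge them key by key.
            merged[key] = compose_request(merged[key], value)
        else:
            # Otherwise the request body overrides the template value.
            merged[key] = copy.deepcopy(value)
    return merged


# The constant part of the query lives in the template ...
template = {"language": "English", "algorithm": "Lingo"}
# ... and each request only carries what actually changes.
body = {"documents": [{"title": "PDF Viewer on Windows"}]}
request = compose_request(template, body)
```

Under these assumptions, request carries all three required elements, while the template itself is left unmodified and can be reused for the next query.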

Templates are also useful for decoupling the algorithm and its configuration from the code that queries the clustering service, and for grouping different algorithms and their tuning variants under aliases that are much easier to remember.

How to create a request template?

Let's explain it by walking through a simple example. For a request body like the one shown below, we can pull out everything up to the documents section:

{
  "language": "English",
  "algorithm": "Lingo",
  "documents": [
    { "title": "PDF Viewer on Windows" },
    { "title": "Firefox PDF plugin to view PDF in browser on Windows" },
    { "title": "Limit CPU usage for flash in Firefox?" }
  ]
}

Let's create a file called 04 foo.json under the DCS-relative path web/service/templates/04 foo.json. The name (identifier) of the template is embedded in the file name: it is foo. The number in front of the alphanumeric name is always stripped, although it is used to sort the templates for display purposes. The file has the following content:

{
  "language": "English",
  "algorithm": "Lingo",
  "parameters": {
    "preprocessing": {
      "phraseDfThreshold": 1,
      "wordDfThreshold": 1
    }
  }
}
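The naming rule described above (numeric prefix stripped from the identifier, but still used for ordering) can be sketched in Python. This is a hedged illustration: the helper names template_id and display_order, the exact stripping pattern, and the use of a plain lexicographic sort are assumptions, and the file name 01 lingo.json is hypothetical.

```python
import re
from pathlib import PurePosixPath


def template_id(file_name: str) -> str:
    """Derive a template identifier from its file name (assumed rule)."""
    # Drop any directory part and the ".json" extension.
    stem = PurePosixPath(file_name).stem
    # Strip the leading number and any separator characters after it.
    return re.sub(r"^\d+\W*", "", stem)


def display_order(file_names: list[str]) -> list[str]:
    """Sort by full file name, so the numeric prefix decides the order."""
    return [template_id(name) for name in sorted(file_names)]


print(template_id("web/service/templates/04 foo.json"))  # prints "foo"
print(display_order(["04 foo.json", "01 lingo.json"]))
```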

The foo template contains the constant (but overridable) elements of our clustering query. Once the DCS has started, we can check whether the template has been loaded by looking at the startup log messages or by sending a GET request to the /list endpoint:

...
  "templates" : [
    "frontend-default",
    "lingo",
    "stc",
    "bkmeans",
    "foo"
  ]
}
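This check is easy to automate. The Python sketch below assumes that the /list endpoint is served under the same /service prefix as the clustering endpoint (http://localhost:8080/service/list) and that the response is a JSON object with a top-level "templates" array, as in the fragment above. The helper names are hypothetical, and the sample string reproduces only the templates part of the response.

```python
import json
import urllib.request


def has_template(list_response_text: str, name: str) -> bool:
    """Return True if the /list response mentions the given template."""
    response = json.loads(list_response_text)
    return name in response.get("templates", [])


def fetch_list(url: str = "http://localhost:8080/service/list") -> str:
    """Fetch the raw /list response; the URL is an assumption (see above)."""
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8")


# Offline example using only the "templates" part of the response shown above.
sample = '{"templates": ["frontend-default", "lingo", "stc", "bkmeans", "foo"]}'
print(has_template(sample, "foo"))
```

Against a running DCS, has_template(fetch_list(), "foo") would perform the same check on the live response.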

Armed with the foo template, we can now assemble a much simpler clustering request that contains just the documents to be clustered:

{
  "documents": [
    { "field": "foo bar" },
    { "field": "bar" },
    { "field": "baz" }
  ]
}

When posting the documents for clustering, use the template request parameter to provide the template name (foo in our case):

curl -X POST --header "Content-Type: text/json" --data-binary @template-request.json "http://localhost:8080/service/cluster?indent&template=foo"
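The same request can be issued from code. The following Python sketch mirrors the curl command above using only the standard library. The URL, the indent flag and the Content-Type header are copied from that command; the helper names build_cluster_request and send are hypothetical.

```python
import json
import urllib.request
from urllib.parse import urlencode


def build_cluster_request(documents, template,
                          base_url="http://localhost:8080/service/cluster"):
    """Build (but do not send) a POST request that relies on a template."""
    # The body carries only the documents; the template supplies the rest.
    body = json.dumps({"documents": documents}).encode("utf-8")
    url = f"{base_url}?indent&{urlencode({'template': template})}"
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "text/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> str:
    """Send the request to a running DCS and return the raw response text."""
    with urllib.request.urlopen(request) as resp:
        return resp.read().decode("utf-8")


request = build_cluster_request(
    [{"field": "foo bar"}, {"field": "bar"}, {"field": "baz"}],
    template="foo",
)
print(request.full_url)
```

Calling send(request) requires a DCS instance listening on localhost:8080. Switching to another algorithm or tuning variant then only means passing a different template name.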